risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

Microbenchmark all source parsers #10972

Status: Open. kwannoel opened this issue 1 year ago.

kwannoel commented 1 year ago

It seems there could be performance regressions, or room for performance optimizations, in various parsers. We need to microbenchmark all of them to catch regressions.

Originally mentioned by @lmatz : https://github.com/risingwavelabs/risingwave/issues/10840#issuecomment-1633625393
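A per-parser microbenchmark can start as a timed loop over a fixed batch of encoded records. The sketch below uses only the Rust standard library. `parse_record` is a hypothetical stand-in for a real source parser (it only splits a CSV-like line), and `bench_parser` / `BenchResult` are hypothetical helpers, not part of RisingWave. A real suite would call the actual parsers and would likely use a harness such as criterion for more robust statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a source parser: splits one CSV-like line
/// into fields. A real benchmark would call an actual parser instead.
fn parse_record(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

/// Totals collected during one benchmark run.
struct BenchResult {
    records: usize,
    bytes: usize,
    elapsed: Duration,
}

impl BenchResult {
    fn records_per_sec(&self) -> f64 {
        self.records as f64 / self.elapsed.as_secs_f64()
    }

    fn mb_per_sec(&self) -> f64 {
        self.bytes as f64 / 1_000_000.0 / self.elapsed.as_secs_f64()
    }
}

/// Parses every input `iterations` times and measures wall-clock time.
fn bench_parser<F>(inputs: &[String], iterations: usize, mut parse: F) -> BenchResult
where
    F: FnMut(&str) -> usize,
{
    // Untimed warm-up pass so first-touch effects do not skew the timed loop.
    for line in inputs {
        black_box(parse(black_box(line)));
    }

    let start = Instant::now();
    let mut records = 0;
    let mut bytes = 0;
    for _ in 0..iterations {
        for line in inputs {
            // black_box keeps the optimizer from discarding the parse work.
            black_box(parse(black_box(line)));
            records += 1;
            bytes += line.len();
        }
    }
    BenchResult { records, bytes, elapsed: start.elapsed() }
}

fn main() {
    // Synthetic input batch, for illustration only.
    let inputs: Vec<String> = (0..1_000)
        .map(|i| format!("{},user_{},{}", i, i, i * 7))
        .collect();

    let result = bench_parser(&inputs, 100, |line| parse_record(line).len());
    println!(
        "{} records, {:.0} records/s, {:.2} MB/s",
        result.records,
        result.records_per_sec(),
        result.mb_per_sec()
    );
}
```

Reporting both records/s and MB/s makes it easier to compare formats whose records differ greatly in size (e.g. compact binary encodings versus verbose JSON).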

kwannoel commented 1 year ago

cc @tabVersion

kwannoel commented 1 year ago

Another comment by @neverchanje:

Personally, if I wanted to test whether there is a bottleneck in the parser, I would create a source and a dummy sink (blackhole) and see whether the throughput is bound by I/O or by CPU. If it is CPU, the parser is spending too much time, which is definitely unexpected.

Before diving into the microbenchmarks, we should first check whether the parser is actually the bottleneck. For json_parser, q0 previously showed a regression, which suggested that we needed to optimize json_parser. If we wish to optimize another parser, e.g. debezium json, we should first find similar evidence.

neverchanje commented 1 year ago

Right. You can still schedule the microbenchmarks to run every day and record the results, so that a performance regression can be spotted immediately from the history. Whether you prioritize this task depends entirely on your motivation.
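A daily benchmark run can be paired with a simple check against the recorded history. The sketch below assumes each run is stored as a throughput number (records/s). `DailyResult`, `median`, and `is_regression` are hypothetical names, and "more than a tolerance below the median of the last N runs" is just one simple regression rule chosen for illustration; the thread does not prescribe a detector.

```rust
/// Hypothetical record of one daily benchmark run.
#[derive(Debug)]
struct DailyResult {
    date: String,
    records_per_sec: f64,
}

/// Median of a non-empty slice (values are copied, then sorted).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Flags `latest` as a regression if its throughput is more than
/// `tolerance` (e.g. 0.10 = 10%) below the median of the last `window` runs.
fn is_regression(history: &[DailyResult], latest: &DailyResult, window: usize, tolerance: f64) -> bool {
    if history.is_empty() || window == 0 {
        return false; // no baseline yet
    }
    let start = history.len().saturating_sub(window);
    let recent: Vec<f64> = history[start..].iter().map(|r| r.records_per_sec).collect();
    let baseline = median(&recent);
    latest.records_per_sec < baseline * (1.0 - tolerance)
}

fn main() {
    // Made-up numbers for illustration, not real benchmark results.
    let history: Vec<DailyResult> = [100.0, 102.0, 98.0, 101.0, 99.0]
        .iter()
        .enumerate()
        .map(|(i, &v)| DailyResult { date: format!("day-{}", i + 1), records_per_sec: v })
        .collect();
    let today = DailyResult { date: "day-6".to_string(), records_per_sec: 85.0 };

    if is_regression(&history, &today, 5, 0.10) {
        println!("regression on {}: {:.1} records/s", today.date, today.records_per_sec);
    }
}
```

Using the median rather than the mean keeps a single noisy run from shifting the baseline; the tolerance would need tuning against the observed run-to-run variance of each parser's benchmark.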

github-actions[bot] commented 1 year ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

github-actions[bot] commented 3 months ago

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean. Don't worry if you think the issue is still valuable to continue in the future. It's searchable and can be reopened when it's time. 😄