Add duplicate columns with different index to improve Pinot partial match performance

uber / cadence

Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.

https://cadenceworkflow.io

MIT License

7.96k stars, 772 forks

Add duplicate columns with different index to improve Pinot partial match performance #6149

Closed: neil-xie closed this 3 days ago

neil-xie commented 3 days ago

What changed? Added duplicate workflowID/runID/workflowType columns. A text index will be applied to these duplicates to improve Pinot prefix partial-match performance.

Why? Currently these 3 columns use an inverted index, and partial matching is slow with it. We want to apply a text index, but a column can only have one index at a time. So we added 3 duplicate columns and apply a text index to them (in the monorepo) to improve performance.
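
For illustration only, here is a minimal Go sketch of the duplication idea, not the code from this PR. It assumes a visibility row is represented as a map and that the duplicate columns are named by appending a `Text` suffix. The suffix, the key names, and the helper `withTextIndexCopies` are all hypothetical. The originals would keep their inverted index, while the copies would receive the text index in the Pinot table config, which lives outside this repo.

```go
package main

import "fmt"

// textSuffix is a hypothetical naming convention for the duplicate columns.
const textSuffix = "Text"

// duplicatedColumns lists the three columns named in the PR description.
// The exact key spellings are an assumption made for this sketch.
var duplicatedColumns = []string{"WorkflowID", "RunID", "WorkflowType"}

// withTextIndexCopies returns a copy of row in which every column listed in
// duplicatedColumns also appears under name+textSuffix with the same value.
// The input map is not modified.
func withTextIndexCopies(row map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(row)+len(duplicatedColumns))
	for k, v := range row {
		out[k] = v
	}
	for _, col := range duplicatedColumns {
		if v, ok := row[col]; ok {
			out[col+textSuffix] = v
		}
	}
	return out
}

func main() {
	row := map[string]interface{}{
		"WorkflowID":   "order-12345",
		"RunID":        "run-abc",
		"WorkflowType": "ProcessOrder",
		"DomainID":     "domain-1",
	}
	// Print the row with the three duplicate columns added.
	fmt.Println(withTextIndexCopies(row))
}
```

With this layout, exact-match queries can keep targeting the original columns, and prefix or partial-match queries can target the copies. The cost is that the three values are stored twice.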

How did you test it?

Potential risks

Release notes

Documentation Changes

coveralls commented 3 days ago

Pull Request Test Coverage Report for Build 01905abe-099f-49c8-a5c2-4352a3db223f

Details

4 of 4 (100.0%) changed or added relevant lines in 1 file are covered.
1913 unchanged lines in 27 files lost coverage.
Overall coverage increased (+0.1%) to 71.579%

Files with Coverage Reduction	New Missed Lines	%
common/task/weighted_round_robin_task_scheduler.go	1	89.05%
service/history/task/transfer_standby_task_executor.go	2	86.64%
common/cache/lru.go	2	93.01%
common/task/fifo_task_scheduler.go	2	87.63%
common/constants.go	2	0.0%
common/persistence/statsComputer.go	3	98.21%
service/history/engine/engineimpl/register_domain_failover_callback.go	3	60.0%
service/history/execution/context.go	4	93.33%
common/persistence/sql/sql_shard_store.go	4	97.16%
service/history/engine/engineimpl/reset_queues.go	4	0.0%

Totals
Change from base Build 0190573d-ff12-4850-94f0-8c77deb099df:	0.1%
Covered Lines:	107168
Relevant Lines:	149719

💛 - Coveralls