Is there an existing issue for the same feature request?
[X] I have checked the existing issues.
Is your feature request related to a problem?
Storage optimization policy is extremely important and difficult to get right.
We need a framework that lets us set a policy for a workload, then run, test, experiment with, and improve it.
Describe the feature you'd like
6955 will move storage optimization from DN to CN. Assume a CN receives a request: the CN needs to choose candidate blocks, merge/sort them, and write the result out.
How to choose these blocks and how much to merge is highly tricky. We want to parametrize the policy and make it easy to test and tune. This policy parametrization probably needs to be associated with each table.
alter table XXX set storage_optimization_policy =
  -- a JSON string, something like
  '{"policy": "greedy", "merge_size": 32}'; -- greedy, whatever that means, 32G per merge
or other things like
  '{"policy": "smallest_first", "merge_blocks": 128}'; -- try to merge the smallest 128 S3 objects
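To make the parametrization concrete, here is a minimal sketch of how a CN might turn such a JSON string into a block selection. Everything in it is an assumption for illustration: the `Block` type, the `choose_blocks` function, and in particular the reading of "greedy" as "take the smallest blocks until the `merge_size` budget in GiB is used up". The actual semantics of each policy are exactly what this framework is meant to let us experiment with.

```python
import json
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one candidate block / S3 object."""
    object_id: str
    size_bytes: int


def choose_blocks(policy_json, blocks):
    """Pick the blocks to merge for one table according to its policy string."""
    policy = json.loads(policy_json)
    name = policy["policy"]
    by_size = sorted(blocks, key=lambda b: b.size_bytes)

    if name == "smallest_first":
        # Merge the N smallest objects, N = merge_blocks.
        return by_size[: policy["merge_blocks"]]

    if name == "greedy":
        # Assumed meaning: add the smallest blocks until the next one
        # would push the total past merge_size GiB.
        budget = policy["merge_size"] * GIB
        chosen, total = [], 0
        for block in by_size:
            if total + block.size_bytes > budget:
                break
            chosen.append(block)
            total += block.size_bytes
        return chosen

    raise ValueError(f"unknown storage optimization policy: {name!r}")
```

Keeping the policy as an opaque JSON string means new policy names and parameters can be tried without changing the `alter table` syntax; only the dispatch inside the selection function grows.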
Describe implementation you've considered
DN will monitor changes to a table and, when deemed necessary, fire a storage optimization request.
One CN will pick it up (there could be a few dedicated CNs for the whole DN) and choose blocks to merge according to the table's storage policy (or the default). It chooses target blocks and runs the merge. One merge request from DN may trigger a series of merges. However, the CN has to release resources to other tables after a few merges.
This issue is about enabling the framework. As for the exact policies and parameters, there will be many implementations and experiments.
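The "release resources after a few merges" behavior could look roughly like the sketch below. It is only an illustration under stated assumptions: `run_worker`, `pending_merges`, and `max_merges_per_turn` are hypothetical names, each merge is reduced to a counter decrement, and a table that still has work left is simply put back at the end of the queue.

```python
from collections import deque


def run_worker(requests, pending_merges, max_merges_per_turn=3):
    """Simulate one CN serving merge requests fairly across tables.

    requests: table names in the order DN fired optimization requests.
    pending_merges: table name -> number of merges still worth doing.
    Returns the order in which merges were executed, one entry per merge.
    """
    queue = deque(requests)
    remaining = dict(pending_merges)
    executed = []

    while queue:
        table = queue.popleft()
        done = 0
        # One request may trigger a series of merges, capped per turn.
        while remaining.get(table, 0) > 0 and done < max_merges_per_turn:
            remaining[table] -= 1  # placeholder for choose blocks + merge
            executed.append(table)
            done += 1
        # Yield to other tables; come back later if work is left.
        if remaining.get(table, 0) > 0:
            queue.append(table)

    return executed
```

With `run_worker(["a", "b"], {"a": 5, "b": 1})`, table `a` gets three merges, then `b` gets its one, and `a` finishes its last two afterwards, so a busy table cannot starve the others.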
Documentation, Adoption, Use Case, Migration Strategy
No response
Additional information
No response