matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Tech Request]: Storage Optimization Policy Zoo #10208

Open fengttt opened 1 year ago

fengttt commented 1 year ago

Is there an existing issue for the same feature request?

Is your feature request related to a problem?

Storage optimization policy is extremely important and difficult to get right.
We need a framework in which we can set a policy for a workload, then run, test, experiment with, and improve it.

Describe the feature you'd like

6955 will move storage optimization from DN to CN. Assume a CN receives a request: the CN needs to choose candidate blocks, merge and sort them, and write the result out.

How to choose these blocks, and how much to merge, is highly tricky. We want to parameterize the policy and make it easy to test and tune. The policy parameters probably need to be associated with each table.

alter table XXX set storage_optimization_policy = '{"policy": "greedy", "merge_size": 32}'; -- a JSON string; "greedy", whatever that means, 32G per merge

or other things like '{"policy": "smallest_first", "merge_blocks": 128}' -- try to merge the smallest 128 S3 objects
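
To make the idea concrete, here is a minimal sketch in Go of how such a JSON policy string might be parsed and validated. The JSON keys (`policy`, `merge_size`, `merge_blocks`) and the two policy names come from the examples above; the `Policy` struct, `DefaultPolicy`, and `ParsePolicy` are hypothetical names for illustration, not existing MatrixOne code, and the choice of default is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Policy mirrors the JSON examples in this issue. The keys come from those
// examples; the struct itself is hypothetical.
type Policy struct {
	Name        string `json:"policy"`
	MergeSize   int    `json:"merge_size,omitempty"`   // GB per merge ("greedy")
	MergeBlocks int    `json:"merge_blocks,omitempty"` // objects per merge ("smallest_first")
}

// DefaultPolicy is an assumed fallback for tables that have no policy set.
var DefaultPolicy = Policy{Name: "greedy", MergeSize: 32}

// ParsePolicy decodes and validates a policy string. An empty string
// selects DefaultPolicy.
func ParsePolicy(raw string) (Policy, error) {
	if raw == "" {
		return DefaultPolicy, nil
	}
	var p Policy
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return Policy{}, fmt.Errorf("invalid storage_optimization_policy: %w", err)
	}
	switch p.Name {
	case "greedy":
		if p.MergeSize <= 0 {
			return Policy{}, fmt.Errorf("policy %q needs merge_size > 0", p.Name)
		}
	case "smallest_first":
		if p.MergeBlocks <= 0 {
			return Policy{}, fmt.Errorf("policy %q needs merge_blocks > 0", p.Name)
		}
	default:
		return Policy{}, fmt.Errorf("unknown policy %q", p.Name)
	}
	return p, nil
}
```

Rejecting unknown policy names at `alter table` time, rather than at merge time, would keep a typo from silently disabling optimization for a table; whether that is the right trade-off for experimentation is open.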

Describe implementation you've considered

DN will monitor changes to a table. When deemed necessary, it will fire off a storage optimization request.

One CN will pick it up (there could be a few dedicated CNs for the whole DN), choose target blocks according to the table's storage policy (or the default), and run the merge. One merge request from the DN may trigger a series of merges. However, the CN has to release its resources to other tables after a few merges.
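
The CN-side loop described above could be sketched roughly as follows, again in Go. Everything here is hypothetical and for illustration only: the `Object` descriptor, the `Selector` interface, the `SmallestFirst` implementation, and the `RunRequest` function with its `maxMerges` cap are assumed names, and the actual merge/sort/write step is passed in as a callback rather than implemented.

```go
package main

import "sort"

// Object is a hypothetical descriptor of one S3 object (or block) of a table.
type Object struct {
	ID   string
	Size int64 // bytes
}

// Selector is the pluggable policy: it picks the objects for the next merge
// and returns nil when nothing is worth merging.
type Selector interface {
	Select(candidates []Object) []Object
}

// SmallestFirst picks up to MaxObjects of the smallest candidates.
type SmallestFirst struct{ MaxObjects int }

func (s SmallestFirst) Select(candidates []Object) []Object {
	if len(candidates) < 2 || s.MaxObjects < 2 {
		return nil // a merge needs at least two inputs
	}
	sorted := append([]Object(nil), candidates...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Size < sorted[j].Size })
	if len(sorted) > s.MaxObjects {
		sorted = sorted[:s.MaxObjects]
	}
	return sorted
}

// RunRequest serves one optimization request for one table. It repeats
// select-and-merge at most maxMerges times and then returns, so the CN can
// move on to other tables. It reports the resulting object list and the
// number of merges performed.
func RunRequest(sel Selector, objs []Object, maxMerges int, merge func([]Object) Object) ([]Object, int) {
	done := 0
	for done < maxMerges {
		picked := sel.Select(objs)
		if len(picked) == 0 {
			break
		}
		objs = replaceMerged(objs, picked, merge(picked))
		done++
	}
	return objs, done
}

// replaceMerged returns objs with the picked objects removed and the merged
// object appended.
func replaceMerged(objs, picked []Object, merged Object) []Object {
	gone := make(map[string]bool, len(picked))
	for _, o := range picked {
		gone[o.ID] = true
	}
	out := make([]Object, 0, len(objs)+1)
	for _, o := range objs {
		if !gone[o.ID] {
			out = append(out, o)
		}
	}
	return append(out, merged)
}
```

Keeping the policy behind a small interface is what would let new entries be added to the "zoo" and compared on the same workload without touching the loop; the per-request cap is one simple way to express "release resources after a few merges".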

This request is about enabling the framework. As for the exact policies and parameters, there will be many implementations and experiments.

Documentation, Adoption, Use Case, Migration Strategy

No response

Additional information

No response

fengttt commented 12 months ago

Related: #9867, #9883, #10294