matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Tech Request]: refactor logtail replay in CN #14908

Open reusee opened 6 months ago

reusee commented 6 months ago

Is there an existing issue for the same tech request?

Does this tech request not affect user experience?

What would you like to be added ?

Refactor the logtail replay code to eliminate the copy-on-write btree.

Why is this needed ?

No response

Additional information

No response

reusee commented 6 months ago

Maybe useful links:
https://www.semanticscholar.org/paper/Zip-Trees-Tarjan-Levy/392e4210d804dc488f5ae9f90de7001527dd8ebe
https://www.semanticscholar.org/paper/Zip-zip-Trees%3A-Making-Zip-Trees-More-Balanced%2C-or-Gila-Goodrich/5474aa3c85484da664b2641aa4b71138845f1e01

reusee commented 6 months ago

Will continue after the TAE team has finished their refactoring.

reusee commented 5 months ago

  1. The logtail protocol should be stable before any major refactor of the logtailreplay package; otherwise we can't ensure semantic correctness while changing from the copy-on-write implementation to a multi-version one.
  2. I doubt whether the bottleneck is really in logtail replay. I'll do some execution tracing to measure it.
  3. Implementing a concurrent btree will not be done in the short term.
  4. The logtailreplay package needs unit tests to ensure semantic correctness and to measure performance.
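
To make the copy-on-write versus multi-version distinction in the points above concrete, here is a minimal Go sketch. It is not the logtailreplay code: `MVStore`, `Apply`, `Get`, and `cowApply` are hypothetical names invented for illustration, the store is a plain map, not a btree, and nothing here is safe for concurrent use. The copy-on-write helper clones the whole state for simplicity, whereas a real copy-on-write btree would copy only the modified path.

```go
package main

import (
	"fmt"
	"sort"
)

// version is one timestamped value of a key; deleted marks a tombstone.
type version struct {
	ts      int64
	value   string
	deleted bool
}

// MVStore is a hypothetical multi-version map: each key keeps its versions
// ordered by timestamp, so a write appends instead of copying shared state.
type MVStore struct {
	rows map[string][]version
}

func NewMVStore() *MVStore {
	return &MVStore{rows: make(map[string][]version)}
}

// Apply records an insert or a delete at ts. Timestamps for a key are
// assumed to be non-decreasing, as when replaying an ordered log.
func (s *MVStore) Apply(key, value string, ts int64, deleted bool) {
	s.rows[key] = append(s.rows[key], version{ts: ts, value: value, deleted: deleted})
}

// Get returns the value of key that is visible at snapshot timestamp ts.
func (s *MVStore) Get(key string, ts int64) (string, bool) {
	vs := s.rows[key]
	// i is the index of the first version written after ts.
	i := sort.Search(len(vs), func(i int) bool { return vs[i].ts > ts })
	if i == 0 || vs[i-1].deleted {
		return "", false
	}
	return vs[i-1].value, true
}

// cowApply is the copy-on-write contrast: a write builds a new state and
// leaves the old one untouched for readers that still hold it.
func cowApply(old map[string]string, key, value string) map[string]string {
	next := make(map[string]string, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = value
	return next
}

func main() {
	s := NewMVStore()
	s.Apply("row1", "a", 10, false)
	s.Apply("row1", "b", 20, false)
	s.Apply("row1", "", 30, true) // tombstone
	for _, ts := range []int64{5, 10, 25, 30} {
		v, ok := s.Get("row1", ts)
		fmt.Println(ts, v, ok)
	}

	snap1 := map[string]string{"row1": "a"}
	snap2 := cowApply(snap1, "row2", "b")
	fmt.Println(len(snap1), len(snap2))
}
```

In the multi-version form, a reader picks its snapshot by timestamp at read time, so replay does not have to produce a new copy of the state per batch. The cost moves to version lookup and to garbage-collecting old versions, which this sketch does not attempt.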

reusee commented 3 months ago

Re-assigning to @triump2020.

related PR: https://github.com/matrixorigin/matrixone/pull/16228