hse-project / hse

HSE: Heterogeneous-memory storage engine
https://hse-project.github.io

Need docs about engine architecture #579

Closed ZhangJiaQiao closed 1 year ago

ZhangJiaQiao commented 1 year ago

Please provide some docs explaining the internal architecture of the HSE engine. I cannot find any relevant resources on the official site or in the GitHub repo. I want to learn about the index architecture and key-value data management of HSE.

tristan957 commented 1 year ago

We have had a few people in the past ask for documentation as well. We are definitely aware that it needs work. An ARCHITECTURE.md/whitepaper are definitely on the list of things to eventually get to. Unfortunately it can be hard prioritizing documentation over feature development and bug fixing. Do you have specific questions that maybe I can get answers to more quickly?

ZhangJiaQiao commented 1 year ago

Thanks for your reply. Here are some of my questions:

  1. Are there any papers or information about the index structure? Is the index a trie or something else?
  2. Is there a guide on how to run HSE with pmem, staging, and capacity storage?
  3. Why does HSE use a periodic flush for the WAL? Can it be configured as a classical WAL to ensure atomicity for a single write operation? Does HSE gain much performance from the periodically flushed WAL?

tristan957 commented 1 year ago

@nabeelmmd might be a good person to provide an answer about number 3.

smoyergh commented 1 year ago

RE 2: The docs (https://hse-project.github.io) contain detailed instructions.

RE 3: A configurable sync interval both increases performance and matches a configuration option common to many databases. You can use the sync() API to ensure any particular update is durable. Atomicity is a separate matter from durability: all updates are atomic.
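
To illustrate the distinction between durability and atomicity drawn in this reply, here is a minimal toy model in Python. It is not HSE code and does not use the HSE API: the `ToyWal` class, its methods, and the millisecond timestamps are all hypothetical names invented for this sketch. It only shows the general idea that records can be applied atomically while becoming durable either at the next interval-driven flush or on an explicit sync.

```python
class ToyWal:
    """Toy write-ahead log with a sync interval (illustration only, not HSE)."""

    def __init__(self, sync_interval_ms):
        self.sync_interval_ms = sync_interval_ms
        self.buffered = []     # accepted records that are not yet durable
        self.durable = []      # records that would survive a crash
        self.last_sync_ms = 0

    def put(self, now_ms, key, value):
        # A record is appended as one unit, so each update is all-or-nothing.
        self.buffered.append((key, value))
        # Interval-driven flush: fires only once enough time has passed.
        if now_ms - self.last_sync_ms >= self.sync_interval_ms:
            self.sync(now_ms)

    def sync(self, now_ms):
        # Explicit sync: everything accepted so far becomes durable.
        self.durable.extend(self.buffered)
        self.buffered.clear()
        self.last_sync_ms = now_ms

    def crash(self):
        # A crash discards buffered records; durable ones remain.
        self.buffered.clear()
        return dict(self.durable)


wal = ToyWal(sync_interval_ms=100)
wal.put(10, "a", "1")
wal.put(20, "b", "2")
wal.sync(20)              # caller insists that "a" and "b" are durable now
wal.put(30, "c", "3")     # accepted, but the interval has not elapsed yet
survivors = wal.crash()   # "c" is lost as a whole, never half-written
```

In this model a longer interval means fewer flushes (better throughput) at the cost of a larger window of recent updates that a crash can discard, which is the trade-off a configurable interval exposes.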

ZhangJiaQiao commented 1 year ago

Thanks for your answers. Another question: HSE shows large improvements in operation latency and throughput compared to WT and RocksDB. What techniques does it use to achieve this? Are there any special optimizations for NVMe/SATA SSD storage in HSE?

smoyergh commented 1 year ago

Many factors contribute to the performance gains. However, the primary factors are 1) reduced write/read amplification resulting from our unique variant of LSM trees and the associated compaction algorithms, and 2) a focus on highly concurrent data structures, including those based on RCU where applicable.
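
For readers unfamiliar with the term, write amplification is commonly defined as the bytes written to the storage device divided by the bytes the application asked to write. The sketch below computes that ratio in Python. The function name and every number in it are hypothetical and chosen only for illustration; they are not HSE measurements and say nothing about how HSE's compaction actually behaves.

```python
def write_amplification(app_bytes, wal_bytes, flush_bytes, compaction_bytes):
    """Return device bytes written per application byte written.

    app_bytes:        bytes the application asked to store
    wal_bytes:        bytes written to the write-ahead log
    flush_bytes:      bytes written when in-memory data is flushed to storage
    compaction_bytes: list of bytes rewritten by each compaction pass
    """
    if app_bytes <= 0:
        raise ValueError("app_bytes must be positive")
    device_bytes = wal_bytes + flush_bytes + sum(compaction_bytes)
    return device_bytes / app_bytes


# Hypothetical workload: 100 bytes of user data logged once, flushed once,
# and then rewritten by three compaction passes.
ratio = write_amplification(100, 100, 100, [100, 100, 100])
```

Under this definition, a compaction scheme that rewrites the same data fewer times lowers the ratio directly, which is why reduced amplification translates into throughput and device-endurance gains.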

ZhangJiaQiao commented 1 year ago

Got it. Thanks for your reply. I want to learn more about the HSE implementation from its code. I have other questions about the implementation and have opened another issue in the Q&A Discussions.