rosedblabs / rosedb

Lightweight, fast and reliable key/value storage engine based on Bitcask.
https://rosedblabs.github.io
Apache License 2.0
4.58k stars, 632 forks

Question: Hint file usage #276

Closed (amityahav closed this issue 1 year ago)

amityahav commented 1 year ago

Hey, first of all I want to say that I really like this project and you guys are doing a great job. I understand the purpose of the hint file, but I'm not sure I understand how rosedb makes use of it. As far as I can see in the code, the hint file is created only after a merge operation, and after that it is used to load the in-memory index. So my question is: if the merge operation already creates a WAL file containing only valid records, then after a restart rosedb could simply iterate over the new WAL file and rebuild the index from it. (Of course, we can't know whether the WAL files were produced by a merge, so we would have to assume it either way, and the existing logic here should remain partially the same.) Even after all sorts of CRUD operations on the DB without a merge at the end, there will still be a part of the WAL that holds the same records as the hint file, so why not just use that and avoid calling both functions here?

Also, the hint file stays the same as long as no newer merge has occurred. So after a while it may contain invalid data because of newer deletions/updates, and in that case loading it is the same as loading everything from the WAL files, for the reasons I mentioned earlier. Hope I was clear enough, thanks.

roseduan commented 1 year ago

Thanks for your attention.

The difference between the hint file and the WAL file (after merge) is that the hint file contains only the key and position info, not the value.

So when we restart the db, we can load the hint file directly to rebuild the index instead of iterating over all the data in the WAL. If the values are large, iterating over the WAL may take a long time.

amityahav commented 1 year ago

Got it. So basically both the WAL and the hint file contain only valid data, but iterating over the WAL instead would require loading the values into RAM, which can slow down the whole build process, right?

roseduan commented 1 year ago

Yes, the point of the hint file is that it does not contain values, so it will be smaller than the WAL file (especially if the values are large).
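A rough back-of-the-envelope calculation shows how large the gap can get. The per-entry overheads below (4-byte key length plus a 16-byte position for a hint entry, an 8-byte header for a WAL record) are assumed figures for illustration only, not measurements of rosedb's real formats.

```go
package main

import "fmt"

const (
	hintOverhead = 4 + 16 // assumed: key length prefix + position
	walOverhead  = 8      // assumed: per-record header in the WAL
)

// hintBytes estimates the hint file size: keys and positions only.
func hintBytes(numKeys, keyLen int64) int64 {
	return numKeys * (hintOverhead + keyLen)
}

// walBytes estimates the merged WAL size: keys plus full values.
func walBytes(numKeys, keyLen, valueLen int64) int64 {
	return numKeys * (walOverhead + keyLen + valueLen)
}

func main() {
	const numKeys, keyLen, valueLen = 1_000_000, 16, 4096
	h := hintBytes(numKeys, keyLen)
	w := walBytes(numKeys, keyLen, valueLen)
	fmt.Printf("hint: %d bytes, WAL: %d bytes, ratio: %.1fx\n", h, w, float64(w)/float64(h))
}
```

Under these assumptions, one million 16-byte keys with 4 KiB values give about 36 MB of hint data against roughly 4.1 GB of WAL data to scan at startup.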