Open simerplaha opened 3 years ago
@simerplaha if you could point me to some areas maybe I could contribute a bit - this is super annoying in the project I'm working on
Yea I bet it's annoying.
For this issue we need a lightweight solution around read APIs in all file types that tells the sweepers to close or delete files only if they are not being currently read.
Currently sweepers close and delete files after a configured deadline, which requires read APIs (`get`, `stream`, etc.) to set checkpoints so they can resume from the last checkpoint if a read fails because its file was closed. This is a complex solution and should be replaced or removed.
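To make the idea concrete, here is a minimal sketch of a read reference counter guarding a file, assuming a single atomic counter per file. `RefCountedFile`, `withRead` and `tryClose` are hypothetical names for illustration and are not SwayDB's actual API.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

// Hypothetical stand-in for a file handle; not SwayDB's actual file type.
final class RefCountedFile(val path: String) {
  // >= 0: number of reads in progress; -1: the sweeper has closed the file.
  private val readers = new AtomicInteger(0)

  /** Runs `read` only if the file is still open; returns None if it was already closed. */
  def withRead[A](read: => A): Option[A] = {
    if (!acquire()) {
      None
    } else {
      try {
        Some(read)
      } finally {
        // Release the reference even if the read throws.
        readers.decrementAndGet()
      }
    }
  }

  /** Called by the sweeper: succeeds only when no read is in progress. */
  def tryClose(): Boolean =
    readers.compareAndSet(0, -1)

  def isClosed: Boolean =
    readers.get() == -1

  @tailrec
  private def acquire(): Boolean = {
    val current = readers.get()
    if (current < 0) false // already closed
    else if (readers.compareAndSet(current, current + 1)) true
    else acquire() // lost a race with another reader or the sweeper; retry
  }
}
```

Because both acquiring a read and closing go through compare-and-set on the same counter, the sweeper cannot close the file between a reader's check and its read, so in this sketch no checkpoint is needed.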
I doubt there would be any need to make changes to compaction itself, but if there is, that code is here.
This is a sizeable task to implement. I'm not sure it's going to be a quick one for anyone just starting off.
I guess you'd need to work from the tag v0.16.2 for the last release which was 2 years ago.
Current behaviour
Currently compaction is allowed to close files without checking whether they are being read by another thread, which requires reads to set checkpoints in case the file is closed while a read is in progress.
New behaviour
The need for checkpoints can be removed completely if compaction checks read reference counters in files (#317) and delays closing files if a read is in progress.
Benefits
`Bag[_]` (`Future`) from the API therefore making the APIs simpler.
`Future`.
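The "delays closing files if a read is in progress" part of the new behaviour could be sketched on the sweeper side roughly as follows. This is a single-threaded illustration under stated assumptions: `SweepableFile`, `CountingFile` and `DeferringSweeper` are hypothetical names, not SwayDB types, and in a concurrent setting the check and the close would need to be one atomic step (for example a compare-and-set on the counter).

```scala
import scala.collection.mutable

// Hypothetical interface: a file that exposes its in-progress read count.
trait SweepableFile {
  def activeReads: Int
  def close(): Unit
}

// Hypothetical test double with a manually adjusted read count.
final class CountingFile(var reads: Int) extends SweepableFile {
  var closed: Boolean = false
  def activeReads: Int = reads
  def close(): Unit = closed = true
}

// Hypothetical sweeper: busy files are re-queued instead of being closed.
final class DeferringSweeper {
  private val pending = mutable.Queue.empty[SweepableFile]

  def schedule(file: SweepableFile): Unit =
    pending.enqueue(file)

  def pendingCount: Int =
    pending.size

  /** Runs one sweep pass and returns the number of files closed. */
  def sweep(): Int = {
    var closedCount = 0
    // The range is built once, so each queued file is visited exactly once per pass.
    for (_ <- 0 until pending.size) {
      val file = pending.dequeue()
      if (file.activeReads == 0) {
        file.close()
        closedCount += 1
      } else {
        pending.enqueue(file) // read in progress: delay closing to a later pass
      }
    }
    closedCount
  }
}
```

A file with a read in progress simply stays in the queue until a later pass finds its counter at zero, so readers never observe a file closed underneath them.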