qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0

Minimize error in space usage accounting #464

Closed shubhamtagra closed 3 years ago

shubhamtagra commented 3 years ago

Fixes #463

Main changes:

  1. Limit the size of each readRequest in FileDownloadRequestChain to 100MB.
  2. Update the metadata as each readRequest finishes downloading, instead of the existing behavior of downloading all the data in the chain and only then updating the metadata.

This ensures that we never cross the configured disk utilization threshold by more than 1GB. Such overshoot is already partially handled by treating only 95% of the disk space as available and keeping the remaining 5% as a buffer, but in the existing warmup scenario we could cross the limit by a much bigger margin.
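
As a rough illustration of the two changes, here is a minimal sketch, not the actual Rubix code. `ChunkedDownloadSketch`, `Range`, `MetadataUpdater`, `split` and `download` are hypothetical names made up for this example; only the 100MB cap and the per-request metadata update come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every name here is hypothetical, not Rubix API.
class ChunkedDownloadSketch {
    // Cap on a single read request (100MB, as stated in the description).
    static final long MAX_READ_REQUEST_SIZE = 100L * 1024 * 1024;

    // Stand-in for a read request: a half-open byte range [start, end).
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    // Stand-in for the cache's space accounting.
    interface MetadataUpdater {
        void markCached(Range range);
    }

    // Splits [start, end) into consecutive ranges of at most maxSize bytes.
    static List<Range> split(long start, long end, long maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<Range> chunks = new ArrayList<>();
        for (long pos = start; pos < end; pos += maxSize) {
            chunks.add(new Range(pos, Math.min(pos + maxSize, end)));
        }
        return chunks;
    }

    // "Downloads" each chunk and updates metadata right after it, so the
    // unaccounted data at any moment is at most one chunk per download.
    static long download(List<Range> chunks, MetadataUpdater updater) {
        long total = 0;
        for (Range chunk : chunks) {
            // The actual copy from the remote store to local disk is elided.
            total += chunk.length();
            updater.markCached(chunk);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        List<Range> chunks = split(0, 250 * mb, MAX_READ_REQUEST_SIZE);
        long bytes = download(chunks, r -> System.out.println(
                "accounted [" + r.start + ", " + r.end + ")"));
        System.out.println(chunks.size() + " chunks, " + bytes + " bytes");
    }
}
```

With this shape, the accounting lags the data actually on disk by at most one capped request per in-flight download, rather than by the size of the whole chain.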