qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0

Over-estimation of cache size for small files #404

Closed shubhamtagra closed 4 years ago

shubhamtagra commented 4 years ago

Because we increase the size of a cache entry in increments of the block size, we can overestimate the disk space used when the cache holds a lot of small files. With the default 1MB block size, a 0.1MB file in the cache is accounted as 1MB of occupied space. We know the fileSize at the time we update the occupied size in setAllCached, so we can use it to cap the accounted value.
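
A minimal sketch of the proposed cap, assuming the occupied size is tracked in bytes as a count of cached blocks times the block size. The class and method names here (`CacheSizeEstimate`, `blockAlignedSize`, `cappedSize`) are hypothetical and are not taken from the rubix code base; only `fileSize` and the 1MB default block size come from the description above.

```java
public class CacheSizeEstimate {
    // Default block size mentioned in the issue: 1MB.
    static final long BLOCK_SIZE = 1024L * 1024L;

    // Current accounting (as described): size grows in whole-block steps.
    static long blockAlignedSize(long cachedBlocks, long blockSize) {
        return cachedBlocks * blockSize;
    }

    // Proposed accounting: never report more than the real file size.
    static long cappedSize(long cachedBlocks, long blockSize, long fileSize) {
        return Math.min(blockAlignedSize(cachedBlocks, blockSize), fileSize);
    }

    public static void main(String[] args) {
        long smallFile = 100L * 1024L; // roughly 0.1MB, fits in one block

        System.out.println("block-aligned: " + blockAlignedSize(1, BLOCK_SIZE));
        System.out.println("capped:        " + cappedSize(1, BLOCK_SIZE, smallFile));
    }
}
```

With these assumptions, the small file is accounted as 102400 bytes instead of 1048576. The cap only changes the result when the last block is partially filled, so a large file with only some of its blocks cached is still accounted by its cached blocks.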