qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0

Avoid extra listStatus calls at worker level #452

Open shubhamtagra opened 3 years ago

shubhamtagra commented 3 years ago

Every FS.open() call makes an extra listStatus call while creating the CachingInputStream, only to fetch the fileSize and lastModifiedDate. This increases the number of API calls to cloud object stores, which can mean additional cost. We can avoid this by serializing this information into the filePath when the coordinator creates the FileStatus, so that workers can recover it without calling listStatus.
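A minimal sketch of the idea, assuming the path is handled as a plain string. The class `MetadataPathCodec`, the `MARKER` string, and the `<size>_<mtime>` suffix layout are all hypothetical and are not RubiX's actual format. The coordinator appends the file size and modification time to the path, and the worker parses them back out, falling back to listStatus when no metadata is present. A real implementation would likely work on Hadoop `Path`/`FileStatus` objects and must strip the suffix before the path reaches the object store.

```java
import java.util.Optional;

/**
 * Hypothetical codec: packs fileSize and lastModified into a path string on the
 * coordinator so a worker can recover them without a listStatus call.
 * The marker and layout are illustrative assumptions only.
 */
final class MetadataPathCodec {
    // Hypothetical marker separating the real path from the encoded metadata.
    static final String MARKER = "__rubix_meta__";

    /** Decoded result: the original path plus the two metadata fields. */
    static final class Decoded {
        final String path;
        final long fileSize;
        final long lastModified;

        Decoded(String path, long fileSize, long lastModified) {
            this.path = path;
            this.fileSize = fileSize;
            this.lastModified = lastModified;
        }
    }

    private MetadataPathCodec() {}

    /** Coordinator side: append size and modification time to the path. */
    static String encode(String path, long fileSize, long lastModified) {
        return path + MARKER + fileSize + "_" + lastModified;
    }

    /**
     * Worker side: split an encoded path back into its parts.
     * Returns empty if the path carries no (valid) metadata, in which case the
     * caller would fall back to the existing listStatus call.
     */
    static Optional<Decoded> decode(String encoded) {
        int idx = encoded.lastIndexOf(MARKER);
        if (idx < 0) {
            return Optional.empty();
        }
        String[] parts = encoded.substring(idx + MARKER.length()).split("_");
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            long size = Long.parseLong(parts[0]);
            long mtime = Long.parseLong(parts[1]);
            return Optional.of(new Decoded(encoded.substring(0, idx), size, mtime));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

Using `lastIndexOf` means that only the trailing suffix added by `encode` is treated as metadata, and returning `Optional.empty()` keeps the old listStatus path available for files that were not encoded by the coordinator.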