Open dborkar opened 3 years ago
Hi @dborkar, Presto already has an Alluxio cache service, which one of my friends uses at their company. There was a blog post about it improving latencies at Facebook. Have you tried that?
I think @haoyuan is the expert on that service.
PrestoDB is already integrated with Alluxio data caching (docs). Users can refer to many existing use cases, such as Facebook, EA, and Data Sapiens, to see how the S3 I/O bottleneck can be removed by leveraging this caching.
@javrod87 Looks like you are a very new user to the Presto community! Welcome to the community. :-)
Thank you for sharing about Alluxio. I am familiar with it: it is a very nice distributed cache with lots of fantastic features. In fact, it is much more than a cache, and it now provides other capabilities such as transformation as well. However, some of our users are looking for a lightweight node-level cache.
It would be good to give our users options: Alluxio or Rubix, whichever they prefer. TrinoDB added support for Rubix as well as Hive connector caching, given similar user feedback.
@apc999 I see you closed the issue. Please re-open it, as Alluxio does not address the need for a lightweight option, and it would be good to give Presto users a choice of what to use.
@dborkar Can you please shed some light on why the existing caching support of Presto doesn't help here? If you have specific data around why the current caching support is heavyweight that would be great to see. Maybe there is some way to tune it to make it work for the use case you have in mind. /cc @highker
@dborkar, could you take a look at this post on caching (https://prestodb.io/blog/2021/02/04/raptorx)? It describes a lightweight cache.
On some AWS instances with low network bandwidth, we are seeing I/O bottlenecks for users. Multiple users are requesting a lightweight node-level cache with async prefetch to improve query latency.
Rubix, open sourced by Qubole, can be integrated with PrestoDB for the Hive connector to help with this.