juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0

HPC Cluster Support (Ray) #3314

Closed: trahloff closed this issue 1 year ago

trahloff commented 1 year ago

What would you like to be added: Support for using JuiceFS with HPC clusters like Ray.

Why is this needed: JuiceFS is an amazing tool that makes it extremely easy to work with data that is stored on S3 but needs to be instantly available for highly parallel I/O. In our case, the workload is machine learning on satellite images.

Having this integrated into k8s via CSI is great, but it constrains the use cases to workloads that fit well into a k8s environment. Because of our extremely high resource requirements, we and many other companies that run data processing pipelines rely on compute clusters like Ray. Do you see any possibility of integrating JuiceFS natively into Ray?

SandyXSD commented 1 year ago

That sounds interesting, but we currently lack native Python APIs. So for now, you have to pick one of the three interfaces provided by JuiceFS (mounting it as a network file system, the S3 gateway, or HDFS) to interact with Ray.
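
For readers who want to try the first option (the mounted file system), here is a minimal sketch. It assumes JuiceFS is already mounted at the same path on every worker node, so tasks can read data with ordinary file I/O. Nothing in it is a JuiceFS or Ray API: the names `file_size`, `total_bytes`, and `demo` are hypothetical, a temporary directory stands in for the mount point, and a thread pool stands in for Ray workers so the example runs without a cluster.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def file_size(path: str) -> int:
    """Read one file through the ordinary POSIX API and return its length in bytes."""
    with open(path, "rb") as f:
        return len(f.read())


def total_bytes(mount_point: str, workers: int = 4) -> int:
    """Sum the sizes of all regular files directly under mount_point, reading in parallel."""
    paths = sorted(
        os.path.join(mount_point, name)
        for name in os.listdir(mount_point)
        if os.path.isfile(os.path.join(mount_point, name))
    )
    # Assumption: on a Ray cluster, file_size could be wrapped with
    # ray.remote(file_size) and dispatched with .remote(path) / ray.get(...).
    # A thread pool stands in for the Ray workers here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(file_size, paths))


def demo() -> int:
    """Create three small files in a stand-in directory and read them back in parallel."""
    # A temporary directory stands in for a JuiceFS mount point (hypothetical, e.g. /jfs).
    with tempfile.TemporaryDirectory() as mount_point:
        for i in range(3):
            with open(os.path.join(mount_point, f"tile_{i}.bin"), "wb") as f:
                f.write(b"x" * ((i + 1) * 10))
        return total_bytes(mount_point)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the mount interface needs no JuiceFS-specific code in the task itself: any worker that can see the mount path can read the data with plain `open()`.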

trahloff commented 1 year ago

Hi @SandyXSD, thanks for the quick response! Would it work out of the box if Ray clusters supported CSI?

SandyXSD commented 1 year ago

Yes, it should.