Closed: trahloff closed this issue 1 year ago
That sounds interesting, but currently we still lack native Python APIs. So for now, you have to pick one of the three interfaces (mount as a network file system, S3 gateway, HDFS) provided by JuiceFS to interact with Ray.
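To illustrate the first of those interfaces, here is a minimal sketch, assuming the JuiceFS volume has already been mounted at the same path on every Ray node. The mount point `/jfs`, the `JFS_MOUNT` environment variable, and the helper names `list_tiles` and `tile_size` are hypothetical and only used for illustration. Once the volume is mounted, the code needs nothing JuiceFS-specific, only ordinary file I/O:

```python
import os
from pathlib import Path
from typing import List

# Hypothetical mount point: assumes the JuiceFS volume was mounted
# beforehand at this same path on every node of the cluster.
JFS_MOUNT = Path(os.environ.get("JFS_MOUNT", "/jfs"))


def list_tiles(root: Path, suffix: str = ".tif") -> List[Path]:
    """Return every file under `root` whose name ends with `suffix`, sorted."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def tile_size(path: Path) -> int:
    """Read one file with plain POSIX I/O and return its size in bytes."""
    with open(path, "rb") as f:
        return len(f.read())


if __name__ == "__main__":
    # Sequential loop for illustration; each call is independent,
    # so the work could be spread across workers instead.
    for tile in list_tiles(JFS_MOUNT):
        print(tile, tile_size(tile))
```

With Ray, a function like `tile_size` could be wrapped with `ray.remote` and invoked per file, provided each worker node sees the same mount path. That setup is an assumption of this sketch, not something the thread confirms.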
Hi @SandyXSD, thanks for the quick response! Would it work out of the box if Ray clusters supported CSI?
Yes, it should.
What would you like to be added: Support for using JuiceFS with HPC clusters like Ray.
Why is this needed: JuiceFS is an amazing tool that makes it extremely easy to work with data that is stored on S3 but needs to be instantly available for highly parallel I/O. In our case, the use case is "machine learning on satellite images".
Having this integrated into k8s via CSI is great, but it constrains the use cases to workloads that fit well into a k8s environment. Because of our extremely high resource requirements, we and many other companies that run data processing pipelines rely on compute clusters like Ray. Do you see any possibility of integrating JuiceFS natively into Ray?