skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[request] HPC filesystem to Cloud #924

Status: Open (opened by TomNicholas 1 year ago)

TomNicholas commented 1 year ago

Please describe the problem to be solved
Many scientists (particularly climate scientists) now share their data via public cloud buckets (e.g. ERA-5 on GCP, see this paper for rationale). However, transferring data from large numerical simulations produced on HPC systems is often very challenging. I want to use Skyplane to transfer large amounts of data from an HPC local filesystem to the cloud.

(Optional): Suggest a solution

I want to be able to use Skyplane locally on an HPC filesystem (e.g. Lustre) to transfer TBs (or even PBs) of data to the cloud. Even better would be if I could call Skyplane from Python, so that it integrates with existing tooling for moving scientific datasets to the cloud (which currently can only pull data from other public data portals elsewhere on the internet).
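At TB-to-PB scale, a transfer tool generally has to split large simulation outputs into fixed-size byte ranges, so that many pieces can move in parallel and a failed piece can be retried on its own. The sketch below illustrates only that planning step, under stated assumptions: the `Chunk` type, the `plan_chunks` function, and the 64 MiB chunk size in the usage note are hypothetical, chosen for illustration, and are not Skyplane's actual chunking code or defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    """Hypothetical description of one byte range of a local file."""
    path: str
    offset: int
    length: int


def plan_chunks(path: str, file_size: int, chunk_size: int) -> List[Chunk]:
    """Split one file into contiguous byte ranges of at most chunk_size bytes.

    Hypothetical helper: an empty file still yields one zero-length chunk
    so that it is created at the destination.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if file_size == 0:
        return [Chunk(path, 0, 0)]
    chunks = []
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        chunks.append(Chunk(path, offset, length))
        offset += length
    return chunks
```

For example, a 1 TiB file (2**40 bytes) split into 64 MiB chunks (2**26 bytes) yields 16,384 independent pieces. Each piece could then be sent by a separate worker.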

sarahwooders commented 1 year ago

Hi @TomNicholas, thanks for the suggestion! I believe this should be possible if we implement a StorageInterface that interacts with Lustre (or whatever other on-prem filesystem/object storage you have).
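Because a Lustre client mounts the filesystem as an ordinary POSIX path, such an interface could in principle be built from standard file I/O. The following is a minimal sketch under that assumption. The class name `LocalFilesystemInterface` and its methods (`list_objects`, `get_obj_size`, `read_range`, `write_object`) are hypothetical and only mirror the general shape of an object-store interface. They are not Skyplane's actual StorageInterface API, whose exact methods and signatures may differ.

```python
import os
from pathlib import Path
from typing import Iterator


class LocalFilesystemInterface:
    """Hypothetical object-store-style view over a POSIX mount such as Lustre.

    Object keys are paths relative to `root`, always using '/' separators.
    """

    def __init__(self, root: str):
        self.root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        p = (self.root / key).resolve()
        # Reject keys that escape the root directory (e.g. "../etc/passwd").
        if p != self.root and self.root not in p.parents:
            raise ValueError(f"key escapes root: {key!r}")
        return p

    def list_objects(self, prefix: str = "") -> Iterator[str]:
        # Walk the directory tree and yield every regular file as a key.
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                key = Path(dirpath, name).relative_to(self.root).as_posix()
                if key.startswith(prefix):
                    yield key

    def get_obj_size(self, key: str) -> int:
        return self._path(key).stat().st_size

    def read_range(self, key: str, offset: int, length: int) -> bytes:
        # Read one byte range, so large files can be sent in pieces.
        with open(self._path(key), "rb") as f:
            f.seek(offset)
            return f.read(length)

    def write_object(self, key: str, data: bytes) -> None:
        # Create parent directories as needed, then write the whole object.
        p = self._path(key)
        p.parent.mkdir(parents=True, exist_ok=True)
        with open(p, "wb") as f:
            f.write(data)
```

Mapping keys to relative paths keeps the on-disk directory layout intact when files are uploaded to a bucket. The root check guards against keys that would otherwise read or write outside the exported directory.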

@lynnliu030 could VM-to-VM transfers be extended to support on-prem?