Open landscapepainter opened 8 months ago
Hey, happy to pick this up! Some thoughts so far:
The upload method will be a good place to start looking at making changes: https://github.com/skypilot-org/skypilot/blob/76c5af9cc24663528d5e5f57253f252a9aa4d8bf/sky/data/storage.py#L1531-L1558
Looking at gsutil rsync and rclone sync, it seems like rclone can replace gsutil rsync. Some community posts:
Please let me know if I'm on the right track, or if there's anything else I should know before diving more into the implementation.
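To make the comparison concrete, here is a minimal sketch of how the two upload commands could be built side by side. It is not SkyPilot's actual upload code: the function names, the rclone remote name `gcs`, and the default of 32 transfers are all assumptions chosen for illustration. Only the CLI flags themselves (`gsutil -m rsync -r -d` and `rclone sync --transfers`) are real.

```python
import shlex


def gsutil_rsync_upload(src_dir: str, bucket: str, sub_path: str = "") -> str:
    """Build a gsutil command that mirrors src_dir into gs://bucket/sub_path."""
    dst = f"gs://{bucket}/{sub_path}".rstrip("/")
    # -m: run operations in parallel, -r: recurse into subdirectories,
    # -d: delete files at the destination that are missing from the source.
    return f"gsutil -m rsync -r -d {shlex.quote(src_dir)} {shlex.quote(dst)}"


def rclone_sync_upload(src_dir: str, bucket: str, sub_path: str = "",
                       remote: str = "gcs", transfers: int = 32) -> str:
    """Build the rclone counterpart (hypothetical helper, not a SkyPilot API).

    `remote` is assumed to be an rclone remote already configured for GCS.
    """
    dst = f"{remote}:{bucket}/{sub_path}".rstrip("/")
    # `rclone sync` makes the destination identical to the source, so it also
    # removes extra destination files, similar to `gsutil rsync -d`.
    return (f"rclone sync --transfers {transfers} "
            f"{shlex.quote(src_dir)} {shlex.quote(dst)}")
```

Since both commands mirror the source onto the destination, swapping one for the other should mostly be a matter of command construction plus making sure the rclone remote is configured on the machine that runs it.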
Welcome to SkyPilot @dtran24!
The upload method is a great place to start. After that, you can take a look into _execute_file_mounts with cloud_stores.py/make_sync_dir_command and make_sync_file_command for fetching files/dirs from GCS to the remote VM.
One thing to note is that IBM COS uses rclone as well, so we have an abstraction for rclone, data_utils.Rclone, that can be utilized.
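For the download direction, a rough sketch of what directory and single-file fetch commands could look like with rclone is below. These helpers are hypothetical and do not reflect the real signatures of make_sync_dir_command, make_sync_file_command, or data_utils.Rclone. The remote name `gcs` is an assumption, and it is assumed to already be configured on the VM.

```python
import shlex


def rclone_fetch_dir_command(bucket_path: str, local_dir: str,
                             remote: str = "gcs", transfers: int = 32) -> str:
    """Shell command to fetch a bucket directory onto the VM (illustrative)."""
    src = shlex.quote(f"{remote}:{bucket_path}")
    dst = shlex.quote(local_dir)
    # `rclone copy` transfers new or changed files and never deletes anything
    # at the destination, which is the safer default for a download.
    return f"mkdir -p {dst} && rclone copy --transfers {transfers} {src} {dst}"


def rclone_fetch_file_command(object_path: str, local_file: str,
                              remote: str = "gcs") -> str:
    """Shell command to fetch a single object to an exact local file name."""
    src = shlex.quote(f"{remote}:{object_path}")
    dst = shlex.quote(local_file)
    # `rclone copyto` copies one source path to one destination path,
    # so the local file can be named differently from the object.
    return f"rclone copyto {src} {dst}"
```

Note that `shlex.quote` will quote a leading `~`, so paths should be expanded before being passed in.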
Heya! I wonder if it's worth exploring s5cmd as well. For AWS, there are some benchmarking results, to be taken with a grain of salt. Maybe an adaptor over this would allow a more unified API across clouds with better performance?
Hey @sqr00t, thanks for sharing the benchmark results and suggestion. We actually do have a PR for adding s5cmd with crt client! Are you an active user of s5cmd outside of skypilot? Was wondering if you encountered any edge cases while using it compared to aws cli :)
Hi, I'm interested in working on this! If nobody's working on it, could I give it a shot?
Hey @aseriesof-tubes, thanks for taking on this issue. It's a very important one for improving the user experience. I just assigned it to you.
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
According to a user report, rclone sync is an order of magnitude faster than gsutil rsync, which SkyPilot uses to upload from the local node to cloud storage and to download from cloud storage to the local node.
The user reported a case of fetching millions of images and JSON files, ~500GB in total, from the storage to the node:
With a loose benchmark on the 4 categories below, it turns out that rclone sync is much faster than gsutil rsync when a huge number of small files is fetched from cloud storage (GS) to a node (GCP).
We should investigate whether rclone sync has all the features we need to run smoothly with SkyPilot, and replace gsutil rsync with rclone sync if feasible.
Update: Added a new benchmark of gsutil -m cp -r, and used a larger amount of data: 10000 files of 1MB each, as opposed to the previous benchmark above, which used 1000 files of 1MB each.
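For anyone wanting to reproduce a benchmark of this shape, a minimal harness could look like the sketch below. It is not the script used for the numbers in this issue. The helper names are made up for illustration, and the bucket and remote names in the usage comments are placeholders.

```python
import os
import subprocess
import time
from pathlib import Path
from typing import List


def make_small_files(root: Path, count: int, size_bytes: int) -> int:
    """Create `count` files of `size_bytes` random bytes each under `root`.

    Returns the total number of bytes written.
    """
    root.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        (root / f"file_{i:06d}.bin").write_bytes(os.urandom(size_bytes))
    return count * size_bytes


def time_command(argv: List[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


# Example usage (placeholders, requires gsutil/rclone to be installed and
# authenticated; 10000 files of 1MB mirrors the updated benchmark):
#
#   data = Path("/tmp/bench_data")
#   make_small_files(data, count=10000, size_bytes=1024 * 1024)
#   t_gsutil = time_command(["gsutil", "-m", "rsync", "-r", str(data), "gs://<bucket>/bench"])
#   t_rclone = time_command(["rclone", "sync", str(data), "<remote>:<bucket>/bench"])
```

Wall-clock time for a single run is noisy, so repeating each command a few times against a fresh destination prefix gives a fairer comparison.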