lilab-bcb / stratocumulus

Backend component of Cumulus for different cloud environments.
BSD 3-Clause "New" or "Revised" License

Use `gcloud storage cp`? #21

Open JZL opened 1 year ago

JZL commented 1 year ago

Hi,

The Cumulus Docker containers are great; they really helped me jump-start running cellranger on Terra :-)

One thing I noticed is how much faster the newer `gcloud storage cp` is. There's a blog post from last year saying it's faster than `gsutil`, and I thought they might have exaggerated a little. But the difference is really demonstrable: I get tens of MiB/sec with `gsutil` and a consistent 600 MiB/sec with `gcloud`, especially if I use a locally attached SSD. (On a non-Terra instance where I have 20-30 cores, I consistently see 1 GB/sec, even on a balanced persistent disk, probably because of the network caps, which scale with the number of CPUs.) Especially when copying BCLs or FASTQs, that adds up to hours of runtime and a real cost difference.

I was talking to the Terra folks about this, but I think it could be harder to change there, since `gsutil` is so battle-tested and people may depend on niche aspects of its behavior. But maybe stratocumulus is more constrained, so you could validate that it works for the ways you use it. Skimming through, I think you would just have to convert `-o ...` to `gcloud config set storage/...`, and then everyone using it would get a ~60x speedup!
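To make the suggested conversion concrete, here is a minimal Python sketch of what the swap might look like. It is not taken from the stratocumulus code: the function names `build_copy_command` and `translate_gsutil_option` are hypothetical, and it assumes that each `-o GSUtil:<name>=<value>` option has a same-named property under the `storage/` section of `gcloud config`, which would need to be checked option by option.

```python
def build_copy_command(source, target, recursive=True, use_gcloud=True):
    """Return the argv list for copying source to target.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if use_gcloud:
        # gcloud storage parallelizes by default, so there is no -m flag here.
        cmd = ["gcloud", "storage", "cp"]
    else:
        cmd = ["gsutil", "-m", "cp"]
    if recursive:
        cmd.append("-r")
    cmd += [source, target]
    return cmd


def translate_gsutil_option(option):
    """Turn a gsutil '-o GSUtil:name=value' option into a gcloud config command.

    Assumption: the property exists as storage/<name> in gcloud. That is not
    guaranteed for every gsutil option and should be verified individually.
    """
    section, _, rest = option.partition(":")
    name, _, value = rest.partition("=")
    if section != "GSUtil" or not name or not value:
        raise ValueError(f"unsupported gsutil option: {option!r}")
    return ["gcloud", "config", "set", f"storage/{name}", value]
```

Either list could then be passed to `subprocess.run(cmd, check=True)`. Keeping the old `gsutil` branch behind a flag would make it easy to compare both tools on the same data before switching.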

Unfortunately, there is only `gcloud storage cp`, not `rsync`.

yihming commented 1 year ago

Hi @JZL. Thank you for letting us know about this alternative. I'll definitely try it out and see if stratocumulus could adopt it.