The cumulus docker containers are great, really helped me jumpstart into running cellranger on terra :-)
One thing I noticed was how much faster the newer `gcloud storage cp` is. There's a blog post from last year saying it's faster than `gsutil`, and I thought they might have exaggerated a little, but the difference is really demonstrable: I get tens of MiB/sec with `gsutil` and a consistent 600 MiB/sec with `gcloud`, especially if I use a locally attached SSD. (On a non-Terra instance with 20-30 cores, I consistently see 1 GB/sec, even on a balanced persistent disk, probably because of the network bandwidth caps, which scale with the number of CPUs.) Especially when moving BCLs or FASTQs, that adds up to hours of runtime and real cost differences.
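To put rough numbers on "hours of runtime", here is a back-of-the-envelope sketch. The 100 GiB run size is a hypothetical example, not a measurement; the throughputs are the ones quoted above, taking 10 MiB/s as the low end of "tens of MiB/sec" for `gsutil`.

```python
# Back-of-the-envelope transfer time for a hypothetical sequencing run.
# Assumption: 100 GiB of BCL/FASTQ data (illustrative, not measured).
# Rates are the figures quoted above; 10 MiB/s is the low end of
# "tens of MiB/sec" observed with gsutil.

MIB_PER_GIB = 1024


def transfer_seconds(size_gib: float, throughput_mib_s: float) -> float:
    """Return the time in seconds to copy size_gib at throughput_mib_s."""
    return size_gib * MIB_PER_GIB / throughput_mib_s


def main() -> None:
    size_gib = 100  # hypothetical run size
    for tool, rate in [("gsutil", 10), ("gcloud storage", 600)]:
        secs = transfer_seconds(size_gib, rate)
        print(f"{tool:>15}: {secs / 3600:.2f} h ({secs:.0f} s)")
    print(f"speedup: {600 / 10:.0f}x")


if __name__ == "__main__":
    main()
```

At those rates, 100 GiB takes roughly 2.8 hours with `gsutil` versus under 3 minutes with `gcloud storage`, per copy direction.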
I was talking to the Terra folks about this, but I think it could be harder to change there, since `gsutil` is so well battle-tested and people could depend on niche aspects of its behavior. But maybe stratocumulus is more constrained, so you could validate that it works for how you use it. Skimming through, I think you would just have to convert `-o ...` to `gcloud config set storage/...`, and then everyone using it would get a ~60x speedup!
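The `-o ...` to `gcloud config set storage/...` conversion could look roughly like the sketch below. The helper name `gsutil_option_to_gcloud` is hypothetical, and it assumes the option key keeps the same name under gcloud's `storage/` section. That appears to hold for `parallel_composite_upload_threshold`, but it is not guaranteed for every `gsutil` option, so each key should be checked against the gcloud properties docs.

```python
# Hypothetical helper: turn a gsutil "-o Section:key=value" boto override
# into argv for "gcloud config set storage/<key> <value>".
# Assumption: the key name is identical under gcloud's storage/ section,
# which is NOT guaranteed in general; verify each key before relying on it.

import shlex


def gsutil_option_to_gcloud(option: str) -> list[str]:
    """Convert e.g. 'GSUtil:parallel_composite_upload_threshold=150M' into
    ['gcloud', 'config', 'set',
     'storage/parallel_composite_upload_threshold', '150M']."""
    section_and_key, sep, value = option.partition("=")
    if not sep:
        raise ValueError(f"expected Section:key=value, got {option!r}")
    _section, colon, key = section_and_key.partition(":")
    if not colon or not key.strip():
        raise ValueError(f"expected Section:key=value, got {option!r}")
    return ["gcloud", "config", "set", f"storage/{key.strip()}", value.strip()]


def main() -> None:
    opt = "GSUtil:parallel_composite_upload_threshold=150M"
    argv = gsutil_option_to_gcloud(opt)
    # Print a shell-ready command; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))


if __name__ == "__main__":
    main()
```

One design difference worth noting: `gsutil -o` applies only to that single invocation, whereas `gcloud config set` persists the property in the active gcloud configuration, so it affects every later `gcloud storage` command in the same container.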
Unfortunately, there is only `gcloud storage cp`, not `rsync`.