Open andersy005 opened 5 years ago
I would like to archive this repository, as the code in it has been abandoned. (There has not been a commit in 9 months.) Its existence is basically misleading to newcomers.
However, there are some valuable discussions happening here. Would you be okay if I transferred this issue to https://github.com/pangeo-data/benchmarking, which is more active?
@rabernat, no problem at all. Feel free to transfer this issue to the benchmarking repository.
I don't want this repo to get too crowded, but I'm starting to wonder if the scripting/code for these transfer benchmarks can't be added to this repo. Maybe this repo needs some reorganization?
Thoughts?
Thanks to @rsignell-usgs's script, I've been playing around with netCDF->Zarr conversion on S3. Is there any throughput data that I can use to make sense of the following measurements I recorded? If someone has played with transferring Zarr to S3/GCP in the past, I'd like to know more about this and any best practices for this kind of task, for example how to tune a Dask cluster to maximize throughput.
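For anyone comparing numbers, here is a minimal sketch of how a write-throughput figure (MB/s) could be computed around a transfer step. It is not the script used for the measurements in this issue. The names `throughput_mb_s`, `timed_throughput`, and `fake_write` are hypothetical, and `fake_write` only stands in for a real write such as a Zarr upload.

```python
import time
from typing import Callable


def throughput_mb_s(nbytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed wall time to MB/s (1 MB = 1e6 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return nbytes / 1e6 / seconds


def timed_throughput(write_fn: Callable[[], int]) -> float:
    """Time write_fn, which must return the number of bytes it wrote."""
    start = time.perf_counter()
    nbytes = write_fn()
    elapsed = time.perf_counter() - start
    return throughput_mb_s(nbytes, elapsed)


def fake_write() -> int:
    # Hypothetical stand-in for the real transfer step.
    # It sleeps briefly and reports 50 MB as written.
    time.sleep(0.05)
    return 50_000_000


if __name__ == "__main__":
    print(f"{timed_throughput(fake_write):.1f} MB/s")
```

Reporting the byte count and wall time alongside the MB/s figure makes it easier to compare runs across different cluster sizes and chunk layouts.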
Dask configuration
Here's my script: