pangeo-cmip6 / sync

Workflows to keep CMIP6 data synchronized between GCS and S3 storage

distcp / treeverse-distcp #1

Open rabernat opened 3 years ago

rabernat commented 3 years ago

These tools look really cool.

charlesbluca commented 3 years ago

Thanks for bringing these up! The sync example using Hadoop DistCp in particular looks like it could be very useful, although there are a lot of limiting factors:

As it stands, I think treeverse-distcp could be a useful tool for Pangeo Forge recipes that copy or move data across cloud providers. Understanding how to adapt something like this to GCP Dataflow or other cloud batch processing services could also pay off in the long run.
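For anyone unfamiliar with the DistCp-style approach, here is a rough sketch of the two core steps of such a sync job: diff the source and destination listings, then split the resulting copy list across parallel workers. This is illustrative only and not treeverse-distcp's or DistCp's actual code. The function names `plan_sync` and `split_tasks` are hypothetical. Comparing objects by key and size alone is an assumption (a real sync tool may also compare checksums), and the round-robin split is a simple stand-in, since DistCp itself balances work across map tasks differently. The object listings below are made up.

```python
from __future__ import annotations


def plan_sync(source: dict[str, int], dest: dict[str, int]) -> list[str]:
    """Return keys that are missing from dest or whose size differs.

    Assumption: objects are compared by key and size only (no checksums).
    """
    return sorted(key for key, size in source.items() if dest.get(key) != size)


def split_tasks(keys: list[str], n_workers: int) -> list[list[str]]:
    """Distribute keys round-robin across n_workers copy tasks.

    A simplified stand-in for how a batch copy tool spreads its copy listing
    over parallel workers; real tools typically balance by bytes instead.
    """
    chunks: list[list[str]] = [[] for _ in range(n_workers)]
    for i, key in enumerate(keys):
        chunks[i % n_workers].append(key)
    return chunks


# Hypothetical listings: object key -> size in bytes.
source = {"tas/.zmetadata": 120, "tas/0.0.0": 4096, "tas/0.0.1": 4096}
dest = {"tas/.zmetadata": 120, "tas/0.0.0": 2048}

to_copy = plan_sync(source, dest)
print(to_copy)                  # ['tas/0.0.0', 'tas/0.0.1']
print(split_tasks(to_copy, 2))  # [['tas/0.0.0'], ['tas/0.0.1']]
```

In a real GCS-to-S3 workflow, the listings would come from the storage APIs and each chunk would be handed to a separate batch job (for example a Dataflow worker). The diff step is what makes repeated runs incremental rather than full re-copies.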