Open chrisvittal opened 4 months ago
Were you using the old copier or the new (not yet merged) `hailctl fs sync`? I had hoped the latter was finally robust enough for real use. `hailtop.aiotools.copy` is indeed not very reliable. Regardless, using the rewrite action when the source and destination agree is the correct move.
We used a one-off script. An attempt was made to use `Copier.copy`, but that wasn't reliable enough. We also needed to rename destination files beyond what the sync (or copy) tool is capable of.
As part of our work generating All of Us datasets, we needed to copy around a million GCS objects. Our `Copier` infrastructure 'should' be able to handle that, but it kept failing with robustness issues. What finally worked was using GCS's rewrite API. This allowed us to copy data without reading it, so the copies completed in a fraction of the time while also reducing bandwidth needs.

There are two components to this:
`Copier`, and the new sync tool (#14248)

Here's the code I used for making the rewrite requests for merging a set of matrix tables together; the progress bar code was for visibility.
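
For anyone unfamiliar with the rewrite API, the core of a rewrite-based copy is a small loop, sketched below. This is not the script from this issue, only a minimal illustration. `rewrite_until_done`, `on_progress`, and `max_calls` are hypothetical names, and the bucket and object names in the trailing comment are placeholders. It assumes a single rewrite request returns `(token, bytes_rewritten, total_bytes)`, which is the shape `Blob.rewrite` in google-cloud-storage returns.

```python
from typing import Callable, Optional, Tuple

# One rewrite request: takes the continuation token (None on the first call)
# and returns (next_token, bytes_rewritten, total_bytes).
RewriteCall = Callable[[Optional[str]], Tuple[Optional[str], int, int]]
ProgressCallback = Callable[[int, int], None]


def rewrite_until_done(
    rewrite_once: RewriteCall,
    on_progress: Optional[ProgressCallback] = None,
    max_calls: int = 10_000,
) -> int:
    """Repeat a rewrite request until the service reports completion.

    Large objects, or copies across locations or storage classes, may need
    several requests. Each unfinished response carries a token that must be
    passed to the next request. Returns the total object size in bytes.
    """
    token: Optional[str] = None
    for _ in range(max_calls):
        token, bytes_rewritten, total_bytes = rewrite_once(token)
        if on_progress is not None:
            on_progress(bytes_rewritten, total_bytes)
        if token is None:
            # No continuation token means the rewrite has finished.
            return total_bytes
    raise RuntimeError(f"rewrite did not finish within {max_calls} requests")


# Possible use with google-cloud-storage (placeholder names, not run here):
#
#   from google.cloud import storage
#   client = storage.Client()
#   src = client.bucket("src-bucket").blob("path/to/object")
#   dst = client.bucket("dst-bucket").blob("renamed/object")
#   rewrite_until_done(lambda tok: dst.rewrite(src, token=tok))
```

Passing the request in as a callable keeps the loop independent of any particular client, so the same function can sit behind a progress bar or a retry wrapper, and renaming the destination is just a matter of which destination blob the callable targets.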