s5cmd can execute operations in parallel, which will massively speed up our repo generation, syncing, and removal scripts:
Before, the actual mkrepo step took 2 minutes 15 seconds; after, it takes 3 seconds.
Before, the actual sync dry-run step took 5 minutes 4 seconds; after, it takes 4 seconds.
GUS-W-15794884
For syncing, we now generate a "sync plan", from which operations that can be grouped together are then extracted, and executed in parallel using s5cmd run, in three groups:
1. all copies of new/changed dist archives - if anything fails during these, a re-run of the sync will perform them again;
2. all copies of new/changed manifests, as well as removals - if anything fails during these, only the failed ones will be performed again, including their dists;
3. all dist archives of removed manifests - worst case, there will be some stray dist archives left behind, but nothing picks them up because their manifests are gone.
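As a rough illustration of the grouping, here is a minimal sketch that splits a plan into the three groups and feeds each one to s5cmd run in order. The plan file format (a group name followed by an s5cmd command on each line), the group names, and the function names are assumptions made for this example, not the actual format used by sync.sh.

```shell
# Hypothetical plan format, one operation per line:
#   <group> <s5cmd command>
# where <group> is one of: dist-copy, manifest, dist-remove

# Print the s5cmd commands belonging to one group of the plan.
plan_group() {
    local group=$1 plan=$2
    awk -v g="$group" '$1 == g { sub(/^[^ ]+ /, ""); print }' "$plan"
}

# Run the groups in order; s5cmd executes the commands within a group in parallel.
# Stops at the first group that fails, so later groups never run on a broken state.
run_plan() {
    local plan=$1 group commands
    for group in dist-copy manifest dist-remove; do
        commands=$(plan_group "$group" "$plan")
        [ -n "$commands" ] || continue   # nothing to do for this group
        printf '%s\n' "$commands" | "${S5CMD:-s5cmd}" run || return 1
    done
}
```

Calling `run_plan plan.txt` would then copy dists first, update or remove manifests second, and delete orphaned dists last, which matches the failure behavior described for each group.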
GUS-W-15794884
As the "sync plan" is now generated separately by a script, we can have some tests that assert the operations are correct, dist archive URL rewrites from source to destination bucket happen properly, etc.
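To give an idea of what such an assertion might look like, here is a minimal sketch of a dist URL rewrite together with two checks. The function name, the helper, and the example URLs are hypothetical; the real plan generator and its tests may be structured quite differently.

```shell
# Rewrite a dist archive URL from the source bucket prefix to the destination prefix.
# URLs that do not start with the source prefix are returned unchanged.
rewrite_dist_url() {
    local url=$1 src=$2 dst=$3
    case $url in
        "$src"*) printf '%s\n' "$dst${url#"$src"}" ;;
        *)       printf '%s\n' "$url" ;;
    esac
}

# Tiny assertion helper for the checks below.
assert_eq() {
    [ "$1" = "$2" ] || { printf 'FAIL: expected %s, got %s\n' "$2" "$1" >&2; return 1; }
}

# Example bucket URLs are made up for illustration.
assert_eq "$(rewrite_dist_url 'https://src.example.com/dist/php-8.3.1.tar.gz' \
    'https://src.example.com/' 'https://dst.example.com/')" \
    'https://dst.example.com/dist/php-8.3.1.tar.gz'
assert_eq "$(rewrite_dist_url 'https://other.example.com/x.tar.gz' \
    'https://src.example.com/' 'https://dst.example.com/')" \
    'https://other.example.com/x.tar.gz'
```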
GUS-W-15794893
There are now local source and destination dir options for sync.sh, which will use the given directory of .composer.json manifests, instead of downloading them from the source or destination bucket.
What we can now do is, upon removal, fetch all manifests from the repo, except the given list/wildcards of manifests to remove, and then tell sync.sh to use that filtered directory of .composer.json files as the local source directory for syncing.
This moves all of the sanity checks, file deletion, error handling, confirmation, and similar logic into sync.sh, and lets us drop most of the logic in remove.sh. In addition, the core "sync plan" code of sync.sh is covered by tests, and that coverage now extends to remove.sh.
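The filtering step could be sketched roughly as follows, assuming the repo's manifests have already been fetched into a local directory. The function name and the convention of matching patterns against the file name without its .composer.json suffix are assumptions for this example; the name of the sync.sh option that takes the resulting directory is not shown here.

```shell
# Copy every manifest from src_dir to out_dir, except those whose base name
# (without the .composer.json suffix) matches one of the given names/wildcards.
filter_manifests() {
    local src_dir=$1 out_dir=$2
    shift 2
    local file name pattern skip
    mkdir -p "$out_dir"
    for file in "$src_dir"/*.composer.json; do
        [ -e "$file" ] || continue          # glob matched nothing
        name=$(basename "$file" .composer.json)
        skip=0
        for pattern in "$@"; do
            # $pattern is deliberately unquoted so wildcards are honored
            case $name in
                $pattern) skip=1; break ;;
            esac
        done
        [ "$skip" -eq 1 ] || cp "$file" "$out_dir/"
    done
}
```

For example, `filter_manifests fetched/ filtered/ 'php-8.2.*'` would leave every manifest except the php-8.2.x ones in filtered/, and that directory would then serve as the local source directory for sync.sh.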
GUS-W-15794899
Finally, there is currently a bug in remove.sh: when all files are removed from a repository (e.g. for certain cleanups or testing), no .composer.json files are left behind for the code to match, and the subsequent call to mkrepo.sh to re-generate the updated repository fails because there are no files to pass to it. It neither produces an empty repository file nor removes the repository file altogether.
Instead, the tooling now detects this situation, and removes packages.json as well, so that repositories can be fully removed.
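The detection amounts to checking whether any manifests remain before deciding what to do with the repository file. A minimal sketch, with hypothetical function names and operating on a local directory of manifests rather than a bucket:

```shell
# Succeeds if the directory contains at least one .composer.json manifest.
has_manifests() {
    local f
    for f in "$1"/*.composer.json; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Print which action applies: re-generate the repository from the remaining
# manifests, or remove packages.json because nothing is left.
repo_action() {
    if has_manifests "$1"; then
        echo regenerate
    else
        echo remove
    fi
}
```

In the "remove" case the tooling would delete packages.json instead of calling mkrepo.sh with an empty file list.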
GUS-W-15838870