heroku / heroku-buildpack-php

Heroku's buildpack for PHP applications.
https://devcenter.heroku.com/categories/php
MIT License

Port all binary build tooling to s5cmd, add sync tests, simplify removals #716

Closed dzuelke closed 5 months ago

dzuelke commented 6 months ago

s5cmd can execute operations in parallel, which will massively speed up our repo generation, syncing, and removal scripts:

Before (actual mkrepo step took 2 minutes 15 seconds) and after (actual mkrepo step took 3 seconds).

Before (actual sync dry-run step took 5 minutes 4 seconds) and after (actual sync dry-run step took 4 seconds).

GUS-W-15794884


For syncing, we now generate a "sync plan". Operations that can be grouped together are extracted from it and executed in parallel using s5cmd run, in three groups:

  1. all copies of new/changed dist archives - if anything fails during these, a re-run of the sync will perform them again;
  2. all copies of new/changed manifests, as well as removals - if anything fails during these, only the failed ones will be performed again, including their dists;
  3. all dist archives of removed manifests - worst case, there will be some stray dist archives left behind, but nothing picks them up because their manifests are gone.
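
To illustrate the grouping above, here is a minimal sketch, not the actual sync.sh code. The `Op` type, the `group_plan` and `to_s5cmd_lines` helpers, and the bucket paths are all hypothetical; the only thing taken from s5cmd itself is that `s5cmd run` accepts one `cp` or `rm` command per line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One entry of a hypothetical sync plan."""
    kind: str      # "copy" or "remove"
    what: str      # "dist" or "manifest"
    src: str       # object to copy from, or object to remove
    dst: str = ""  # copy destination; unused for removals


def group_plan(ops):
    """Split a plan into the three ordered groups described above."""
    dist_copies, manifest_ops, dist_removals = [], [], []
    for op in ops:
        if op.kind not in ("copy", "remove"):
            raise ValueError(f"unknown operation kind: {op.kind}")
        if op.what == "manifest":
            manifest_ops.append(op)      # group 2: manifest copies and removals
        elif op.what == "dist" and op.kind == "copy":
            dist_copies.append(op)       # group 1: new/changed dist archives
        elif op.what == "dist":
            dist_removals.append(op)     # group 3: dists of removed manifests
        else:
            raise ValueError(f"unknown object type: {op.what}")
    return dist_copies, manifest_ops, dist_removals


def to_s5cmd_lines(group):
    """Render one group as command lines for `s5cmd run`.

    Assumes paths contain no whitespace, so no quoting is applied.
    """
    lines = []
    for op in group:
        if op.kind == "copy":
            lines.append(f"cp {op.src} {op.dst}")
        else:
            lines.append(f"rm {op.src}")
    return "\n".join(lines)
```

Each group's output would then be fed to a separate `s5cmd run` invocation, one after the other, so that a failure in an earlier group stops the later ones from running.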

GUS-W-15794884


As the "sync plan" is now generated separately by a script, we can have tests that assert that the operations are correct, that dist archive URL rewrites from source to destination bucket happen properly, and so on.
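
As a rough idea of what such a URL rewrite check could look like, the sketch below assumes each manifest carries its archive location under `dist.url`, as in Composer's package schema. The `rewrite_dist_url` helper and the bucket URLs are hypothetical and are not taken from the actual test suite.

```python
def rewrite_dist_url(manifest, src_base, dst_base):
    """Return a copy of the manifest with dist.url moved from src_base to dst_base.

    Manifests whose URL does not start with src_base are returned unchanged.
    """
    url = manifest["dist"]["url"]
    if not url.startswith(src_base):
        return manifest
    rewritten = dict(manifest)  # shallow copy; the input is left untouched
    rewritten["dist"] = dict(manifest["dist"], url=dst_base + url[len(src_base):])
    return rewritten


# Hypothetical manifest and bucket prefixes, for illustration only
source = {
    "name": "heroku-sys/php",
    "dist": {"type": "heroku-sys-tar", "url": "https://src.example.com/dist-develop/php-8.3.0.tar.gz"},
}
synced = rewrite_dist_url(
    source,
    "https://src.example.com/dist-develop/",
    "https://dst.example.com/dist-stable/",
)
```

A test would then assert that `synced["dist"]["url"]` points at the destination prefix while the source manifest is unmodified.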

GUS-W-15794893


There are now local source and destination dir options for sync.sh. When given, sync.sh uses that directory of .composer.json manifests instead of downloading them from the source or destination bucket.

Upon removal, we can now fetch all manifests from the repo except those matching the given list/wildcards of manifests to remove, and then tell sync.sh to use that filtered directory of .composer.json files as the local source directory for syncing.
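
The filtering step can be sketched as follows. This is only an illustration using Python's `fnmatch` module for the wildcard matching; the `manifests_to_keep` helper and the file names are hypothetical, and the real remove.sh may match names differently.

```python
from fnmatch import fnmatchcase


def manifests_to_keep(all_manifests, remove_patterns):
    """Return the manifests that match none of the removal names/wildcards."""
    return [
        name
        for name in all_manifests
        if not any(fnmatchcase(name, pattern) for pattern in remove_patterns)
    ]


# Hypothetical repository contents, for illustration only
repo = [
    "php-8.2.1.composer.json",
    "php-8.3.0.composer.json",
    "ext-redis-6.0.2_php-8.3.composer.json",
]
kept = manifests_to_keep(repo, ["php-8.2.*"])
```

The files named in `kept` would make up the filtered directory handed to sync.sh as the local source, so everything absent from it is treated as removed by the sync.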

This moves all of the logic for sanity checks, file deletion, error handling, confirmation, etc. into sync.sh, and lets us drop most of the logic in remove.sh. In addition, the core "sync plan" code of sync.sh is covered by tests, and that coverage now extends to remove.sh.

GUS-W-15794899


Finally, there is currently a bug in remove.sh: when removing all files from a repository (e.g. for certain cleanups, testing, etc.), no .composer.json files are left behind for the code to match. The subsequent call to mkrepo.sh to re-generate the updated repository then fails (as there are no files to pass to it), instead of either producing an empty repository file or removing the repository file altogether.

Instead, the tooling now detects this situation, and removes packages.json as well, so that repositories can be fully removed.
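
The decision itself is small. A minimal sketch, assuming a hypothetical `repo_update_action` helper that receives the manifests remaining after the removal (the actual tooling is shell and may be structured differently):

```python
def repo_update_action(remaining_manifests):
    """Decide how to update the repository after a removal.

    Returns ("remove", "packages.json") when nothing is left, otherwise
    ("mkrepo", [...]) with the manifests to pass to repository generation.
    """
    if not remaining_manifests:
        # Nothing left to build a repository from: drop packages.json too
        return ("remove", "packages.json")
    return ("mkrepo", sorted(remaining_manifests))
```

Handling the empty case explicitly avoids invoking repository generation with no input files, which is the failure described above.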

GUS-W-15838870