hyperledger / cacti

Hyperledger Cacti is a new approach to the blockchain interoperability problem
https://wiki.hyperledger.org/display/cactus
Apache License 2.0

ci(performance): speed up CI - dynamic diff analysis #2364

Open petermetz opened 1 year ago

petermetz commented 1 year ago

Description

As a maintainer/contributor I want to have the CI finish in half an hour or less so that I'm not waiting 3 hours (or in some cases days) for the CI to finish running on my pull request.

Related task: https://github.com/hyperledger/cacti/issues/1567
Related task: https://github.com/hyperledger/cacti/issues/2117

Alternative solutions considered:

  1. Auto-scaling self-hosted runners as a service (aka BuildJet) is expensive
  2. Hosting our own (static, non-autoscaled) self-hosted runners has its own issues: the runners get stuck or just go OOM and need manual hand-holding all the time according to Ry
  3. Writing and deploying our own auto-scaling self-hosted runners as a service thingy - sounds like a fun project, but definitely out of scope, too much work/time/risk...

Acceptance Criteria

Implement a custom script (./tools/...) that populates the GitHub CI workflow action YAML context with data about the diff. That data can then be leveraged with well-crafted if conditions within the YAML files such that:

  1. If the diff contains changes to a leaf package that no other package is depending on, then only that package is to be tested by the CI, everything else can be skipped.
  2. If the changes are documentation-only, no test execution happens at all
  3. Supports the TypeScript/Node.js packages
  4. Supports the Container image builds
  5. Supports the newly added asset exchange tests
  6. The speedup is such that a documentation change should have the CI finished in 5 minutes
  7. The speedup is such that a code change in a leaf package should have the CI finished in about 15 minutes or less
  8. If a top-level package (such as the common or core-api packages) is changed, then the CI will still run for a long time, because those packages trigger test execution for all other packages that depend on them, and this cascades down all the way to the leaf packages.
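
The selection logic in criteria 1, 2 and 8 could be sketched roughly as follows. This is only an illustration, not the script from the issue: the `packages/<name>/` layout, the rule for what counts as a documentation file, and every name in the block (`planCi`, `DependencyGraph`, `CiPlan`) are assumptions made for the example.

```typescript
// Hypothetical sketch: none of these names come from the Cacti repository.
// Maps each package to the packages it depends on.
// Assumption: every package appears as a key, even if it has no dependencies.
type DependencyGraph = Record<string, string[]>;

interface CiPlan {
  docsOnly: boolean;
  packagesToTest: string[];
}

// Assumption: documentation means Markdown files or anything under docs/.
function isDocFile(path: string): boolean {
  return path.endsWith(".md") || path.startsWith("docs/");
}

// Assumption: each package lives in packages/<name>/.
function packageOf(path: string): string | undefined {
  const match = /^packages\/([^/]+)\//.exec(path);
  return match ? match[1] : undefined;
}

function planCi(changedFiles: string[], graph: DependencyGraph): CiPlan {
  const codeFiles = changedFiles.filter((f) => !isDocFile(f));
  if (codeFiles.length === 0) {
    // Documentation-only change: nothing to test.
    return { docsOnly: true, packagesToTest: [] };
  }

  // Invert the graph: package -> packages that directly depend on it.
  const dependents = new Map<string, string[]>();
  for (const pkg of Object.keys(graph)) {
    for (const dep of graph[pkg] ?? []) {
      const list = dependents.get(dep) ?? [];
      list.push(pkg);
      dependents.set(dep, list);
    }
  }

  const affected = new Set<string>();
  const queue: string[] = [];
  for (const file of codeFiles) {
    const pkg = packageOf(file);
    if (pkg === undefined) {
      // Non-documentation change outside any package: conservatively test everything.
      return { docsOnly: false, packagesToTest: Object.keys(graph).sort() };
    }
    if (!affected.has(pkg)) {
      affected.add(pkg);
      queue.push(pkg);
    }
  }

  // Breadth-first walk: a change cascades to every transitive dependent.
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return { docsOnly: false, packagesToTest: Array.from(affected).sort() };
}
```

With a graph where `connector-x` depends on `core-api` and `core-api` depends on `common`, a change under `packages/connector-x/` selects only `connector-x`, while a change under `packages/common/` selects all three. The result could then be serialized and exposed to the workflow so that job-level if conditions can read it.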

jagpreetsinghsasan commented 1 year ago

Currently I am using the js-dependency-extractor to generate the dependency graph. Also, I had to hard-code the test-tooling -> ghcr job mapping (as there is no pattern on either side).

jagpreetsinghsasan commented 1 year ago

@petermetz's suggestion on this:

Yeah, what we need to do is attach some parseable metadata to the job definitions themselves to decouple the job name from the package name (so that they don't have to match). That way the test-tooling job can be called anything, as long as the YAML object that defines it has some key like x-pkg-name or something that we parse to associate the job with the correct package no matter what.

This will be needed when we further optimize the CI later on, where I will want to break up some packages into multiple jobs to make the test execution more parallelized. A good example of this is the fabric and the corda connectors. Their test cases take almost an hour to run, so we have to have multiple jobs for those so that the test cases can run much faster. But then we won't be able to use the package name as the job name for these jobs, because each job name has to be unique.
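
A minimal sketch of that lookup, assuming the workflow YAML has already been parsed into a plain object and that each job carries the suggested `x-pkg-name` key. The type names, the `jobsByPackage` function and the job and package names in the usage note are hypothetical, and whether GitHub's workflow schema accepts such a custom key on a job is not verified here.

```typescript
// Hypothetical shape of a parsed workflow file.
// "x-pkg-name" is the metadata key floated in the comment above.
interface JobDefinition {
  "x-pkg-name"?: string;
  [key: string]: unknown;
}

interface Workflow {
  jobs: Record<string, JobDefinition>;
}

// Groups job names by the package they declare, so several jobs
// with unique names can belong to the same package.
function jobsByPackage(workflow: Workflow): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const jobName of Object.keys(workflow.jobs)) {
    const job = workflow.jobs[jobName];
    const pkg = job ? job["x-pkg-name"] : undefined;
    if (typeof pkg !== "string") {
      continue; // Jobs without the metadata key are not tied to a package.
    }
    const jobs = result.get(pkg) ?? [];
    jobs.push(jobName);
    result.set(pkg, jobs);
  }
  return result;
}
```

For example, two jobs named `fabric-tests-1` and `fabric-tests-2` that both declare `x-pkg-name: connector-fabric` would end up under the single key `connector-fabric`, which is what allows one package's tests to be split across parallel jobs without any naming convention.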

petermetz commented 10 months ago

Note to self: There's also the paths / paths-ignore trigger filter, released by GitHub in the meantime(?) => https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore