hyperledger / caliper-benchmarks

Sample benchmark files for Hyperledger Caliper https://wiki.hyperledger.org/display/caliper
https://hyperledger.github.io/caliper-benchmarks/
Apache License 2.0

Optimize the execution of Fabric CI tests #169

Closed · aklenik closed this issue 2 years ago

aklenik commented 2 years ago

The following enhancements should be considered to lower the execution time of CI tests:

  1. Perform CI tests only on the PR side (running them again on merges doesn't seem to add extra value)
  2. Parallelize CI tests based on the chaincode too, not just its language (or based purely on chaincode, which would mean one CI test per benchmark/workload execution)
    1. The current script could be broken down into smaller ones (one for network and channel creation, and one for each CC deployment)
    2. CI feedback would (probably) be faster and easier to interpret; failed tests would directly correspond to a chaincode or its workload
    3. Such fine-grained execution would enable additional optimizations, like running tests only when their corresponding source files have changed (using, for example, this: https://github.com/marketplace/actions/file-changes-action); see the sketch after this list
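
A minimal sketch of what points 1 and 3 could look like in a per-benchmark workflow file; the workflow name, the path globs, and the run script are placeholders for illustration, not the repo's actual layout:

```yaml
# Hypothetical per-benchmark workflow: runs only for PRs (point 1)
# and only when files relevant to this benchmark change (point 3).
name: fabcar-benchmark
on:
  pull_request:
    branches: [ main ]
    paths:
      - 'benchmarks/samples/fabric/fabcar/**'   # placeholder workload/benchmark path
      - 'src/fabric/samples/fabcar/**'          # placeholder chaincode path
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run the fabcar benchmark
        run: ./scripts/run-benchmark.sh fabcar   # placeholder script
```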
davidkel commented 2 years ago

For 1, I wonder if it's down to this?

```yaml
  push:
    branches: [ main ]
```

I just based it off the starter actions you can generate, and I was also trying to work out how to actually get the GitHub action to run. It appears that it can't come from a PR raised from your local fork and only worked if it was a branch on the main repo itself (needed Ry's help for that), which kind of makes it difficult for non-maintainers to contribute to this.

For 2, we can parallelize more, as sketched below, but I guess it depends on whether GitHub Actions imposes any limits on this (given that Fabric seems to be having all sorts of issues right now with Azure Pipelines due to parallelisation, with builds not working). Currently the builds only test node chaincode, but we will need the benchmarks to test all chaincode languages to ensure each chaincode implementation is complete and correct. It would definitely be good to use file-changes to drive different builds.
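
A rough illustration of parallelising by chaincode as well as by language using a build matrix; the chaincode list, the excluded combination, and the script are assumptions for the sketch:

```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # let other combinations finish even if one fails
      matrix:
        chaincode: [ fabcar, marbles, simple, smallbank ]
        language: [ go, node ]
        exclude:
          - chaincode: smallbank
            language: node      # assuming smallbank only has a go implementation
    steps:
      - uses: actions/checkout@v2
      - name: Run ${{ matrix.chaincode }} (${{ matrix.language }})
        run: ./scripts/run-benchmark.sh ${{ matrix.chaincode }} ${{ matrix.language }}
```

Each matrix combination runs as its own job, so a failure would point directly at one chaincode/language pair.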

aklenik commented 2 years ago

> It appears that it can't come from a PR raised from your local fork and only worked if it was a branch on the main repo itself (needed Ry's help for that), which kind of makes it difficult for non-maintainers to contribute to this.

I don't really understand this. Isn't this an issue only for the initial creation of a workflow (since the workflow is not picked up yet for the intended target branch)? Or do I misunderstand the issue?

(Btw, have you seen this? https://github.com/nektos/act Looks promising if we want to "complicate" things in the workflow and test it quickly.)

  1. Yeah, removing that should solve the double run.
  2. I'm wondering whether we should maintain the Fabric chaincodes in this repository at all. The fabric-samples repo also contains the chaincodes, and we don't really want to test the chaincodes here; we want to test their workload implementations. In that case it's enough to use a single language per chaincode in the tests.
davidkel commented 2 years ago

To summarise:

  1. The builds currently only run Fabric tests
  2. Fabric has several different benchmark sets (do we want them all? this is covered a bit later)
  3. We should only run tests for files that have changed. If I change or add a benchmark file, then that is the one the build should run. If I change a workload module, then the associated benchmark file should be run. If I change the chaincode, then we need to deploy that chaincode to ensure it is built, but then which benchmark should be run to test it? I think there needs to be a default benchmark file that does simple tests of the chaincode. I think that is the case for most of the benchmarks (they either have only a single benchmark file or there is a test benchmark file). A change-detection sketch follows this list.
  4. Revisit the parallelisation of the runs: currently they are split purely by chaincode language, but should they be split by the sample itself?
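
For point 3, one possible shape is a cheap change-detection job that gates the per-sample jobs; this sketch uses dorny/paths-filter (the file-changes action linked earlier would work similarly), and all filter paths and the default benchmark script are placeholders:

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      fabcar: ${{ steps.filter.outputs.fabcar }}
      marbles: ${{ steps.filter.outputs.marbles }}
    steps:
      - uses: actions/checkout@v2
      - uses: dorny/paths-filter@v2
        id: filter
        with:
          filters: |
            fabcar:
              - 'benchmarks/samples/fabric/fabcar/**'
              - 'src/fabric/samples/fabcar/**'
            marbles:
              - 'benchmarks/samples/fabric/marbles/**'
              - 'src/fabric/samples/marbles/**'

  fabcar:
    needs: changes
    if: needs.changes.outputs.fabcar == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./scripts/run-benchmark.sh fabcar   # placeholder default benchmark
```

This would also give point 4 a natural split: one gated job per sample rather than one per chaincode language.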

To the question of the chaincodes supported by this repo, for Fabric we have:

  1. fixed-asset (a set of Fabric network benchmarks using the contract API; go, java, node)
  2. fixed-asset-base (a set of Fabric network benchmarks not using the contract API; go, node)
  3. fabcar (go, java, node)
  4. marbles (go, node)
  5. marbles-norichquery (go, node)
  6. simple (go, node)
  7. smallbank (go)

fixed-asset and fixed-asset-base exist to benchmark a Fabric network topology independently of any particular chaincode implementation (i.e. they let you benchmark Fabric itself), which is really handy when looking for deployment bottlenecks, determining the theoretical max TPS, and helping performance-tune the Fabric code itself. These should remain and in fact need to be enhanced, so they require another set of issues.

The others are samples in a similar vein to fabric-samples, so we could look to drop those and create new benchmarks on some of the fabric-samples, for example the token implementations. I see this as a separate issue from this one, however; it would still be optimised to run only when a change to the benchmarks is made (but we would need to rely on a nightly build to see if something changed in fabric-samples that would warrant our attention).

The other thing I would note about fabric-samples is that:

  1. they aren't always reliable, and a lot of them aren't tested in a build
  2. there has been a lot of flux within that repo recently, which we wouldn't be exposed to if we owned the chaincode ourselves
davidkel commented 2 years ago

The implementation of this issue should also include a demonstration by the implementer that the solution works, given that it would require different types of PRs to show it working.