nodejs / build

Better build and test infra for Node.

Reduce the number of parallel CIs to reduce flakiness #3835

Open RafaelGSS opened 1 month ago

RafaelGSS commented 1 month ago

Hey folks,

It's not a new issue that CI tends to be flaky due to parallelization; for instance, tests that use more memory can cause OOM in a separate CI. Whenever a security release happens, we lock CI so that only patches and proposals for the security release can trigger it. The outcome is a faster CI that is less likely to hit shared-resource errors.

Is there a place where we document the number of test-node-pull-request jobs that run in parallel? Have we tried reducing that number to get more effective CI runs?

Note that if reducing the number of parallel CIs results in fewer flaky failures, it will greatly speed up development. Instead of running 4 CIs until they turn green, we would only need to run one.
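To put a rough number on the "4 CIs versus one" point, here is a minimal sketch. It assumes every CI run passes independently with the same probability, which real CI does not guarantee. The function names and the example rates are hypothetical, not measurements from the Node.js CI.

```python
def pass_probability(per_test_flake_rates):
    """Chance that a whole run is green, assuming each flaky test
    fails independently at the given rate (a simplifying assumption)."""
    p = 1.0
    for rate in per_test_flake_rates:
        p *= 1.0 - rate
    return p


def expected_runs_until_green(run_pass_probability):
    """Mean of a geometric distribution: average number of runs
    needed to see the first green run."""
    if not 0.0 < run_pass_probability <= 1.0:
        raise ValueError("run_pass_probability must be in (0, 1]")
    return 1.0 / run_pass_probability


# Hypothetical numbers: a run that is green only 25% of the time needs
# 4 attempts on average; a run that is always green needs exactly 1.
flaky = expected_runs_until_green(0.25)
stable = expected_runs_until_green(1.0)
```

Under these assumptions, needing about 4 runs corresponds to a per-run pass rate of roughly 25%, so even a modest drop in flakiness reduces the total load placed on CI.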

richardlau commented 1 month ago

> It's not a new issue that CI tends to be flaky due to parallelization; for instance, tests that use more memory can cause OOM in a separate CI.

The only place this can happen is node-test-commit-linux-containered, as those jobs run in containers. Everywhere else we use one Jenkins executor per agent/machine, so each machine can only run one job at a time.
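The difference between the two setups can be illustrated with a small model. This is only a sketch: the `Host` type, the memory sizes, and the number of concurrent jobs are hypothetical and do not describe the actual Node.js infrastructure.

```python
from dataclasses import dataclass


@dataclass
class Host:
    memory_gb: float
    concurrent_jobs: int  # jobs that can run on this host at the same time


def can_run_out_of_memory(host, peak_job_memory_gb):
    """True if all concurrent jobs hitting their peak at once
    would need more memory than the host has."""
    return host.concurrent_jobs * peak_job_memory_gb > host.memory_gb


# Hypothetical figures for illustration only.
dedicated_agent = Host(memory_gb=8, concurrent_jobs=1)   # one executor per machine
container_host = Host(memory_gb=16, concurrent_jobs=4)   # several containers share one host

dedicated_at_risk = can_run_out_of_memory(dedicated_agent, peak_job_memory_gb=6)
shared_at_risk = can_run_out_of_memory(container_host, peak_job_memory_gb=6)
```

In this model a job on a single-executor machine can only exhaust memory through its own usage, while jobs in containers on a shared host can push each other over the limit.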

RafaelGSS commented 1 month ago

And is this possibly the flakiest runner? Is there something we can do about it?

Also, is it fair to say that most of the flaky tests in other jobs are unrelated to machine issues?