Open RafaelGSS opened 4 months ago
It's not a new issue that CI tends to be flaky due to parallelization; for instance, tests that use more memory can cause OOM failures in a separate CI run.
The only place this can happen is node-test-commit-linux-containered, as those jobs run in containers. Everywhere else we use one Jenkins executor per agent/machine, which can only run one job at a time.
And this is possibly the most flaky runner? Is there something we can do about it?
Also, is it fair to say that most of the flaky tests in other jobs are unrelated to machine issues?
Hey folks,
It's not a new issue that CI tends to be flaky due to parallelization; for instance, tests that use more memory can cause OOM failures in a separate CI run. Whenever a security release happens, we lock CI so that only patches and proposals for the security release can trigger it. The outcome is a fast CI that is less likely to hit shared-resource errors.
Is there a place where we document the number of test-node-pull-request jobs that run in parallel? Have we tried reducing that number to get more effective CI runs?

Note that if reducing the number of parallel CI runs results in fewer flaky tests, it will greatly optimize development. Instead of running 4 CIs until they turn green, we only need to run one.