Open mattwthompson opened 1 week ago
This seems to be an intermittent error, but I don't know where it's coming from. It probably has something to do with the fact that all the notebooks are run in parallel here, which speeds things up a lot even though the runners don't technically have enough cores. I've experimented with extending the timeout, but it's already much longer than it should need to be. What if I set it up so that cell timeout errors don't cause the CI to fail?
That seems like an okay band-aid; if the cause is something as silly as getting worse hardware, it would reduce this noise. But if a regression in a new release legitimately made some common process goofily slow, it would go undetected.
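One way to get the band-aid without fully hiding slowdowns might be to sort timeouts into their own bucket: they don't fail the job, but they still get listed in the summary so a sudden jump is visible. This is only a rough sketch, not how the CI here is actually wired up. `summarize`, `RunSummary`, and the `run_notebook` callable are hypothetical names, and it assumes a timed-out cell surfaces as an exception derived from Python's built-in `TimeoutError` (I believe nbclient's `CellTimeoutError` does, but that would need checking against whatever executes the notebooks).

```python
from dataclasses import dataclass, field


@dataclass
class RunSummary:
    """Buckets notebook names by outcome (hypothetical helper)."""

    passed: list = field(default_factory=list)
    timed_out: list = field(default_factory=list)
    failed: list = field(default_factory=list)

    @property
    def exit_code(self) -> int:
        # Only genuine failures fail the job; timeouts are reported but tolerated.
        return 1 if self.failed else 0


def summarize(run_notebook, notebooks):
    """Run each notebook and classify the outcome.

    `run_notebook` is an assumed callable that executes one notebook and
    raises on error; a timeout is assumed to raise a TimeoutError subclass.
    """
    summary = RunSummary()
    for name in notebooks:
        try:
            run_notebook(name)
        except TimeoutError:
            summary.timed_out.append(name)
        except Exception:
            summary.failed.append(name)
        else:
            summary.passed.append(name)
    return summary
```

The job would then exit with `summary.exit_code` and print `summary.timed_out` somewhere visible, so a release that makes many notebooks time out at once still stands out even though the run is green.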
If a few extra CPU cores would be helpful, what about running on better hardware, either runners provided by GitHub (reliable and expensive) or ones provisioned by our new tools that hook into AWS?
https://github.com/openforcefield/openff-docs/actions/runs/10755472185
Snippet of log is below. I'm not seeing these failures in my own nightly CI, so perhaps something is configured differently here? (https://github.com/openforcefield/openff-docs/issues/53 ?)
P.S. I'm still getting daily emails about these runs. Could updating the notification flow be looked into? I'm not the best person to figure out whether these failures are genuine or whether there's been a change in the config, automation, etc.