Closed tchoutri closed 2 years ago
While a base bump would be sufficient for building the library, the tests and benchmarks are actually blocked upstream, see:
@phadej intended to release an updated cryptohash-sha1 soon, https://github.com/haskell-hvr/cryptohash-md5/pull/7#issuecomment-948372848, so maybe we'd rather not rush with revisions.
A status update: I'm a bit distracted, as I'd like to get the benchmarks to build with GHC-9.2 too. That means following the aeson & criterion dependencies first. And there I'm waiting for unordered-containers hashable-1.4 support, so that I don't need to do a second sweep.
EDIT: Yes, I know I could switch to e.g. tasty-bench, but I'd still need to update aeson, so it wouldn't save total work for me.
@phadej Okay, thank you for the explanations
The next blocker is https://github.com/actions/runner/issues/1326: GHA seems to fail unreliably, and very often. I've run out of ideas about what the cause might be (old GHCs like 7.6 are not resource hungry). It doesn't make much sense to restart jobs if 4 out of 12 fail. (Restarting a single job would help; it's on the roadmap, but there is no ETA.)
I don't know if @emilypi or other HF folks (@Bodigrim) have any GitHub contacts to point to that issue. I don't feel very confident updating the libs. Also, the GHC-8.0 job fails, and I'd rather not drop GHC support just because CI is unreliable.
Hi @phadej - is there a way to get more verbosity or diagnostics out of that failure? It looks like it is failing in a dependency installation? Could this be a networking problem? I apologize that I'm not very familiar with Haskell, could you tell me where these assets are being downloaded from?
They are already downloaded. At that point, e.g. in https://github.com/haskell-hvr/cryptohash-md5/runs/4191743221?check_suite_focus=true
Starting criterion-measurement-0.1.3.0 (lib)
Building criterion-measurement-0.1.3.0 (lib)
Installing criterion-measurement-0.1.3.0 (lib)
Completed criterion-measurement-0.1.3.0 (lib)
Installing statistics-0.15.2.0 (lib)
Completed statistics-0.15.2.0 (lib)
Starting criterion-1.5.11.0 (lib)
Building criterion-1.5.11.0 (lib)
Error: The operation was canceled.
everything is already downloaded. There are lines like
Downloading base-orphans-0.8.6
Downloaded base-orphans-0.8.6
Downloading call-stack-0.4.0
Downloaded call-stack-0.4.0
above in the same step.
Let me try to make a variant which is a bit more explicit about what is happening. However, I then need to reduce the parallelism from 2 to 1, and I suspect that the job might actually succeed in that case.
So I doubt it's a networking problem. In failing aeson jobs (like https://github.com/haskell/aeson/runs/4186746204?check_suite_focus=true) the failing step is an actual library build step. So it's not a dependency installation problem either.
The only sure thing is that compiling aeson and criterion is resource intensive, but as described in the linked issue, it doesn't seem that the processes are running out of memory, because in those cases the runtime system would report the failure and the job would fail gracefully. However, these jobs are slow. It almost feels like memory (and/or CPU?) hungry jobs are stripped of CPU power, so they are simply slow and actually time out.
Thanks for the information. We're investigating.
@ethomson it looks like this job, https://github.com/haskell-hvr/cryptohash-md5/runs/4192987633?check_suite_focus=true, is going to fail. It will have a bit more output (and timestamps for the build tool's output, though not the compiler's).
@phadej we made a few changes to some VM configuration. Can I ask you to queue a new build?
Very useful data. We changed the VM configuration as a test for runs in the haskell-hvr org. We did not change haskellari. This gives us more confidence that the VM configuration changes will be effective here. Working on applying this more broadly...
The only sure thing is that compiling aeson and criterion is resource intensive, but as described in the linked issue, it doesn't seem that the processes are running out of memory, because in those cases the runtime system would report the failure and the job would fail gracefully. However, these jobs are slow. It almost feels like memory (and/or CPU?) hungry jobs are stripped of CPU power, so they are simply slow and actually time out.
I recently observed the same issue with jobs that are more resource intensive than usual: they tend to run for up to 30 minutes and then either succeed quickly or come to a halt until the 360-minute timeout. It's hard to debug in detail, as the jobs in question run on an emulated s390x machine. Emulation is terribly expensive and likely to eat all 7 GB of RAM available to workers; could it be an effect of swapping? I'd still expect them to finish in an hour at most, and local tests show no signs of freezing.
https://github.com/haskell/bytestring/actions/workflows/s390x.yml
@phadej and @bodigrim we've rolled out the configuration to 100% - if you could please retry any jobs that were failing and let me know if you're still seeing problems!
@ethomson I queued a few jobs and they all passed. Thanks!
@ethomson much better now, thanks!
Someone likes binary here...
@ethomson Thank you very much for the help!!
Same as https://github.com/haskell-infra/hackage-trustees/issues/313 for GHC 9.2 and base-4.16.