mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

Hang on M1 macos #235

Closed: jeroen closed this issue 3 years ago

jeroen commented 3 years ago

When I run CMD check on an apple arm64 machine, it hangs at the unit tests:

* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... OK
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking compiled code ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘testthat.R’

There it stalls indefinitely. The contents of testthat.Rout don't tell me much more:

> library(testthat)
> test_check("clustermq")
Loading required package: clustermq
* Option 'clustermq.scheduler' not set, defaulting to 'LOCAL'
--- see: https://mschubert.github.io/clustermq/articles/userguide.html#configuration

Any guess what is going wrong?

mschubert commented 3 years ago

Can you run make test instead of CMD check to get a bit more resolution on where the hang is (and whether it's always on the same test)?

jeroen commented 3 years ago

> test()
Loading clustermq
* Option 'clustermq.scheduler' not set, defaulting to ‘LOCAL’
--- see: https://mschubert.github.io/clustermq/articles/userguide.html#configuration
Testing clustermq
✔ |  OK F W S | Context
✔ |   2       | bindings
✔ |   5       | util
✔ |   5     1 | zeromq [0.6 s]
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Skip (test-0-zeromq.r:21:5): node hack works
Reason: has_connectivity(Sys.info()["nodename"]) is not TRUE
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
⠏ |   0       | check_args

jeroen commented 3 years ago

Perhaps this is just a side effect of the networking on my M1 server and not related to arm64 🤷‍♂️

mschubert commented 3 years ago

Thank you for the report and clarification. I am very confused about this.

At first I thought you might have hit a socket poll with no incoming message that I forgot to give a timeout, but the check_args tests are pure R with no networking involved.
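For illustration, a poll that cannot stall forever might look like the sketch below. It is base R only; `poll_with_timeout` and `has_message` are hypothetical names and not part of clustermq or its ZeroMQ bindings.

```r
# Hypothetical helper (not clustermq code): keep checking for a message,
# but give up once the deadline has passed instead of waiting forever.
poll_with_timeout <- function(has_message, timeout_sec = 1, interval_sec = 0.01) {
    deadline <- Sys.time() + timeout_sec
    repeat {
        if (has_message()) return(TRUE)            # a message arrived in time
        if (Sys.time() >= deadline) return(FALSE)  # timed out, caller decides what to do
        Sys.sleep(interval_sec)                    # avoid a busy loop
    }
}

# Stand-in for a socket that never receives anything
never_ready <- function() FALSE
got_message <- poll_with_timeout(never_ready, timeout_sec = 0.1)
```

Without the deadline check, the loop above would never return for `never_ready`, which is the kind of stall described here.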

So maybe it hangs when the gc collects the previous test's objects, with the context destroy blocking because the socket was not properly closed?
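To show why a hang could surface in an unrelated, pure-R test, here is a minimal base-R sketch of a finalizer running during garbage collection. The `res` object and `finalize_resource` function are hypothetical stand-ins for a socket or context wrapper, not clustermq's actual code.

```r
# Records whether the finalizer has run
events <- new.env()
events$finalized <- FALSE

# Hypothetical finalizer: in real bindings this is where a context destroy
# call would sit. If that call blocked (e.g. on an unclosed socket), gc()
# would block with it, whichever test happened to trigger the collection.
finalize_resource <- function(e) {
    events$finalized <- TRUE
}

res <- new.env()                       # stand-in for a socket/context object
reg.finalizer(res, finalize_resource)  # run finalize_resource when res is collected

rm(res)          # drop the last reference, as happens when a test finishes
invisible(gc())  # the finalizer runs as part of this collection
```

The finalizer runs whenever the collector gets to the object, not where the object was created, so the stall location need not match the test that opened the socket.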

Can you check whether the hang location changes if you change test-0-zeromq.r#L21 to skip("") unconditionally instead?

In that case, I should move the remaining testing utilities over to the newer ZeroMQ C++ API.

mschubert commented 3 years ago

This has started hanging on CI as well. I think the cause is an incorrect zmq context termination when the local network connection cannot be established.

This should be fixed in develop; please reopen if that is not the case.