mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

Cannot allocate buffer when running HPC jobs in singularity container #224

Closed mattwarkentin closed 3 years ago

mattwarkentin commented 3 years ago

Hi @mschubert,

I'm getting an error that I'm not convinced is clustermq's fault, but I was wondering if you have any insight into what it might indicate?

Error in private$zmq$send(data, sid, dont_wait, send_more) : 
  Evaluation error: cannot allocate buffer.
Calls: <Anonymous> -> <Anonymous> -> <Anonymous> -> .External
Execution halted

I don't really know how to work backward and figure out what is throwing this error.

For context, this is running on an HPC cluster with Slurm as the scheduler, and the worker processes run inside Singularity containers.

mschubert commented 3 years ago

Failed memory allocations usually mean the process ran out of memory (OOM). Do you maybe have some limits set on the container that you're hitting?

Otherwise no idea, and I'll need a lot more to go on.
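
One way to check for the kind of limits mentioned above is to print the memory limits that a process sees from inside the container. The following is only a minimal sketch, not something from clustermq or from this thread: the helper names `format_limit` and `memory_limits` are hypothetical, and the two cgroup paths are assumptions about where the limit file is mounted (the actual location depends on the cgroup version and on how Slurm and Singularity set up the job).

```python
import resource
from pathlib import Path

# Assumed locations of the cgroup memory limit (v2 first, then v1).
# On a real cluster the file may live under a job-specific subdirectory.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def format_limit(value):
    """Render a byte limit as MiB; RLIM_INFINITY, -1 and 'max' mean unlimited."""
    if value in (resource.RLIM_INFINITY, -1, "max"):
        return "unlimited"
    return f"{int(value) / 2**20:.0f} MiB"


def memory_limits():
    """Collect the address-space rlimit and any readable cgroup memory limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    limits = {
        "rlimit_as_soft": format_limit(soft),
        "rlimit_as_hard": format_limit(hard),
    }
    for path in CGROUP_LIMIT_FILES:
        if path.is_file():
            limits[str(path)] = format_limit(path.read_text().strip())
    return limits


if __name__ == "__main__":
    for name, value in memory_limits().items():
        print(f"{name}: {value}")
```

Running this both on the bare compute node and inside the container within the same Slurm job would show whether the container sees a tighter limit than the host. If every value is reported as unlimited, the limit is probably enforced elsewhere, for example through the memory requested for the job.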

mattwarkentin commented 3 years ago

I'm working through various debugging techniques. I will close this for now and come back if I have anything more useful to share.