cmosig closed this issue 2 years ago.
NumPy 1.17.1, correct? What Python version/platform? The problematic call looks to be the CPython macro PyDateTime_IMPORT in datetime.c. Did this used to work?
Yes, correct NumPy version. Python version 3.6.8. Yes, this used to work before. Here is the Linux version as well:
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
I hope it helps. Is there a way I could temporarily fix this problem?
What happens if you import matplotlib once in the process before starting your threads?
Or maybe import datetime; import matplotlib
As far as I can tell, this happens in numpy_pydatetime_import when we call CPython's PyDateTime_IMPORT. We call this quite late in the initialization code, in init_multiarray_umath, which according to your log happens after OpenBLAS opens its threadpool. Perhaps moving the call earlier in the initialization code would help.
This might be due to a faulty Python installation where datetime is somehow broken, so making sure import datetime works would be a first step.
Without changing anything, I have run the actual script again and it did not fail immediately. Instead I received only this error again:
OpenBLAS blas_thread_init: pthread_create failed for thread 60 of 64: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1028736 max
...
At the moment I cannot recreate the state where it would fail immediately with the exception in the first comment. I assume the problem is something dynamic controlled by the OS?
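The "something dynamic" is plausibly the RLIMIT_NPROC soft limit from the OpenBLAS message above, which on Linux counts threads as well as processes and can differ between login sessions. A quick way to inspect it from Python (a Linux-only stdlib sketch):

```python
import resource

# RLIMIT_NPROC caps the number of processes/threads per user on Linux;
# OpenBLAS hits this limit when spawning its worker threads.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"soft={soft} hard={hard}")
```

If the soft limit is low (e.g. 4096, as in the log above), `ulimit -u` in the shell shows it and can raise it up to the hard limit.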
> As far as I can tell, this happens in numpy_pydatetime_import when we call CPython's PyDateTime_IMPORT. We call this quite late in the initialization code, in init_multiarray_umath, which according to your log happens after OpenBLAS opens its threadpool. Perhaps moving the call earlier in the initialization code would help. This might be due to a faulty Python installation where datetime is somehow broken, so making sure import datetime works would be a first step. Or maybe import datetime; import matplotlib
All these imports are currently working fine.
You may be saturating your machine by running many processes. Importing numpy in each process will open a thread pool with a number of threads equal to the number of CPUs, so if you have 64 CPUs and open 100 processes, that will be thousands of threads. You might be interested in threadpoolctl to manage the number of threads each process opens.
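A minimal sketch of that approach, assuming the third-party threadpoolctl package is installed:

```python
import numpy as np
from threadpoolctl import threadpool_limits

a = np.ones((256, 256))

# Limit every BLAS threadpool (OpenBLAS, MKL, BLIS) to a single thread
# for the duration of the block; the previous limit is restored on exit.
with threadpool_limits(limits=1, user_api="blas"):
    b = a @ a

print(b[0, 0])  # 256.0
```

Unlike the OPENBLAS_NUM_THREADS environment variable, this works after numpy has been imported and can be scoped to just the hot section of the code.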
Hmm, okay, that makes sense. In my case this would be 88 processes * 88 CPUs = 7744 threads. I tried limiting the number of threads per CPU to one, but unfortunately this did not work. I received the same errors again, and in the end I ended up in the state where numpy would crash immediately after import.
I am connected via ssh to a server where I run this code. What I noticed is that when closing the ssh connection and then connecting again, numpy does not crash immediately after import.
> I tried limiting the number of threads per cpu to one, but unfortunately this did not work

Did you use threadpoolctl? Perhaps you could continue this part of the issue with them.
I had the same problem using Jobe, a sandbox for running code. What worked was inserting these lines in the script before importing numpy:
import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'
So yeah, it looks like limiting the number of threads works.
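Extending that to the multi-process case discussed earlier: the variable has to be set before numpy is first imported, and child processes inherit the parent's environment, so one assignment in the parent covers the whole pool. A sketch with toy data:

```python
import os

# Must be set before numpy is imported anywhere in the process tree;
# OpenBLAS sizes its threadpool once, at import time.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

from multiprocessing import Pool

def worker(x):
    import numpy as np  # each child now builds a single-threaded BLAS pool
    return float(np.dot(np.ones(4), np.ones(4)) * x)

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(worker, range(4)))  # [0.0, 4.0, 8.0, 12.0]
```

With this, N processes open N BLAS threads in total instead of N * n_cpus.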
Duplicate of https://github.com/numpy/numpy/issues/19145 and https://github.com/numpy/numpy/issues/17856. It has gotten better since this bug report. The conclusion was that there isn't much more that OpenBLAS can easily do. I'll close this issue as a duplicate.
Still having this problem in 2022 with Python 3.9.0 and numpy 1.22.3. Adding export OPENBLAS_NUM_THREADS=1 to .bashrc seems to solve the problem.
This issue is closed. Please open a new one with the error message. Is there anything out of the ordinary about this machine: does it have a large number of CPUs or is it lacking memory?
Yes, it is on an HPC with hundreds of nodes and 128 cores on each node. Do you mean export OPENBLAS_NUM_THREADS=1 is required in such a situation?
Hundreds of nodes with 128 cores on each node is not a configuration we can test on, so stock NumPy (or OpenBLAS) will need some guidance on resource allocation. This may have improved in the 1.23 releases, but ultimately some strategy to allocate resources will be needed.
Does OPENBLAS_NUM_THREADS=1 have a heavy influence on performance?
btw, still having this issue on a single node with 40 cores, Python 3.11.4 + numpy 1.25.1; should I open an issue?
Never mind, it was an issue with Docker (seccomp not allowing pthread_create).
> seccomp not allowing pthread_create

It was caused by libseccomp? Are there any related links about it?
I found the workaround in a somewhat unrelated conversation (https://github.com/HumanSignal/label-studio/issues/3070); the thing is that apparently this does not affect current versions of Docker, so I couldn't find out more information about it at the time.
I use the docker run --security-opt seccomp=unconfined workaround, but it is not recommended in a production environment.
Reproducing code example:
Before this happened I was running a different Python script, which used 89 processes. Unfortunately I cannot share that script publicly. Since then, numpy crashes immediately after import.
Error message:
Numpy/Python version information:
numpy-1.7.1-13.el7.x86_64