GP Training on a worker pool sometimes hangs indefinitely; seems dependent on conda environment #101

Closed stevetorr closed 1 year ago

stevetorr commented 4 years ago

Depending on the Conda package configuration, the multiprocessing module sometimes hangs, effectively indefinitely. Which packages are responsible is still unclear.

Ultimately, Lixin and I were able to isolate the problem to my conda environment, which was:

_libgcc_mutex             0.1                        main  
asn1crypto                0.24.0                   py36_0  
atomate                   0.7.5                    py36_0    matsci
atomicwrites              1.3.0                    py36_1  
attrs                     19.1.0                   py36_1  
backcall                  0.1.0                    py36_0  
bcrypt                    3.1.7            py36h7b6447c_0  
blas                      1.0                         mkl  
bleach                    3.1.0                    py36_0  
ca-certificates           2019.1.23                     0  
certifi                   2019.3.9                 py36_0  
cffi                      1.12.3           py36h2e261b9_0  
chardet                   3.0.4                 py36_1003  
click                     7.0                      py36_0  
cryptography              2.6.1            py36h1ba5d50_0  
custodian                 2018.8.10                py36_0    matsci
cycler                    0.10.0                   py36_0  
dbus                      1.13.6               h746ee38_0  
decorator                 4.4.0                    py36_1  
defusedxml                0.6.0                      py_0  
dnspython                 1.16.0                   py36_0  
entrypoints               0.3                      py36_0  
expat                     2.2.6                he6710b0_0  
fastcache                 1.1.0            py36h7b6447c_0  
fireworks                 1.7.2                    py36_1    matsci
flask                     1.1.1                      py_0  
flask-paginate            0.5.1                    py36_0    matsci
fontconfig                2.13.0               h9420a91_0  
freetype                  2.9.1                h8a8886c_1  
glib                      2.56.2               hd408876_0  
gmp                       6.1.2                h6c8ec71_1  
gmpy2                     2.0.8            py36h10f8cd9_2  
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb453b48_1  
gunicorn                  19.9.0                   py36_1    matsci
icu                       58.2                 h9c2bf20_1  
idna                      2.8                      py36_0  
importlib_metadata        0.19                     py36_0  
intel-openmp              2019.3                      199  
ipykernel                 5.1.2            py36h39e3cac_0  
ipython                   7.7.0            py36h39e3cac_0  
ipython_genutils          0.2.0                    py36_0  
ipywidgets                7.5.1                      py_0  
itsdangerous              1.1.0                    py36_0  
jedi                      0.15.1                   py36_0  
jinja2                    2.10.1                   py36_0  
jpeg                      9b                   h024ee3a_2  
jsonschema                3.0.2                    py36_0  
jupyter                   1.0.0                    py36_7  
jupyter_client            5.3.1                      py_0  
jupyter_console           6.0.0                    py36_0  
jupyter_core              4.5.0                      py_0  
kiwisolver                1.1.0            py36he6710b0_0  
latexcodec                1.0.5                    py36_0    matsci
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 8.2.0                hdf63c60_1  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libsodium                 1.0.16               h1bed415_0  
libstdcxx-ng              8.2.0                hdf63c60_1  
libuuid                   1.0.3                h1bed415_2  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.9                hea5a465_1  
llvmlite                  0.29.0           py36hd408876_0  
markupsafe                1.1.1            py36h7b6447c_0  
matplotlib                3.1.1            py36h5429711_0  
memory-profiler           0.55.0                    <pip>
mistune                   0.8.4            py36h7b6447c_0  
mkl                       2019.3                      199  
mkl_fft                   1.0.10           py36ha843d7b_0  
mkl_random                1.0.2            py36hd81dba3_0  
monty                     2.0.4                    py36_1    matsci
more-itertools            7.2.0                    py36_0  
mpc                       1.1.0                h10f8cd9_1  
mpfr                      4.0.1                hdf1c602_3  
mpmath                    1.1.0                    py36_0  
nbconvert                 5.5.0                      py_0  
nbformat                  4.4.0                    py36_0  
ncurses                   6.1                  he6710b0_1  
notebook                  6.0.0                    py36_0  
numba                     0.45.1           py36h962f231_0  
numpy                     1.16.2           py36h7e9f1db_0  
numpy-base                1.16.2           py36hde5b4d6_0  
openssl                   1.1.1b               h7b6447c_1  
packaging                 19.1                     py36_0  
palettable                3.1.1                    py36_2    matsci
pandas                    0.25.1           py36he6710b0_0  
pandoc                                 0  
pandocfilters             1.4.2                    py36_1  
paramiko                  2.6.0                    py36_0  
parso                     0.5.1                      py_0  
pcre                      8.43                 he6710b0_0  
pexpect                   4.7.0                    py36_0  
pickleshare               0.7.5                    py36_0  
pip                       19.0.3                   py36_0  
pluggy                    0.12.0                     py_0  
prometheus_client         0.7.1                      py_0  
prompt_toolkit            2.0.9                    py36_0  
psutil                    5.6.3                     <pip>
ptyprocess                0.6.0                    py36_0  
py                        1.8.0                    py36_0  
pybtex                    0.21                     py36_0    matsci
pycparser                 2.19                     py36_0  
pydash                    4.7.5                     <pip>
pydispatcher              2.0.5            py36h30c4b39_1    matsci
pygments                  2.4.2                      py_0  
pymatgen                  2019.4.11                py36_0    matsci
pymatgen-diffusion        2018.1.4                 py36_0    matsci
pymongo                   3.9.0            py36he6710b0_0  
pynacl                    1.3.0            py36h7b6447c_0  
pyopenssl                 19.0.0                   py36_0  
pyparsing                 2.4.2                      py_0  
pyqt                      5.9.2            py36h05f1152_2  
pyrsistent                0.14.11          py36h7b6447c_0  
pysocks                   1.7.0                    py36_0  
pytest                    5.0.1                    py36_0  
python                    3.6.8                h0371630_0  
python-dateutil           2.8.0                    py36_0  
pytz                      2019.2                     py_0  
pyyaml                    5.1.2            py36h7b6447c_0  
pyzmq                     18.1.0           py36he6710b0_0  
qt                        5.9.7                h5867ecd_1  
qtconsole                 4.5.4                      py_0  
readline                  7.0                  h7b6447c_5  
requests                  2.22.0                   py36_0  
ruamel.yaml               0.15.91                  py36_0    matsci
scipy                     1.2.1            py36h7c811a0_0  
send2trash                1.5.0                    py36_0  
setuptools                40.8.0                   py36_0  
sip                       4.19.8           py36hf484d3e_0  
six                       1.12.0                   py36_0  
spglib                    1.12.2.post0     py36h39e3cac_0    matsci
sqlite                    3.27.2               h7b6447c_0  
sympy                     1.4                      py36_0  
tabulate                  0.8.2                    py36_2    matsci
tbb                       2019.4               hfd86e86_0  
terminado                 0.8.2                    py36_0  
testpath                  0.4.2                    py36_0  
tk                        8.6.8                hbc83047_0  
tornado                   6.0.3            py36h7b6447c_0  
tqdm                      4.36.1                     py_0  
traitlets                 4.3.2                    py36_0  
urllib3                   1.24.2                   py36_0  
wcwidth                   0.1.7                    py36_0  
webencodings              0.5.1                    py36_1  
werkzeug                  0.16.0                     py_0  
wheel                     0.33.1                   py36_0  
widgetsnbextension        3.5.1                    py36_0  
xz                        5.2.4                h14c3975_4  
yaml                      0.1.7                had09818_2  
zeromq                    4.3.1                he6710b0_3  
zipp                      0.5.2                      py_0  
zlib                      1.2.11               h7b6447c_3  

Hers was:

_libgcc_mutex             0.1                        main  
ase                       3.18.0                    <pip>
atomicwrites              1.3.0                     <pip>
attrs                     19.1.0                    <pip>
blas                      1.0                         mkl  
ca-certificates           2019.5.15                     1  
certifi                   2019.6.16                py36_1  
Click                     7.0                       <pip>
cycler                    0.10.0                    <pip>
Cython                    0.29.13                   <pip>
dill                               <pip>
Flask                     1.1.1                     <pip>
future                    0.17.1                    <pip>
h5py                      2.10.0                    <pip>
importlib-metadata        0.22                      <pip>
intel-openmp              2019.4                      243  
itsdangerous              1.1.0                     <pip>
Jinja2                    2.10.1                    <pip>
Keras-Applications        1.0.8                     <pip>
Keras-Preprocessing       1.1.0                     <pip>
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
MarkupSafe                1.1.1                     <pip>
matplotlib                3.1.1                     <pip>
mkl                       2019.4                      243  
mkl-service               2.0.2            py36h7b6447c_0  
mkl_fft                   1.0.14           py36ha843d7b_0  
mkl_random                1.0.2            py36hd81dba3_0  
more-itertools            7.2.0                     <pip>
mpi4py                    3.0.2                     <pip>
mpmath                    1.1.0                     <pip>
multiprocess              0.70.9                    <pip>
ncurses                   6.1                  he6710b0_1  
nose                      1.3.7                     <pip>
numba                     0.45.1                    <pip>
numpy                     1.17.2                    <pip>
numpy                     1.16.4           py36h7e9f1db_0  
numpy-base                1.16.4           py36hde5b4d6_0  
openssl                   1.1.1c               h7b6447c_1  
packaging                 19.1                      <pip>
pathos                    0.2.5                     <pip>
pip                       19.1.1                   py36_0  
pip                       19.2.3                    <pip>
pluggy                    0.13.0                    <pip>
pox                       0.2.7                     <pip>
ppft                               <pip>
psutil                    5.6.3                     <pip>
py                        1.8.0                     <pip>
Pympler                   0.7                       <pip>
pyparsing                 2.4.2                     <pip>
pytest                    5.1.2                     <pip>
python                    3.6.9                h265db76_0  
python-dateutil           2.8.0                     <pip>
readline                  7.0                  h7b6447c_5  
scikit-learn              0.21.3                    <pip>
scipy                     1.3.1                     <pip>
setuptools                41.0.1                   py36_0  
six                       1.12.0                   py36_0  
sklearn                   0.0                       <pip>
soaplite                  1.0.3                     <pip>
sqlite                    3.29.0               h7b6447c_0  
sympy                     1.4                       <pip>
tensorboard               1.14.0                    <pip>
tensorflow                1.14.0                    <pip>
tensorflow-estimator      1.14.0                    <pip>
tk                        8.6.8                hbc83047_0  
wcwidth                   0.1.7                     <pip>
Werkzeug                  0.15.5                    <pip>
wheel                     0.33.4                   py36_0  
xz                        5.2.4                h14c3975_4  
zipp                      0.6.0                     <pip>
zlib                      1.2.11               h7b6447c_3  
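
One way to narrow down which packages differ between the two environments is to diff the two `conda list` outputs. The snippet below is only a rough sketch: the helper names `parse_conda_list` and `diff_environments` are made up for illustration (they are not part of flare or conda), and the sample text contains just a handful of rows copied from the lists above. Names are lowercased and `_` is treated as `-` so that entries such as `importlib_metadata` and `importlib-metadata` line up; rows with a missing version column are recorded as `?`.

```python
def parse_conda_list(text):
    """Parse `conda list` text into {normalized name: set of versions}."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' header lines conda prints.
        if not fields or fields[0].startswith("#"):
            continue
        name = fields[0].lower().replace("_", "-")
        # A normal row is: name, version, build[, channel].
        # Rows with fewer than three fields are missing a column.
        version = fields[1] if len(fields) >= 3 else "?"
        # A set, because a package can appear twice (conda and pip).
        packages.setdefault(name, set()).add(version)
    return packages


def diff_environments(env_a, env_b):
    """Return (only in a, only in b, {name: (versions a, versions b)})."""
    only_a = sorted(set(env_a) - set(env_b))
    only_b = sorted(set(env_b) - set(env_a))
    changed = {
        name: (sorted(env_a[name]), sorted(env_b[name]))
        for name in sorted(set(env_a) & set(env_b))
        if env_a[name] != env_b[name]
    }
    return only_a, only_b, changed


# A few rows copied from the two lists above (not the full environments).
mine_text = """
mkl                       2019.3                      199
numba                     0.45.1           py36h962f231_0
numpy                     1.16.2           py36h7e9f1db_0
pandas                    0.25.1           py36he6710b0_0
scipy                     1.2.1            py36h7c811a0_0
"""
hers_text = """
mkl                       2019.4                      243
multiprocess              0.70.9                    <pip>
numba                     0.45.1                    <pip>
numpy                     1.17.2                    <pip>
numpy                     1.16.4           py36h7e9f1db_0
scipy                     1.3.1                     <pip>
"""

only_mine, only_hers, changed = diff_environments(
    parse_conda_list(mine_text), parse_conda_list(hers_text)
)
print("only in mine:", only_mine)
print("only in hers:", only_hers)
for name, (a, b) in changed.items():
    print(f"{name}: {a} vs {b}")
```

A diff like this only shows where the environments disagree; it does not say which difference causes the hang.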

nw13slx commented 4 years ago

A similar issue has shown up again on Travis: https://travis-ci.org/github/mir-group/flare/builds/677271523 . It seems that any line that calls np.zeros or np.array without an explicit dtype=np.float64 or dtype=np.int32 may hang the unit tests.

Below is the list of libraries and versions: numpy (1.15.4), scipy (1.4.1), memory_profiler (0.57.0), numba (0.49.0), ase (3.19.1), pymatgen (2020.4.2), nptyping (1.0.1), psutil (5.7.0), llvmlite (0.33.0.dev0), setuptools (40.8.0), matplotlib (3.2.1), sympy (1.5.1), spglib (1.15.0), tabulate (0.8.7), ruamel.yaml (0.16.10), pydispatcher (2.0.5), pandas (1.0.3), networkx (2.4), requests (2.23.0), plotly (4.6.0), monty (3.0.2), palettable (3.3.0), typish (1.6.0), pyparsing (2.4.7), kiwisolver (1.2.0), cycler (0.10.0), python-dateutil (2.8.1), mpmath (1.1.0), ruamel.yaml.clib (0.2.0), pytz (2019.3), decorator (4.4.2), idna (2.9), urllib3 (1.25.9), certifi (2018.11.29), chardet (3.0.4), six (1.11.0), retrying (1.3.3)
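
For anyone wanting to try the dtype workaround mentioned above, it amounts to passing `dtype` explicitly wherever an array is created. This is a minimal sketch only: the variable names and sizes are hypothetical and not taken from the flare code base, and whether this actually avoids the hang in a given environment is not established here.

```python
import numpy as np

n_atoms = 4  # hypothetical system size, for illustration only

# Instead of np.zeros((n_atoms, 3)), state the float type explicitly.
forces = np.zeros((n_atoms, 3), dtype=np.float64)

# Instead of np.array([1, 1, 8, 8]), state the integer type explicitly.
species = np.array([1, 1, 8, 8], dtype=np.int32)

# Float data built from a Python list, again with an explicit dtype.
hyps = np.array([0.1, 1.0, 0.05], dtype=np.float64)

print(forces.dtype, species.dtype, hyps.dtype)
```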

YuuuXie commented 1 year ago

