Closed: ThomasHoppe closed this issue 1 year ago.
Installation with pip is not supported (because the compiler situation is too difficult); you need to use mamba or conda.
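For reference, the conda-forge based install recommended in the PyMC docs looks roughly like this (the environment name `pymc_env` is arbitrary):

```shell
# Create a fresh environment with pymc and its pinned compiler toolchain
conda create -c conda-forge -n pymc_env "pymc>=5"
conda activate pymc_env
```

Installing into a fresh environment (rather than an existing one) avoids mixing packages from `~/.local/lib` with the conda-forge toolchain.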
@twiecki:
I reinstalled now pymc under conda, but the problem remains :-(
Operating System: Ubuntu 20.04.6 LTS Subsystem under Windows 10 WSL-2 PyMC installation via conda (miniconda)
Last updated: Tue Jul 25 2023
Python implementation: CPython Python version : 3.8.17 IPython version : 8.0.1
arviz : 0.15.1 numpy : 1.22.1 matplotlib: 3.7.1 scipy : 1.7.3 pandas : 2.0.2 pymc : 5.6.1
Watermark: 2.3.0
CompileError: Compilation failed (return status=1): /usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/miniconda3/envs/pymc/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/miniconda3/envs/pymc/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmph2y66i87/m80e56c88364a8e9d8553f659577acc090143a8d54f9ef83c7c12ac7eb91aecfa.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmph2y66i87/mod.cpp -lpython3.8
g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
Hm, it seems it's still using the system compiler (`/usr/bin/g++`), whereas it should use the compilers from the environment. Are you sure you activated the environment correctly? Also, can you post the outputs of `mamba list` and `which g++`?
I am definitely sure that the environment was activated correctly. This Python version is only used for PyMC. Here is the module list and the output of `g++ -v`:
That's not the output of `which g++`.

`which g++` gives `/usr/bin/g++`
This is what it shows for me:

```
>> which clang
clang is /Users/twiecki/micromamba/envs/pymc5/bin/clang
clang is /usr/bin/clang
```
You can see I have a compiler installed in my env which you lack; not sure why. But you can try to install it manually.
I installed clang outside an environment; `which clang` shows `/usr/bin/clang`. Even if I install clang inside an env, `which clang` still shows `/usr/bin/clang`.

But I still got:
/home/thomas/.local/lib/python3.8/site-packages/pytensor/tensor/rewriting/elemwise.py:1019: UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit. warn( Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag...
CompileError: Compilation failed (return status=1): /usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/miniconda3/envs/pymc5/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/miniconda3/envs/pymc5/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmpv9hkx7wj/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmpv9hkx7wj/mod.cpp -lpython3.8 g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
So `/usr/bin/g++` is still called. Is there some additional configuration needed to switch to clang?
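As an aside, PyTensor lets you point it at a specific C++ compiler through its config. A sketch of `~/.pytensorrc` (the clang++ path here is an assumption for illustration; `PYTENSOR_FLAGS="cxx=..."` as an environment variable works too):

```
[global]
cxx = /usr/bin/clang++
```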
What I meant is that you need to install `g++` from mamba into your environment. `clang` is the compiler I'm using on OSX instead of `g++`. Something went wrong with your installation; you can also retry in a fresh env. Or try `mamba install -c conda-forge gcc`.
Well, I made a clean install:

```shell
mamba create -n pymc
mamba activate pymc
mamba install gcc
mamba install pymc
```

(which also downgraded gcc from 13.1.0 to 12.3.0 and four other packages)

`which gcc` gives `/home/thomas/mambaforge/envs/pymc/bin/gcc`
`which g++` gives `/home/thomas/mambaforge/envs/pymc/bin/g++`
followed by the installation of jupyter notebook and supporting libs. Watermark now gives: Last updated: Wed Aug 02 2023
Python implementation: CPython Python version : 3.11.4 IPython version : 8.14.0
arviz : 0.16.1 pandas : 2.0.3 scipy : 1.11.1 matplotlib: 3.7.2 numpy : 1.25.1 pymc : 5.7.0
Watermark: 2.4.3
Again running the compiler-bug notebook gives after
/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/tensor/rewriting/elemwise.py:1028: UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit. warn( Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag...
the well-known compiler bug, but now with gcc from the env
CompileError: Compilation failed (return status=1): /home/thomas/mambaforge/envs/pymc/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mno-xsavec -mxsaveopt -mno-xsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/numpy/core/include -I/home/thomas/mambaforge/envs/pymc/include/python3.11 -I/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/link/c/c_code -L/home/thomas/mambaforge/envs/pymc/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.31-x86_64-3.11.4-64/tmpyelpazdx/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so 
/home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.31-x86_64-3.11.4-64/tmpyelpazdx/mod.cpp -lpython3.11 g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
Since this used Python 3.11 and PyMC 5.7, I made a second attempt, downgrading to Python 3.8 and PyMC 5.6.1. The paths to gcc and g++ are the same as above, as is the error. So I think it is not an issue with my installations.
Did you run the compiler-bug.ipynb yourself? Could you reproduce the behaviour?
Since the warning `Loop fusion failed because the resulting node would exceed the kernel argument limit.` always appears, couldn't it be that the translation from the tensor network to the C code (at least as far as I understand it from the outside) produces some kind of "loop" for the compiler, so that the compiler runs out of space?
Did you try the conda-forge channel specifically? `mamba install -c conda-forge pymc` in a new environment.
> Since the warning `Loop fusion failed because the resulting node would exceed the kernel argument limit.` always appears, couldn't it be that the translation from the tensor network to the C code (at least as far as I understand it from the outside) produces some kind of "loop" for the compiler, so that the compiler runs out of space?
Can you try with a very simple model?

```python
import pymc as pm

with pm.Model() as m:
    x = pm.Normal("x")
    pm.sample()
```

It is not clear to me whether you see a problem with specific models or in general.
`mamba install -c conda-forge pymc` gives as output:

```
Looking for: ['pymc']
conda-forge/noarch 13.5MB @ 4.0MB/s 3.7s
conda-forge/linux-64 33.4MB @ 4.7MB/s 7.7s

Pinned packages:

Transaction

Prefix: /home/thomas/mambaforge/envs/pymc

All requested packages already installed
```
You should install into a fresh environment.
It is the specific model of the notebook. As I explained at the beginning, a colleague of mine who authored this model has no problem at all. All of my other models worked under PyMC 5 (after some adaptations) without problems. Even the simple model:

```python
import pymc as pm

with pm.Model() as m:
    x = pm.Normal("test")
    pm.sample()
```
Runs as expected:
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [test]
100.00% [4000/4000 00:02<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 2 seconds. We recommend running at least 4 chains for robust computation of convergence diagnostics.
So back to your case. After you install with conda-forge, can you try running a single chain? Just trying to narrow down the issue space
> So back to your case. After you install with conda-forge, can you try running a single chain? Just trying to narrow down the issue space
Well, I ran `mamba install -c conda-forge pymc` in a fresh env `test`, and sampled with `chains=1` as suggested:

```python
with model_toto:
    trace_ = pm.sample(draws=nb_samples, chains=1, tune=tune)
```

Still got the same behavior:
CompileError: Compilation failed (return status=1): /home/thomas/mambaforge/envs/test/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mno-xsavec -mxsaveopt -mno-xsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/mambaforge/envs/test/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/mambaforge/envs/test/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.10-x86_64-3.8.17-64/tmpi_iiqq0k/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so 
/home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.10-x86_64-3.8.17-64/tmpi_iiqq0k/mod.cpp -lpython3.8 g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
Did you run the supplied notebook? How did it behave in your environment?
I think you might not have enough resources (RAM), so g++ is getting killed. E.g. https://github.com/soedinglab/hh-suite/issues/280
I increased the limit for the main memory to 10 GB and still the same error occurred. Actually, I can't believe that a compilation of roughly 8 MB of C code (compare the attached generated code file) cannot be done within 10 GB. pytensor_compilation_error_1pmxatij.zip
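(For other readers: under WSL2 the RAM cap is set on the Windows host, typically in `%UserProfile%\.wslconfig`, roughly like the sketch below, followed by `wsl --shutdown` to apply it.)

```
[wsl2]
memory=10GB
```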
@maresb could this be an arch issue?
No, this should be pure linux-64. This feels to me like a memory issue. Maybe the 10 GB is not being made available somehow. I would check the output of `free`, and then look in /var/log/syslog for messages from the kernel's OOM killer.
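A minimal diagnostic sequence along these lines (a sketch; command availability varies by distro, and `dmesg` may need elevated permissions):

```shell
free -h                                  # RAM actually visible inside WSL
grep -i "out of memory" /var/log/syslog  # OOM-killer messages, if syslog exists
dmesg | grep -iE "oom|killed process"    # same info straight from the kernel log
```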
> I increased the limit for the main memory to 10 GB and still the same error occurred. Actually, I can't believe that a compilation of roughly 8 MB of C code (compare the attached generated code file) cannot be done within 10 GB
@ThomasHoppe Not disk space but RAM.
When I say main memory, I am not talking about disk space; I'm talking about 10 GB of RAM! The 10 GB are available. Take a look at the excerpt of the syslog.
I also enclose a video showing the last 6 of the 31 minutes of the call to pymc.sample, where you can see from `htop` and `pmap` that the memory usage of `cc1plus` increases within these 6 minutes from roughly 2 GB to more than 10 GB.
@ThomasHoppe I misunderstood. Then it's definitely not the RAM. I'm a bit stumped, because it's not a compiler error but the compiler getting killed.
State of the bug isolation:
-I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include
which is definitely an include path outside the used mamba environment. Could this be the reason?

@ThomasHoppe I didn't have time to look at your model before. I believe the source of the problem is that you have a very inefficient model. You are doing a series of operations per row of data, which builds a very large latent graph. You can probably vectorize your operations using advanced indexing, which will make the computational graph of the model much simpler and shorter to compile.
Here is how I would write your last model (probably has bugs!!!):
```python
#import sklearn.preprocessing
model_toto = pm.Model()
with model_toto:
    score = pm.Normal("score", tau=1., mu=0., shape=nb_clubs)
    advantage_defence_diff = pm.Normal("offence_defence_diff",
                                       tau=1., mu=1.5, shape=nb_clubs)
    # number of goals scored more at home than away
    home_advantage = pm.Normal("home_advantage", tau=10., mu=.0)
    # softmax regression weights for winner prediction:
    weights = pm.Normal("weights", mu=(0., .25, -0.25), tau=100., shape=(3))

    heim = np.array([hg[0] for hg in home_goals_])
    gast = np.array([hg[1] for hg in home_goals_])
    h_goals = np.array([hg[2] for hg in home_goals_])
    heim_ = np.array([ag[0] for ag in away_goals_])
    gast_ = np.array([ag[1] for ag in away_goals_])
    a_goals = np.array([ag[2] for ag in away_goals_])

    s_h_, add_h = score[heim], advantage_defence_diff[heim]
    s_g, add_g = score[gast], advantage_defence_diff[gast]
    s_h = s_h_ + home_advantage
    offence_heim = s_h + add_h
    defence_heim = s_h - add_h
    offence_gast = s_g + add_g
    defence_gast = s_g - add_g
    home_value = offence_heim - defence_gast
    away_value = offence_gast - defence_heim
    score_diff = s_h - s_g  # can be negative!

    ### no negative values
    home_value = pm.math.switch(pm.math.lt(home_value, 0.), low, home_value)
    away_value = pm.math.switch(pm.math.lt(away_value, 0.), low, away_value)

    # for prediction of the winner
    toto = np.where(
        h_goals == a_goals,
        0,
        np.where(h_goals > a_goals, 1, 2),
    )

    mu_home = pm.Deterministic("home_rate", home_value)
    pm.Poisson("home_goals", observed=home_goals, mu=mu_home)
    mu_away = pm.Deterministic("away_rate", away_value)
    pm.Poisson("away_goals", observed=away_goals, mu=mu_away)

    ha_diff = score_diff
    ha_diff = ha_diff.reshape((-1, 1))
    ha_diff = ha_diff.repeat(3, axis=1)
    pred = pm.math.exp(ha_diff * weights)
    pred = (pred.T / pm.math.sum(pred, axis=1)).T
    pm.Categorical('toto', p=pred, observed=toto)
```
Those index and numerical operations are vectorized just like numpy, and your model won't grow exponentially in complexity with your data size.
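To illustrate the point with plain numpy (toy data, hypothetical names): a per-row loop and a single advanced-indexing expression compute the same thing, but the vectorized form stays one expression no matter how many matches there are, while the loop adds graph nodes per row.

```python
import numpy as np

# Toy stand-ins for the latent club scores and per-match club indices
scores = np.array([0.5, -0.2, 1.1, 0.0])
home_idx = np.array([0, 2, 1])
away_idx = np.array([3, 1, 0])

# Loop version: one subtraction per match (graph grows with the data)
loop_diff = np.array([scores[h] - scores[a] for h, a in zip(home_idx, away_idx)])

# Vectorized version: a single advanced-indexing expression
vec_diff = scores[home_idx] - scores[away_idx]

assert np.allclose(loop_diff, vec_diff)
print(vec_diff)  # [ 0.5  1.3 -0.7]
```

The same pattern applies to PyTensor tensors inside a PyMC model: `score[heim] - score[gast]` builds a constant-size symbolic graph regardless of the number of rows.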
> @ThomasHoppe I didn't have time to look at your model before. I believe the source of the problem is that you have a very inefficient model. You are doing a series of operations per row of data, which builds a very large latent graph. You can probably vectorize your operations using advanced indexing, which will make the computational graph of the model much simpler and shorter to compile.
@ricardoV94: Thanks for the suggestion. Actually, the model was designed by a colleague, who has no problems running it; he does not encounter the compiler problem. I also felt that the iterative solution wasn't ideal, but I didn't have the time to dive deeper into it without a running reference solution. Your explanation seems quite plausible to me, and we will give it a try...
Let us know if it works. If not, the right place to continue this discussion would be on discourse: https://discourse.pymc.io/
Regarding your colleague, even if he could manage to compile, I am certain the model will be considerably slower the way he wrote it down. I'll close this issue in the meantime, as it's not clear it would be worth the trouble to try and make the compiler more robust to very large graphs.
Describe the issue:
During compilation of models, the compiler receives a kill signal (reason unknown). Can be reproduced with two different models.
Reproducible code example:
Error message:
PyMC version information:
Occurred in 5.5.0 and 5.6.1
Detailed watermark:
Last updated: Tue Jul 18 2023
Python implementation: CPython Python version : 3.8.10 IPython version : 8.0.1
arviz : 0.15.1 pandas : 2.0.2 daft : 0.1.2 pymc : 5.6.1 matplotlib: 3.7.1 numpy : 1.22.1 scipy : 1.7.3 pytensor: 2.12.3
Watermark: 2.3.0
Operating System: Ubuntu 20.04.6 LTS Subsystem under Windows 10 WSL-2 PyMC installation via pip
Context for the issue:
Stops further evaluation of the model with `sample_posterior_predictive`
D1.csv compiler-bug.zip