Closed: ThomasHoppe closed this issue 1 year ago.
Installation with pip is not supported (because the compiler situation is too difficult); you need to use mamba or conda.
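For reference, the conda-forge based install recommended in the PyMC docs looks roughly like this (the environment name `pymc_env` is arbitrary):

```shell
# Create a fresh environment with pymc and its pinned compiler toolchain
conda create -c conda-forge -n pymc_env "pymc>=5"
conda activate pymc_env
```

Installing into a fresh environment (rather than an existing one) avoids mixing packages from `~/.local/lib` with the conda-forge toolchain.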
@twiecki:
I reinstalled now pymc under conda, but the problem remains :-(
Operating System: Ubuntu 20.04.6 LTS Subsystem under Windows 10 WSL-2 PyMC installation via conda (miniconda)
Last updated: Tue Jul 25 2023
Python implementation: CPython Python version : 3.8.17 IPython version : 8.0.1
arviz : 0.15.1 numpy : 1.22.1 matplotlib: 3.7.1 scipy : 1.7.3 pandas : 2.0.2 pymc : 5.6.1
Watermark: 2.3.0
CompileError: Compilation failed (return status=1): /usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/miniconda3/envs/pymc/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/miniconda3/envs/pymc/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmph2y66i87/m80e56c88364a8e9d8553f659577acc090143a8d54f9ef83c7c12ac7eb91aecfa.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmph2y66i87/mod.cpp -lpython3.8
g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
Hm, it seems it's still using the system compiler (`/usr/bin/g++`), whereas it should use the compilers from the environment. Are you sure you activated the environment correctly? Also, can you post the outputs of `mamba list` and `which g++`?
I am definitely sure that the environment was activated correctly. This Python version is only used for PyMC. Here is the module list and the output of `g++ -v`:
That's not the output of `which g++`.

`which g++` gives `/usr/bin/g++`
This is what it shows for me:

```
>> which clang
clang is /Users/twiecki/micromamba/envs/pymc5/bin/clang
clang is /usr/bin/clang
```
You can see I have a compiler installed in my env which you lack; not sure why. But you can try to install it manually.
I installed clang outside an environment; `which clang` shows `/usr/bin/clang`. Even if I install clang inside an env, `which clang` still shows `/usr/bin/clang`.

But I still got:
/home/thomas/.local/lib/python3.8/site-packages/pytensor/tensor/rewriting/elemwise.py:1019: UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit. warn( Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag...
CompileError: Compilation failed (return status=1): /usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/miniconda3/envs/pymc5/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/miniconda3/envs/pymc5/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmpv9hkx7wj/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.17-x86_64-3.8.17-64/tmpv9hkx7wj/mod.cpp -lpython3.8 g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
So `/usr/bin/g++` is still called. Is there some additional configuration needed to switch to clang?
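As an aside, PyTensor lets you point it at a specific C++ compiler through its config. A sketch of `~/.pytensorrc` (the clang++ path here is an assumption for illustration; `PYTENSOR_FLAGS="cxx=..."` as an environment variable works too):

```
[global]
cxx = /usr/bin/clang++
```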
What I meant is that you need to install `g++` from mamba into your environment. `clang` is the compiler I'm using on OSX instead of `g++`. Something went wrong with your installation; you can also retry in a fresh env. Or try `mamba install -c conda-forge gcc`.
Well, I made a clean install:

```shell
mamba create -n pymc
mamba activate pymc
mamba install gcc
mamba install pymc
```

(which also downgraded gcc from 13.1.0 to 12.3.0 and four other packages)

`which gcc` gives `/home/thomas/mambaforge/envs/pymc/bin/gcc`
`which g++` gives `/home/thomas/mambaforge/envs/pymc/bin/g++`
followed by the installation of jupyter notebook and supporting libs. Watermark now gives: Last updated: Wed Aug 02 2023
Python implementation: CPython Python version : 3.11.4 IPython version : 8.14.0
arviz : 0.16.1 pandas : 2.0.3 scipy : 1.11.1 matplotlib: 3.7.2 numpy : 1.25.1 pymc : 5.7.0
Watermark: 2.4.3
Again running the compiler-bug notebook gives after
/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/tensor/rewriting/elemwise.py:1028: UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit. warn( Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag...
the well-known compiler bug, but now with gcc from the env
CompileError: Compilation failed (return status=1): /home/thomas/mambaforge/envs/pymc/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mno-xsavec -mxsaveopt -mno-xsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/numpy/core/include -I/home/thomas/mambaforge/envs/pymc/include/python3.11 -I/home/thomas/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/link/c/c_code -L/home/thomas/mambaforge/envs/pymc/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.31-x86_64-3.11.4-64/tmpyelpazdx/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so 
/home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.31-x86_64-3.11.4-64/tmpyelpazdx/mod.cpp -lpython3.11 g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
Since this used Python 3.11 and PyMC 5.7, I made a second attempt, downgrading to Python 3.8 and PyMC 5.6.1. The paths to gcc and g++ are the same as above, as is the error. So I think it is not an issue with my installations.
Did you run the compiler-bug.ipynb yourself? Could you reproduce the behaviour?
Since the warning `Loop fusion failed because the resulting node would exceed the kernel argument limit.` always appears, couldn't it be that the translation from the tensor network to the C code (at least as far as I understand it from the outside) produces some kind of "loop" for the compiler, so that the compiler runs out of space?
Did you try the conda-forge channel specifically? `mamba install -c conda-forge pymc` in a new environment.
> Since the warning `Loop fusion failed because the resulting node would exceed the kernel argument limit.` always appears, couldn't it be that the translation from the tensor network to the C code (at least as far as I understand it from the outside) produces some kind of "loop" for the compiler, so that the compiler runs out of space?
Can you try with a very simple model?

```python
import pymc as pm

with pm.Model() as m:
    x = pm.Normal("x")
    pm.sample()
```

It is not clear to me whether you see a problem with specific models or in general.
`mamba install -c conda-forge pymc` gives as output:

```
Looking for: ['pymc']
conda-forge/noarch 13.5MB @ 4.0MB/s 3.7s
conda-forge/linux-64 33.4MB @ 4.7MB/s 7.7s

Pinned packages:

Transaction

Prefix: /home/thomas/mambaforge/envs/pymc

All requested packages already installed
```
You should install into a fresh environment.
It is the specific model of the notebook. As I explained at the beginning, a colleague of mine who authored this model has no problem at all. All of my other models worked under PyMC 5 (after some adaptations) without problems. Even the simple model:

```python
import pymc as pm

with pm.Model() as m:
    x = pm.Normal("test")
    pm.sample()
```
Runs as expected:
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [test]
100.00% [4000/4000 00:02<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 2 seconds. We recommend running at least 4 chains for robust computation of convergence diagnostics.
So back to your case. After you install with conda-forge, can you try running a single chain? Just trying to narrow down the issue space
> So back to your case. After you install with conda-forge, can you try running a single chain? Just trying to narrow down the issue space
Well, I ran `mamba install -c conda-forge pymc` in a fresh env `test`, and sampled with `chains=1` as suggested:

```python
with model_toto:
    trace_ = pm.sample(draws=nb_samples, chains=1, tune=tune)
```

Still got the same behavior:
CompileError: Compilation failed (return status=1): /home/thomas/mambaforge/envs/test/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -march=broadwell -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mhle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mrtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mno-xsavec -mxsaveopt -mno-xsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include -I/home/thomas/mambaforge/envs/test/include/python3.8 -I/home/thomas/.local/lib/python3.8/site-packages/pytensor/link/c/c_code -L/home/thomas/mambaforge/envs/test/lib -fvisibility=hidden -o /home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.10-x86_64-3.8.17-64/tmpi_iiqq0k/m68ff8b8c4d606c4dd1f8fe6d6ebc5e974ecbc23ad2e3ca82f4d826e6f743dc44.so 
/home/thomas/.pytensor/compiledir_Linux-5.15-microsoft-standard-WSL2-x86_64-with-glibc2.10-x86_64-3.8.17-64/tmpi_iiqq0k/mod.cpp -lpython3.8 g++: fatal error: Killed signal terminated program cc1plus compilation terminated.
Did you run the supplied notebook? How did it behave in your environment?
I think you might not have enough resources (RAM), so g++ is getting killed. E.g. https://github.com/soedinglab/hh-suite/issues/280
I increased the limit for the main memory to 10 GB and still the same error occurred. Actually, I can't believe that a compilation of roughly 8 MB of C code (compare the attached generated code file) cannot be done within 10 GB. pytensor_compilation_error_1pmxatij.zip
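(For other readers: under WSL2 the RAM cap is set on the Windows host, typically in `%UserProfile%\.wslconfig`, roughly like the sketch below, followed by `wsl --shutdown` to apply it.)

```
[wsl2]
memory=10GB
```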
@maresb could this be an arch issue?
No, this should be pure linux-64. This feels to me like a memory issue. Maybe the 10 GB is not being made available somehow. I would check the output of `free`, and then look in /var/log/syslog for messages from the kernel's OOM killer.
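A minimal diagnostic sequence along these lines (a sketch; command availability varies by distro, and `dmesg` may need elevated permissions):

```shell
free -h                                  # RAM actually visible inside WSL
grep -i "out of memory" /var/log/syslog  # OOM-killer messages, if syslog exists
dmesg | grep -iE "oom|killed process"    # same info straight from the kernel log
```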
> I increased the limit for the main memory to 10 GB and still the same error occurred. Actually, I can't believe that a compilation of roughly 8 MB of C code (compare the attached generated code file) cannot be done within 10 GB
@ThomasHoppe Not disk space but RAM.
When I say main memory, I am not talking about disk space; I'm talking about 10 GB of RAM! The 10 GB are available. Take a look at the excerpt of the syslog.
I also enclose a video showing the last 6 of the 31 minutes of the call to pymc.sample, where you can see from `htop` and `pmap` that the memory usage of `cc1plus` increases within these 6 minutes from roughly 2 GB to more than 10 GB.
@ThomasHoppe I misunderstood. Then it's definitely not the RAM. I'm a bit stumped, because it's not a compiler error but the compiler getting killed.
State of the bug isolation:
-I/home/thomas/.local/lib/python3.8/site-packages/numpy/core/include
which is definitely an include path outside the used mamba environment. Could this be the reason?

@ThomasHoppe I didn't have time to look at your model before. I believe the source of the problem is that you have a very inefficient model. You are doing a series of operations per row of data, which builds a very large latent graph. You can probably vectorize your operations using advanced indexing, which will make the computational graph of the model much simpler and shorter to compile.
Here is how I would write your last model (probably has bugs!!!):
```python
#import sklearn.preprocessing
model_toto = pm.Model()
with model_toto:
    score = pm.Normal("score", tau=1., mu=0., shape=nb_clubs)
    advantage_defence_diff = pm.Normal("offence_defence_diff",
                                       tau=1., mu=1.5, shape=nb_clubs)
    # number of goals scored more at home than away
    home_advantage = pm.Normal("home_advantage", tau=10., mu=.0)
    # softmax regression weights for winner prediction:
    weights = pm.Normal("weights", mu=(0., .25, -0.25), tau=100., shape=(3))

    heim = np.array([hg[0] for hg in home_goals_])
    gast = np.array([hg[1] for hg in home_goals_])
    h_goals = np.array([hg[2] for hg in home_goals_])
    heim_ = np.array([ag[0] for ag in away_goals_])
    gast_ = np.array([ag[1] for ag in away_goals_])
    a_goals = np.array([ag[2] for ag in away_goals_])

    s_h_, add_h = score[heim], advantage_defence_diff[heim]
    s_g, add_g = score[gast], advantage_defence_diff[gast]
    s_h = s_h_ + home_advantage
    offence_heim = s_h + add_h
    defence_heim = s_h - add_h
    offence_gast = s_g + add_g
    defence_gast = s_g - add_g
    home_value = offence_heim - defence_gast
    away_value = offence_gast - defence_heim
    score_diff = s_h - s_g  # can be negative!

    ### no negative values
    home_value = pm.math.switch(pm.math.lt(home_value, 0.), low, home_value)
    away_value = pm.math.switch(pm.math.lt(away_value, 0.), low, away_value)

    # for prediction of the winner
    toto = np.where(
        h_goals == a_goals,
        0,
        np.where(h_goals > a_goals, 1, 2),
    )

    mu_home = pm.Deterministic("home_rate", home_value)
    pm.Poisson("home_goals", observed=home_goals, mu=mu_home)
    mu_away = pm.Deterministic("away_rate", away_value)
    pm.Poisson("away_goals", observed=away_goals, mu=mu_away)

    ha_diff = score_diff
    ha_diff = ha_diff.reshape((-1, 1))
    ha_diff = ha_diff.repeat(3, axis=1)
    pred = pm.math.exp(ha_diff * weights)
    pred = (pred.T / pm.math.sum(pred, axis=1)).T
    pm.Categorical('toto', p=pred, observed=toto)
```
Those index and numerical operations are vectorized just like numpy, and your model won't grow exponentially in complexity with your data size.
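To illustrate the point with plain numpy (toy data, hypothetical names): a per-row loop and a single advanced-indexing expression compute the same thing, but the vectorized form stays one expression no matter how many matches there are, while the loop adds graph nodes per row.

```python
import numpy as np

# Toy stand-ins for the latent club scores and per-match club indices
scores = np.array([0.5, -0.2, 1.1, 0.0])
home_idx = np.array([0, 2, 1])
away_idx = np.array([3, 1, 0])

# Loop version: one subtraction per match (graph grows with the data)
loop_diff = np.array([scores[h] - scores[a] for h, a in zip(home_idx, away_idx)])

# Vectorized version: a single advanced-indexing expression
vec_diff = scores[home_idx] - scores[away_idx]

assert np.allclose(loop_diff, vec_diff)
print(vec_diff)  # [ 0.5  1.3 -0.7]
```

The same pattern applies to PyTensor tensors inside a PyMC model: `score[heim] - score[gast]` builds a constant-size symbolic graph regardless of the number of rows.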
> @ThomasHoppe I didn't have time to look at your model before. I believe the source of the problem is that you have a very inefficient model. You are doing a series of operations per row of data, which builds a very large latent graph. You can probably vectorize your operations using advanced indexing, which will make the computational graph of the model much simpler and shorter to compile.
@ricardoV94: Thanks for the suggestion. Actually, the model was designed by a colleague, who has no problems running it; he does not encounter the compiler problem. I also felt that the iterative solution wasn't ideal, but I didn't have the time to dive deeper into it without a running reference solution. Your explanation seems quite plausible to me, and we will give it a try...
Let us know if it works. If not, the right place to continue this discussion would be on discourse: https://discourse.pymc.io/
Regarding your colleague, even if he could manage to compile, I am certain the model will be considerably slower the way he wrote it down. I'll close this issue in the meantime, as it's not clear it would be worth the trouble to try and make the compiler more robust to very large graphs.
Describe the issue:
During compilation of models, the compiler receives a kill signal (reason unknown). Can be reproduced with two different models.
Reproducible code example:
Error message:
PyMC version information:
Occurred in 5.5.0 and 5.6.1
Detailed watermark:
Last updated: Tue Jul 18 2023
Python implementation: CPython Python version : 3.8.10 IPython version : 8.0.1
arviz : 0.15.1 pandas : 2.0.2 daft : 0.1.2 pymc : 5.6.1 matplotlib: 3.7.1 numpy : 1.22.1 scipy : 1.7.3 pytensor: 2.12.3
Watermark: 2.3.0
Operating System: Ubuntu 20.04.6 LTS Subsystem under Windows 10 WSL-2 PyMC installation via pip
Context for the issue:
Stops further evaluation of the model with `sample_posterior_predictive`
D1.csv compiler-bug.zip