wlav / cppyy


Jupyter and command line cppyy import statements failing in opposite environments #161

Open sophiehourihane opened 1 year ago

sophiehourihane commented 1 year ago

I have found that, in any given environment on a Linux computer, the cppyy bindings that I have written work either from the command line or from an IPython kernel, but not from both.

When I pip-install cppyy, it works perfectly from the command line, but when I attempt identical import statements inside of a jupyter notebook I get:

>>> cppyy.include(os.path.join(header_file_directory, header_file))
Failed to load header file "/home/sophie.hourihane/src/bayeswave-cpp/src/runs/run_builder.hpp"
In file included from input_line_71:1:
In file included from /home/sophie.hourihane/src/bayeswave-cpp/src/runs/run_builder.hpp:27:
In file included from /home/sophie.hourihane/.conda/envs/pip_cppyy_env/include/lal/LALInference.h:86:
In file included from /usr/include/sys/time.h:25:
/usr/include/bits/types/struct_timeval.h:8:8: error: redefinition of 'timeval'
struct timeval
       ^
/home/sophie.hourihane/.conda/envs/pip_cppyy_env/bin/../x86_64-conda-linux-gnu/sysroot/usr/include/bits/time.h:75:8: note: previous definition is here
struct timeval
       ^

as well as errors like

/home/sophie.hourihane/.conda/envs/pip_cppyy_env/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_iterator.h:1646:5: note: candidate template ignored: could not match 'move_iterator<type-parameter-0-0>' against 'long'
    operator+(typename move_iterator<_Iterator>::difference_type __n,
    ^
/home/sophie.hourihane/.conda/envs/pip_cppyy_env/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/11.3.0/bits/basic_string.h:6095:5: note: candidate template ignored: could not match 'basic_string' against '__normal_iterator'
    operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
    ^
/home/sophie.hourihane/.conda/envs/pip_cppyy_env/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/11.3.0/bits/basic_string.h:6132:5: note: candidate template ignored: could not match 'basic_string' against '__normal_iterator'
    operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
    ^

On the other hand, when I use cppyy as installed through conda-forge, (version 2.4.2), I am able to import the headers in Jupyter almost without errors, (although when importing cppyy in Jupyter I do get):

ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
Invoking:
  LC_ALL=C x86_64-conda-linux-gnu-c++  -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}'
Results was:
With exit code 0

Importing cppyy in the command line in this environment does not result in an error.

On my local machine (OSX), I can run cppyy fine in both a Jupyter env and the command line (with pip-installed cppyy).

Let me know if there is any more information I can provide.

wlav commented 1 year ago

Cling uses this command LC_ALL=C x86_64-conda-linux-gnu-c++ -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}' to find the standard headers. I'm surprised it fails under conda-installed cppyy, as that's supposed to bring the conda compilers as well. Can you see whether x86_64-conda-linux-gnu-c++ exists in the conda environment? And/or whether which g++ points to the conda C++ compiler?

As for the pip case, you can see that there is a mix of system C++ headers (found by Cling) and conda C++ headers (found by, my guess, pre-built dictionaries). I suspect that when the pip install of cppyy happened, the C++ run-time (libraries and headers) was available in conda, but not the actual C++ compiler, and that the versions of conda and system C++ differ in incompatible ways. Installing the conda C++ compiler before pip installing cppyy should fix that.

sophiehourihane commented 1 year ago

Cling uses this command LC_ALL=C x86_64-conda-linux-gnu-c++ -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}' to find the standard headers. I'm surprised it fails under conda-installed cppyy, as that's supposed to bring the conda compilers as well. Can you see whether x86_64-conda-linux-gnu-c++ exists in the conda environment? And/or whether which g++ points to the conda C++ compiler?

The C++ compiler is in this conda environment as x86_64-conda-linux-gnu-c++, and g++ is also installed in the environment as x86_64-conda-linux-gnu-g++. I made a fresh conda env from the same yaml file, and now I cannot import cppyy in the conda env (with the same errors as I was getting in the pip environment). This version does successfully run in the command line.

ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
Invoking:
  LC_ALL=C x86_64-conda-linux-gnu-c++  -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}'
Results was:
With exit code 0

Header file causing problem is /home/sophie.hourihane/src/bayeswave-cpp/src/runs/run_builder.hpp
Failed to load header file "/home/sophie.hourihane/src/bayeswave-cpp/src/runs/run_builder.hpp"
In file included from input_line_71:1:
In file included from /home/sophie.hourihane/src/bayeswave-cpp/src/runs/run_builder.hpp:27:
In file included from /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/include/lal/LALInference.h:86:
In file included from /usr/include/sys/time.h:25:
/usr/include/bits/types/struct_timeval.h:8:8: error: redefinition of 'timeval'
struct timeval
       ^
/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../x86_64-conda-linux-gnu/sysroot/usr/include/bits/time.h:75:8: note: previous definition is here
struct timeval
       ^

Is there perhaps a path or environment variable that is getting added in the jupyter notebook that could be causing this issue?

As for the pip case, you can see that there is a mix of system C++ headers (found by Cling) and conda C++ headers (found by, my guess, pre-built dictionaries). I suspect that when the pip install of cppyy happened, the C++ run-time (libraries and headers) was available in conda, but not the actual C++ compiler, and that the versions of conda and system C++ differ in incompatible ways. Installing the conda C++ compiler before pip installing cppyy should fix that.

Because I installed cppyy into the environment after I installed the c++, I would expect that I have done that. Is there an order I should be installing cppyy-cling? (e.g. before or after cppyy?)

sophiehourihane commented 1 year ago

When I run:

import cppyy
print(cppyy.gbl.gInterpreter.GetIncludePath())

In Jupyter (conda installed cppyy fresh) I get

ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
Invoking:
  LC_ALL=C x86_64-conda-linux-gnu-c++  -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}'
Results was:
With exit code 0
-I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/cppyy_backend/etc/" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/cppyy_backend/etc//cling" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/cppyy_backend/include/" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/include" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/include/python3.10" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/../../../include/python3.10"

Whereas in the command line python (where I can import perfectly) I get:

-isysroot "/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/x86_64-conda-linux-gnu/sysroot" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/cppyy_backend/etc/" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/cppyy_backend/etc//cling" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/cppyy_backend/include/" -I"/opt/intel/oneapi/vpl/2022.2.0/include" -I"/opt/intel/oneapi/tbb/2021.7.0/env/../include" -I"/opt/intel/oneapi/mkl/2022.2.0/include" -I"/opt/intel/oneapi/ipp/2021.6.1/include" -I"/opt/intel/oneapi/ippcp/2021.6.1/include" -I"/opt/intel/oneapi/ipp/2021.6.1/include" -I"/opt/intel/oneapi/dpl/2021.7.1/linux/include" -I"/opt/intel/oneapi/dpcpp-ct/2022.2.0/include" -I"/opt/intel/oneapi/dnnl/2022.2.0/cpu_dpcpp_gpu_dpcpp/include" -I"/opt/intel/oneapi/dev-utilities/2021.7.0/include" -I"/opt/intel/oneapi/dal/2021.7.0/include" -I"/opt/intel/oneapi/ccl/2021.7.0/include/cpu_gpu_dpcpp" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/include" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/include/python3.10" -I"/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/../../../include/python3.10"
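
The two GetIncludePath() outputs above are long enough that the difference is hard to spot by eye. A minimal sketch for diffing them follows; the helper names include_flags and only_in_first are made up for illustration, and the two sample strings are shortened, hypothetical stand-ins for the real outputs.

```python
import shlex


def include_flags(flag_string):
    """Split a GetIncludePath()-style string into a list of flags."""
    tokens = shlex.split(flag_string)
    flags = []
    i = 0
    while i < len(tokens):
        # "-isysroot <dir>" arrives as two tokens; join them into one entry.
        if tokens[i] == "-isysroot" and i + 1 < len(tokens):
            flags.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            flags.append(tokens[i])
            i += 1
    return flags


def only_in_first(first, second):
    """Flags present in `first` but missing from `second`, in order, no repeats."""
    seen = set(include_flags(second))
    missing = []
    for flag in include_flags(first):
        if flag not in seen:
            missing.append(flag)
            seen.add(flag)
    return missing


# Shortened, hypothetical stand-ins for the terminal and Jupyter outputs.
terminal = '-isysroot "/env/x86_64-conda-linux-gnu/sysroot" -I"/env/include" -I"/opt/intel/include"'
jupyter = '-I"/env/include"'

for flag in only_in_first(terminal, jupyter):
    print("missing in Jupyter:", flag)
```

Applied to the real strings, this would list the -isysroot entry and the /opt/intel/oneapi include directories, which appear only in the command-line output.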

wlav commented 1 year ago

What does executing:

LC_ALL=C x86_64-conda-linux-gnu-c++  -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}'

on the command line in the conda environment show?

And then in jupyter the same using:

import os
print(os.system("""LC_ALL=C x86_64-conda-linux-gnu-c++  -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}'"""))

If the second doesn't print anything, I suspect that x86_64-conda-linux-gnu-c++ isn't accessible anymore. In that case, is jupyter itself installed through conda or pip, or maybe even on the system?
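
Since os.system only returns the exit status, a variant that captures the output in Python may be easier to read inside a notebook. The sketch below is an assumption-laden rewrite of the same check, not what Cling itself runs: it resolves the compiler with shutil.which, runs it with the same flags, and filters the output in Python the way the sed expression does. The function names are made up for illustration.

```python
import os
import shutil
import subprocess


def parse_include_dirs(verbose_output):
    """Mimic the sed filter: starting at the first line that mentions
    "include", keep lines that begin with " /" and contain "include"."""
    dirs = []
    started = False
    for line in verbose_output.splitlines():
        if not started and "include" in line:
            started = True
        if started and line.startswith(" /") and "include" in line:
            dirs.append(line.strip())
    return dirs


def probe_compiler(compiler):
    """Return (resolved_path, include_dirs); resolved_path is None when the
    compiler cannot be found on the current PATH."""
    resolved = shutil.which(compiler)
    if resolved is None:
        return None, []
    result = subprocess.run(
        [resolved, "-O3", "-DNDEBUG", "-xc++", "-E", "-v", "/dev/null"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1 in the shell command
        text=True,
        env={**os.environ, "LC_ALL": "C"},
    )
    return resolved, parse_include_dirs(result.stdout)


path, include_dirs = probe_compiler("x86_64-conda-linux-gnu-c++")
print("compiler:", path)
for directory in include_dirs:
    print(" ", directory)
```

If the first line prints None, the kernel's PATH does not contain the conda compiler, which would be consistent with the empty "Results was:" in the Cling error.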

sophiehourihane commented 1 year ago

Command line:

 /opt/intel/oneapi/vpl/2022.2.0/include
 /opt/intel/oneapi/tbb/2021.7.0/env/../include
 /opt/intel/oneapi/mkl/2022.2.0/include
 /opt/intel/oneapi/ipp/2021.6.1/include
 /opt/intel/oneapi/ippcp/2021.6.1/include
 /opt/intel/oneapi/dpl/2021.7.1/linux/include
 /opt/intel/oneapi/dpcpp-ct/2022.2.0/include
 /opt/intel/oneapi/dnnl/2022.2.0/cpu_dpcpp_gpu_dpcpp/include
 /opt/intel/oneapi/dev-utilities/2021.7.0/include
 /opt/intel/oneapi/dal/2021.7.0/include
 /opt/intel/oneapi/ccl/2021.7.0/include/cpu_gpu_dpcpp
 /opt/intel/oneapi/clck/2021.7.0/include
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../lib/gcc/x86_64-conda-linux-gnu/11.3.0/include
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../lib/gcc/x86_64-conda-linux-gnu/11.3.0/include-fixed
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../lib/gcc/x86_64-conda-linux-gnu/11.3.0/../../../../x86_64-conda-linux-gnu/include
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/11.3.0
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/11.3.0/x86_64-conda-linux-gnu
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/11.3.0/backward
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin/../x86_64-conda-linux-gnu/sysroot/usr/include

In jupyter

import os
print(os.system("""LC_ALL=C x86_64-conda-linux-gnu-c++  -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}'"""))

returns:

0

Further, when I do os.system("which jupyter") it returns /opt/conda3/bin/jupyter

whereas in the command line which jupyter returns ~/.conda/envs/bayeswave-cpp-fresh/bin/jupyter

I suppose it makes sense because I am using a jupyter server hosted by my collaboration. I would assume that just by adding my environment as a kernel it would know how to pick up these paths but it looks like it does not. Do you have any recommendations for exposing these paths to cppyy? Preferably in a nice way but a hacky way would also be OK.

wlav commented 1 year ago

Sorry for being dense, but picking up on this "I am using a jupyter server hosted by my collaboration." to make sure we're on the same page: the jupyter server is running on a different computer than the conda environment? If yes, is the file system that contains the conda environment shared? (I'm not familiar with the mechanics of jupyter servers and whether/how the local environment is being copied over. But for that to work, I'd figure that at minimum the file system needs to be shared as that's where the conda compiler lives.)

Can you also try this in jupyter:

import cppyy_backend
print(cppyy_backend)

The point being that that will show whether jupyter sees your local conda cppyy, or some other install.

sophiehourihane commented 1 year ago

I am honestly iffy about the way the server works. I know that it is hosted on a different computer with a shared file system to the (also remote) computer that I am using. Doing

import cppyy_backend
print(cppyy_backend)

returns

<module 'cppyy_backend' from '/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/lib/python3.10/site-packages/cppyy_backend/__init__.py'>

I also found that I can use the jupyter magic functions like so

CONDA_PATH = "/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin"
JUPYTER_PATH = %env PATH
%env PATH = $CONDA_PATH:$JUPYTER_PATH

And then my imports work like normal (hooray!). I am not sure why the kernel is not automatically added to the path, but this seems to work ok.
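
The same workaround can be written in plain Python, which also works outside IPython where the %env magic is not available. This is a minimal sketch: prepend_to_path is a made-up helper, and deriving the bin directory from sys.executable assumes the kernel's Python lives in the conda environment's bin directory.

```python
import os
import sys


def prepend_to_path(directory, env=os.environ):
    """Put `directory` first on PATH in `env`, dropping any later duplicate."""
    parts = [
        p for p in env.get("PATH", "").split(os.pathsep)
        if p and p != directory
    ]
    env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


# Assumption: the kernel's interpreter sits in <env>/bin next to the compiler.
env_bin = os.path.dirname(sys.executable)
prepend_to_path(env_bin)

# Import cppyy only after PATH has been adjusted, e.g.:
# import cppyy
```

The order matters here: the PATH change has to happen before the first import of cppyy in the kernel, since the compiler lookup shown in the error message runs when Cling starts up.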

sophiehourihane commented 9 months ago

After having a lot of success running cppyy in jupyter I am stumped about why this code is causing a crash (it works perfectly in the command line)

CONDA_PATH = "/home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin"
JUPYTER_PATH = %env PATH
%env PATH = $CONDA_PATH:$JUPYTER_PATH

import cppyy.gbl as Cpp
import cppyy
import cppyy.ll
cppyy.ll.set_signals_as_exception(True)

# Import the C++ namespaces
cppyy.include('random')

# Create a std::mt19937 object
print('creating rng', flush=True)
number = 42
rng = cppyy.gbl.std.mt19937(number)  # You can seed it with a specific value (e.g., 42)
print('done creating rng', flush=True)

# Generate a random number
random_number = rng()

This code gets stuck at generating a rng. The same code works perfectly fine in the command line.

Killing the cell returns an error like this

*** Break *** illegal instruction

Thread 12 (Thread 0x7f95c9ffb700 (LWP 1224023)):
#0  0x00007f95df0ae848 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000056124e9b18cd in PyCOND_TIMEDWAIT (us=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, mut=<optimized out>, cond=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /home/conda/feedstock_root/build_artifacts/python-split_1687559129017/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/tupleobject.c:73
#2  take_gil (tstate=0x5612507fdc40) at /usr/local/src/conda/python-3.10.12/Python/string3.h:256
#3  0x000056124e9f5332 in PyEval_RestoreThread (tstate=0x5612507fdc40) at /usr/local/src/conda/python-3.10.12/Modules/ceval_gil.h:547
#4  0x000056124eaddeb9 in pysleep (secs=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.10.12/Include/cpython/find.h:2077
#5  time_sleep (self=<optimized out>, obj=<optimized out>) at /usr/local/src/conda/python-3.10.12/Include/cpython/find.h:370
#6  0x000056124e9d9dfa in cfunction_vectorcall_O (func=0x7f95de23fd80, args=0x7f95d8415668, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.10.12/Objects/pycore_bitutils.h:516
#7  0x000056124e9cf142 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f95d8415668, callable=0x7f95de23fd80, tstate=0x5612507fdc40) at /usr/local/src/conda/python-3.10.12/Objects/pycore_pyerrors.h:123
#8  PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f95d8415668, callable=0x7f95de23fd80) at /usr/local/src/conda/python-3.10.12/Objects/pycore_pyerrors.h:123

Some additional information: Running

import os
print(os.system("""LC_ALL=C x86_64-conda-linux-gnu-c++  -O3 -DNDEBUG -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.*include/,${' -e '/^ \/.*include/p' -e '}'"""))

Returns

/home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/include
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/include-fixed
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/include
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/12.3.0
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/12.3.0/x86_64-conda-linux-gnu
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin/../lib/gcc/../../x86_64-conda-linux-gnu/include/c++/12.3.0/backward
 /home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin/../x86_64-conda-linux-gnu/sysroot/usr/include
0

and

import os

# Get the environment variables
env_variables = os.environ

# Print each environment variable
for key, value in env_variables.items():
    print(f'{key}: {value}')

returns

CONDA_SHLVL: 1
JUPYTERHUB_CLIENT_ID: jupyterhub-user-sophie.hourihane
CONDA_EXE: /opt/conda3-2019-09-10/bin/conda
GLOBUS_TCP_SOURCE_RANGE: 40501,41000
JULIA_CONDAPKG_EXE_BACKUP: 
LANG: en_US.UTF-8
JULIA_DEPOT_PATH: /opt/conda3-2019-09-10/share/julia:
LIGO_DATAFIND_SERVER: datafind.ldas.cit:80
CONDA_PREFIX: /opt/conda3-2019-09-10
JAVA_HOME: /usr/lib/jvm/java/jre
CONDA_JL_CONDA_EXE_BACKUP: 
JULIA_CONDAPKG_EXE: /opt/conda3-2019-09-10/bin/conda
JULIA_PROJECT_BACKUP: 
_CE_M: 
GLOBUS_TCP_PORT_RANGE: 40000,40500
JULIA_LOAD_PATH: @:@conda3-2019-09-10:@stdlib
JUPYTERHUB_ACTIVITY_URL: http://127.0.0.1:8081/hub/api/users/sophie.hourihane/activity
JUPYTERHUB_SERVICE_URL: http://127.0.0.1:12391/user/sophie.hourihane/
USER: sophie.hourihane
JUPYTERHUB_BASE_URL: /
JULIA_PROJECT: @conda3-2019-09-10
CONDA_JL_HOME_BACKUP: 
JULIA_CONDAPKG_BACKEND_BACKUP: 
ONLINEHOFT: /online/frames/hoft
SUBPROCESS_BAGGAGE: sentry-trace_id=b230c8b96db14d33999070f8cae52286,sentry-environment=production,sentry-public_key=b17189abfd364491898bfcb37fd3e94d,sentry-transaction=generic%20Tornado%20request,sentry-sample_rate=0.01
PWD: /home/sophie.hourihane
JULIA_CONDAPKG_BACKEND: System
HOME: /home/sophie.hourihane
CONDA_PYTHON_EXE: /opt/conda3-2019-09-10/bin/python
JULIA_SSL_CA_ROOTS_PATH_BACKUP: 
JUPYTERHUB_USER: sophie.hourihane
_CE_CONDA: 
TMPDIR: /local/sophie.hourihane
JUPYTERHUB_OAUTH_SCOPES: ["access:servers!server=sophie.hourihane/", "access:servers!user=sophie.hourihane"]
JUPYTERHUB_SERVICE_PREFIX: /user/sophie.hourihane/
CONDA_PROMPT_MODIFIER: (base) 
JUPYTERHUB_SERVER_NAME: 
CONDA_JL_HOME: /opt/conda3-2019-09-10
JUPYTERHUB_DEBUG: 1
ONLINEDQ: /online/DQ
SHELL: /bin/bash
NDSSERVER: nds.ldas.cit:31200
SUBPROCESS_SENTRY_TRACE: b230c8b96db14d33999070f8cae52286-87d4aba644dfa496-0
JUPYTERHUB_DEFAULT_URL: /lab
JULIA_LOAD_PATH_BACKUP: 
CONDOR_LOCATION: /usr
DEFAULT_SEGMENT_SERVER: http://segments.ldas.cit/
JUPYTERHUB_API_URL: http://127.0.0.1:8081/hub/api
SHLVL: 0
CONDA_JL_CONDA_EXE: /opt/conda3-2019-09-10/bin/conda
JUPYTERHUB_HOST: 
JULIA_SSL_CA_ROOTS_PATH: /opt/conda3-2019-09-10/ssl/cacert.pem
JPY_API_TOKEN: c95492d0d4904994bd992d3bc4839090
JULIA_DEPOT_PATH_BACKUP: 
JUPYTERHUB_OAUTH_CALLBACK_URL: /user/sophie.hourihane/oauth_callback
JUPYTERHUB_API_TOKEN: c95492d0d4904994bd992d3bc4839090
PATH: /home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin:/opt/conda3-2019-09-10/bin:/opt/conda3-2019-09-10/condabin:/opt/conda3/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/dcs/sbin:/opt/dcs/bin:/ldcg/matlab_r2020b/bin:/opt/dcs/bin
GWDATAFIND_SERVER: datafind.ldas.cit:80
CONDA_DEFAULT_ENV: base
tmpdir: /local/sophie.hourihane
S6_SEGMENT_SERVER: http://segments-s6.ldas.cit/
JUPYTERHUB_SINGLEUSER_APP: jupyter_server.serverapp.ServerApp
GIT_PYTHON_REFRESH: quiet
JPY_PARENT_PID: 839996
PYDEVD_USE_FRAME_EVAL: NO
TERM: xterm-color
CLICOLOR: 1
FORCE_COLOR: 1
CLICOLOR_FORCE: 1
PAGER: cat
GIT_PAGER: cat
MPLBACKEND: module://matplotlib_inline.backend_inline
CLING_STANDARD_PCH: /home/sophie.hourihane/.conda/envs/bayeswave-cpp/lib/python3.10/site-packages/cppyy/allDict.cxx.pch.6.28.0
EXTRA_CLING_ARGS:  -O2 -march=native

Also running

import cppyy_backend
print(cppyy_backend.__version__)
print(cppyy.__version__)

returns

6.28.0
3.0.0

sophiehourihane commented 9 months ago

Update: This does not seem to be an issue in jupyter notebooks with cppyy 2.4.2. However I find other issues with using 2.4.2 like returning not-null pointers as null pointers...

wlav commented 9 months ago

My guess for the code "getting stuck" is a bad interaction between signal handlers. Does the code still hang if you remove these lines:

import cppyy.ll
cppyy.ll.set_signals_as_exception(True)

As for SIGILL, there are two probable causes. It could be a spurious memory overwrite, corrupting the code stack. In a small example like this, I think that's unlikely. The other possible cause is a configuration problem, e.g. having compiler features enabled on a platform that doesn't support them.

Can you set the envar EXTRA_CLING_ARGS to the empty string or something basic like -O2, and see whether that makes a difference? (There could well be other libraries that are the cause of the SIGILL, not JITed code, but it's a good place to start.)

sophiehourihane commented 9 months ago

Unfortunately getting rid of import cppyy.ll and setting EXTRA_CLING_ARGS to -O2 did not fix the issue. However what almost fixed the issue was using a new pch.

new_pch = '/home/sophie.hourihane/test.pch'
%env CLING_STANDARD_PCH=$new_pch

Then I was able to generate an rng after doing

# Import the C++ namespaces
cppyy.include('random')
# Create a std::mt19937 object
print('creating rng', flush=True)
number = 42
rng = cppyy.gbl.std.mt19937(number)  # You can seed it with a specific value (e.g., 42)
print('done creating rng', flush=True)

However when I tried to import my other scripts I got this (nonfatal) error message

[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { $.cling-module-140.__inits.0, _GLOBAL__sub_I_cling_module_140, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag, _ZN7VersionL20CODE_VERSION_MESSAGEB5cxx11E, __orc_init_func.cling-module-140, __cxx_global_var_initcling_module_140_, __clang_call_terminate }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-140 }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { _ZNSt10filesystem7__cxx114pathD1Ev, __cxx_global_var_initcling_module_145_.61, __cxx_global_var_initcling_module_145_.23, __cxx_global_var_initcling_module_145_.31, $.cling-module-145.__inits.0, __cxx_global_var_initcling_module_145_.11, __cxx_global_var_initcling_module_145_.59, _ZN12ProposalTreeD1Ev, __cxx_global_var_initcling_module_145_.56, __cxx_global_var_initcling_module_145_.29, __cxx_global_var_initcling_module_145_.54, __cxx_global_var_initcling_module_145_.2, __cxx_global_var_initcling_module_145_.39, __cxx_global_var_initcling_module_145_.15, __cxx_global_var_initcling_module_145_.25, __cxx_global_var_initcling_module_145_.41, _ZNSt10filesystem7__cxx114pathC2IA2_cS1_EERKT_NS1_6formatE, __cxx_global_var_initcling_module_145_.47, __cxx_global_var_initcling_module_145_.21, __cxx_global_var_initcling_module_145_.9, __cxx_global_var_initcling_module_145_.50, __orc_init_func.cling-module-145, __cxx_global_var_initcling_module_145_.33, __cxx_global_var_initcling_module_145_.52, __cxx_global_var_initcling_module_145_.37, __cxx_global_var_initcling_module_145_.27, __cxx_global_var_initcling_module_145_.1, __cxx_global_var_initcling_module_145_.43, _ZNSt10filesystem7__cxx114pathC2IA35_cS1_EERKT_NS1_6formatE, __cxx_global_var_initcling_module_145_.7, __cxx_global_var_initcling_module_145_.13, __cxx_global_var_initcling_module_145_.55, __cxx_global_var_initcling_module_145_.4, __cxx_global_var_initcling_module_145_.53, __cxx_global_var_initcling_module_145_.35, _ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_12ProposalTreeESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE7_M_copyILb0ENSF_11_Alloc_nodeEEEPSt13_Rb_tree_nodeIS9_ESK_PSt18_Rb_tree_node_baseRT0_, __cxx_global_var_initcling_module_145_.17, __cxx_global_var_initcling_module_145_.5, __cxx_global_var_initcling_module_145_.49, __cxx_global_var_initcling_module_145_.19, __cxx_global_var_initcling_module_145_.45, 
_ZNSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE12ProposalTreeSt4lessIS5_ESaISt4pairIKS5_S6_EEEC2ESt16initializer_listISB_ERKS8_RKSC_, __cxx_global_var_initcling_module_145_.58, _ZNSt10filesystem7__cxx114pathC2IA12_cS1_EERKT_NS1_6formatE, _ZNKSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_12ProposalTreeESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE11_Alloc_nodeclIRKS9_EEPSt13_Rb_tree_nodeIS9_EOT_, __cxx_global_var_initcling_module_145_, __cxx_global_var_initcling_module_145_.51, __cxx_global_var_initcling_module_145_.3, _ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_12ProposalTreeESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE8_M_eraseEPSt13_Rb_tree_nodeIS9_E }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { _GLOBAL__sub_I_cling_module_153, __orc_init_func.cling-module-153, _ZN11CommandLineL12HELP_MESSAGEB5cxx11E, $.cling-module-153.__inits.0, __cxx_global_var_initcling_module_153_ }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-145 }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { $.cling-module-196.__inits.0, __orc_init_func.cling-module-196, __cxx_global_var_initcling_module_196_ }) }

And then when I actually tried calling my functions, I got this error

[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-196 }) }
cling JIT session error: Failed to materialize symbols: { (main, { __clang_call_terminate }) }

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 3
      1 output = f"{FIXED_DIMENSION_GLITCH_PRIOR}/bayeswave_output_0"
----> 3 modelCollectionPost = mcp.ModelCollectionPosterior(output)

File ~/src/bayeswave-cpp/bayeswavecpp_bindings/model_collection_posterior.py:217, in ModelCollectionPosterior.__init__(self, outputDirectory, commandLineArgs, chainIndex, burnIn)
    215     print('reading in run from output directory bayeswave.run file', flush=True)
    216     dataBuilder = bayeswave_post.getDataBuilderFromOutputDir(outputDirectory)
--> 217     modelCollectionBuilder = bayeswave_post.getModelCollectionBuilderFromOutputDir(outputDirectory)
    218 else:
    219     print('reading in run from command line string', flush=True)

File ~/src/bayeswave-cpp/bayeswavecpp_bindings/bayeswave_post.py:72, in getModelCollectionBuilderFromOutputDir(path_to_output_dir)
     70 command_line_input = Cpp.CommandLine.Input(len(split_command_line), split_command_line)
     71 data_builder = getDataBuilderFromOutputDir(path_to_output_dir)
---> 72 model_collection_configuration = Cpp.CommandLine.buildRunConfiguration(command_line_input).chainCollectionConfiguration.modelCollectionConfiguration
     73 return Cpp.ModelCollectionBuilder(model_collection_configuration, data_builder.build())

ValueError: RunConfiguration CommandLine::buildRunConfiguration(const CommandLine::Input& commandLineInput) =>
    ValueError: nullptr result where temporary expected

I don't really understand how to set up the PCHs or whether I am doing anything fundamentally wrong. Do you have any suggestions for what might be going on? Are environment variables somehow stored in the PCH? Could that perhaps be causing an issue?

wlav commented 9 months ago

Sorry, yes, just setting EXTRA_CLING_ARGS does not always cause a rebuild of the PCH (setting the envar CLING_REBUILD_PCH forces a rebuild). I thought that (effectively) removing -march=native was one of the changes that triggers a rebuild, but it's not.

For the conda builds in 2.4.0, the setting of EXTRA_CLING_ARGS is apparently removed completely, which probably explains the difference. (The PCH is not transferable between machines with different CPUs if -march=native is enabled, which is why the intent is to rebuild it locally on first use. This approach breaks down if different machines use the same PCH from a shared filesystem.)
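
As a rough sketch of what that could look like in a notebook: the helper below (a made-up name) drops -march=native from EXTRA_CLING_ARGS and sets CLING_REBUILD_PCH. Setting it to "1" is an assumption; the thread only says that setting the variable forces a rebuild. The demo runs on a plain dict so that it does not touch the real environment.

```python
import os


def cling_env_without_native(env=os.environ, force_rebuild=True):
    """Remove -march=native from EXTRA_CLING_ARGS in `env` and optionally
    request a PCH rebuild. Returns the new EXTRA_CLING_ARGS value."""
    flags = env.get("EXTRA_CLING_ARGS", "").split()
    kept = [flag for flag in flags if flag != "-march=native"]
    env["EXTRA_CLING_ARGS"] = " ".join(kept)
    if force_rebuild:
        # Assumption: any non-empty value is enough to trigger the rebuild.
        env["CLING_REBUILD_PCH"] = "1"
    return env["EXTRA_CLING_ARGS"]


# Demo on a copy of the value reported earlier in this thread.
demo_env = {"EXTRA_CLING_ARGS": " -O2 -march=native"}
print(cling_env_without_native(demo_env))

# In a real session, call cling_env_without_native() with no arguments
# before the first "import cppyy" in the kernel.
```

On a shared filesystem it may also be worth pointing CLING_STANDARD_PCH at a per-machine location, as was tried above with test.pch, so that machines with different CPUs do not reuse the same file.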

The Failed to materialize symbols error has been reported several times, but I've as-of-yet been unable to reproduce it. Interesting if there were a relation with the PCH.

Could you try a run with CLING_STANDARD_PCH set to none? This would cause full interpretation of the headers each run, so is inefficient in startup time and memory, but I'm just curious whether it makes a difference and if so I can take that info to upstream as it would clearly suggest that the failed symbols have something to do with the PCH.

sophiehourihane commented 9 months ago

So when I tried to include my whole library with the pch set to none I just got a ton of import errors

Header file causing problem is /home/sophie.hourihane/src/bayeswave-cpp/src/data/interferometer.hpp
Diagnosing import error failed! Unknown import error!
    ImportError: Failed to load header file "/home/sophie.hourihane/src/bayeswave-cpp/src/data/interferometer.hpp"
In file included from input_line_21:1:
In file included from /home/sophie.hourihane/src/bayeswave-cpp/src/data/interferometer.hpp:34:
In file included from /home/sophie.hourihane/src/bayeswave-cpp/src/series/frequency_series.hpp:31:
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:113:20: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const std::optional<ClosedInterval<TValueType>>& left, const std::optional<ClosedInterval<TValueType>>& right);
              ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:113:79: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const std::optional<ClosedInterval<TValueType>>& left, const std::optional<ClosedInterval<TValueType>>& right);
                                                                         ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:113:134: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const std::optional<ClosedInterval<TValueType>>& left, const std::optional<ClosedInterval<TValueType>>& right);
                                                                                                                                ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:125:20: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const std::optional<ClosedInterval<TValueType>>& left, const ClosedInterval<TValueType>& right);
              ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:125:79: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const std::optional<ClosedInterval<TValueType>>& left, const ClosedInterval<TValueType>& right);
                                                                         ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:137:20: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const ClosedInterval<TValueType>& left, const std::optional<ClosedInterval<TValueType>>& right);
              ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:137:119: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const ClosedInterval<TValueType>& left, const std::optional<ClosedInterval<TValueType>>& right);
                                                                                                                 ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:148:20: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const ClosedInterval<TValueType>& left, const ClosedInterval<TValueType>& right);
              ~~~~~^
In file included from input_line_21:1:
In file included from /home/sophie.hourihane/src/bayeswave-cpp/src/data/interferometer.hpp:34:
In file included from /home/sophie.hourihane/src/bayeswave-cpp/src/series/frequency_series.hpp:31:
In file included from /home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.hpp:150:
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.ipp:70:20: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const std::optional<ClosedInterval<TValueType>>& left, const std::optional<ClosedInterval<TValueType>>& right) {
              ~~~~~^
/home/sophie.hourihane/src/bayeswave-cpp/src/series/closed_interval.ipp:70:79: error: no template named 'optional' in namespace 'std'
[[nodiscard]] std::optional<ClosedInterval<TValueType>> operator&&(const std::optional<ClosedInterval<TValueType>>& left, const std::optional<ClosedInterval<TValueType>>& right) {

~I am not sure if they are actually issues, my import statement is still running (it has been 30 minutes) but it looks like cppyy is not including the library std.~ I was unable to import my libraries because of these issues

After some googling it looks like optional was introduced in C++17, so it may be that the default version of c++ is getting somehow set to lower...

sophiehourihane commented 9 months ago

Update, the other import did crash (kernel died). I tried running with these settings

CONDA_PATH = "/home/sophie.hourihane/.conda/envs/bayeswave-cpp/bin"
#CONDA_PATH = "/home/sophie.hourihane/.conda/envs/bayeswave-cpp-fresh/bin"

JUPYTER_PATH = %env PATH
EXTRA_CLING_ARGS="-O2 -std=c++17 -march=native"
new_pch = 'none' #'/home/sophie.hourihane/test.pch'
%env PATH = $CONDA_PATH:$JUPYTER_PATH
%env EXTRA_CLING_ARGS=$EXTRA_CLING_ARGS
%env CLING_STANDARD_PCH=$new_pch

That is, setting the C++ version to 17, because that is what came up when I googled the std::optional error. Here I am getting these (non-fatal) import errors when I run

from bayeswavecpp_bindings import autoload_cppyy
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { _ZN7VersionL20CODE_VERSION_MESSAGEB5cxx11E, __clang_call_terminate, $.cling-module-16.__inits.0, __orc_init_func.cling-module-16, _GLOBAL__sub_I_cling_module_16, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag, __cxx_global_var_initcling_module_16_ }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-16 }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __cxx_global_var_initcling_module_21_.2, __cxx_global_var_initcling_module_21_.1, _ZNSt10filesystem7__cxx114pathC2IA2_cS1_EERKT_NS1_6formatE, __cxx_global_var_initcling_module_21_, __cxx_global_var_initcling_module_21_.23, _ZNSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE12ProposalTreeSt4lessIS5_ESaISt4pairIKS5_S6_EEEC2ESt16initializer_listISB_ERKS8_RKSC_, __cxx_global_var_initcling_module_21_.13, __cxx_global_var_initcling_module_21_.61, _ZNSt10filesystem7__cxx114pathC2IA35_cS1_EERKT_NS1_6formatE, __cxx_global_var_initcling_module_21_.49, __cxx_global_var_initcling_module_21_.3, __cxx_global_var_initcling_module_21_.37, __cxx_global_var_initcling_module_21_.51, _ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_12ProposalTreeESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE8_M_eraseEPSt13_Rb_tree_nodeIS9_E, __cxx_global_var_initcling_module_21_.17, __cxx_global_var_initcling_module_21_.50, __cxx_global_var_initcling_module_21_.4, _ZNSt10filesystem7__cxx114pathC2IA12_cS1_EERKT_NS1_6formatE, __cxx_global_var_initcling_module_21_.35, __cxx_global_var_initcling_module_21_.56, __cxx_global_var_initcling_module_21_.59, __cxx_global_var_initcling_module_21_.58, __cxx_global_var_initcling_module_21_.47, __cxx_global_var_initcling_module_21_.54, _ZNSt10filesystem7__cxx114pathD1Ev, __cxx_global_var_initcling_module_21_.33, __cxx_global_var_initcling_module_21_.39, __cxx_global_var_initcling_module_21_.7, _ZNKSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_12ProposalTreeESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE11_Alloc_nodeclIRKS9_EEPSt13_Rb_tree_nodeIS9_EOT_, __orc_init_func.cling-module-21, __cxx_global_var_initcling_module_21_.25, __cxx_global_var_initcling_module_21_.55, __cxx_global_var_initcling_module_21_.27, 
_ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_12ProposalTreeESt10_Select1stIS9_ESt4lessIS5_ESaIS9_EE7_M_copyILb0ENSF_11_Alloc_nodeEEEPSt13_Rb_tree_nodeIS9_ESK_PSt18_Rb_tree_node_baseRT0_, _ZN12ProposalTreeD1Ev, __cxx_global_var_initcling_module_21_.9, __cxx_global_var_initcling_module_21_.21, __cxx_global_var_initcling_module_21_.5, __cxx_global_var_initcling_module_21_.11, __cxx_global_var_initcling_module_21_.45, __cxx_global_var_initcling_module_21_.41, __cxx_global_var_initcling_module_21_.29, __cxx_global_var_initcling_module_21_.43, __cxx_global_var_initcling_module_21_.31, __cxx_global_var_initcling_module_21_.53, __cxx_global_var_initcling_module_21_.52, $.cling-module-21.__inits.0, __cxx_global_var_initcling_module_21_.15, __cxx_global_var_initcling_module_21_.19 }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { _ZN11CommandLineL12HELP_MESSAGEB5cxx11E, _GLOBAL__sub_I_cling_module_29, $.cling-module-29.__inits.0, __orc_init_func.cling-module-29, __cxx_global_var_initcling_module_29_ }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-21 }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { $.cling-module-72.__inits.0, __orc_init_func.cling-module-72, __cxx_global_var_initcling_module_72_ }) }

So it looks like these errors pop up even when pch is set to none.

I also tried including optional before running and not setting extra cling args,

sophiehourihane commented 9 months ago

My guess is that the C++ standard is not propagating? I tried setting -std=c++17 in EXTRA_CLING_ARGS and setting STDCXX, but I am not sure if it's hiding somewhere else...

wlav commented 9 months ago

For the C++ std flag, can you check the location where the cppyy_backend python module is installed? There should be a directory called etc which has a file dictpch/allCppflags.txt. This contains the flags used for compilation, which are picked up for JITing. (That said, the way it's supposed to work, is that the std flag there is removed if there's a different setting in EXTRA_CLING_ARGS.)
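A minimal sketch of that check, assuming the etc/dictpch/allCppflags.txt layout described above sits directly under the installed cppyy_backend package directory. The helper name read_cling_flags is hypothetical and not part of cppyy:

```python
import os

def read_cling_flags(backend_dir):
    """Return the flags listed in etc/dictpch/allCppflags.txt, one per entry.

    backend_dir is the directory of the installed cppyy_backend package
    (assumed layout; this helper is illustrative only).
    """
    path = os.path.join(backend_dir, "etc", "dictpch", "allCppflags.txt")
    with open(path) as f:
        # Skip blank lines and strip trailing newlines
        return [line.strip() for line in f if line.strip()]
```

With cppyy installed, calling read_cling_flags(os.path.dirname(cppyy_backend.__file__)) after importing cppyy_backend should list the flags, provided the layout assumption holds.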

sophiehourihane commented 9 months ago

@wlav This is what's in that file

-std=c++1z
-Wno-implicit-fallthrough
-pipe
-fsigned-char
-pthread

So it looks like c++ version is being set to c++17, but then why is optional not getting recognized without me explicitly setting the c++ version?

When run without a pch, is cppyy_backend still getting used? Also do the compiler flags have to match the compile flags I used when I generated the shared library?

wlav commented 9 months ago

I'm not sure about the first. You can check with:

import cppyy
cppyy.cppexec("std::cerr << __cplusplus << std::endl;")

what Cling thinks the used C++ version is. It may also be that an explicit #include <optional> is needed (if part of the PCH, it is "implicitly" loaded).

And yes, even without PCH, cppyy_backend is still used: that's the library that contains Cling.

For compiler flags, it depends on what the APIs look like. For most function calls, it's not going to make a difference, but if anything is driven by macros, it may matter (e.g. in Eigen, matrix classes have different sizes depending on whether AVX is enabled; or compiler-level features like OpenMP support).

sophiehourihane commented 9 months ago

Oh interesting, we do use Eigen (and openMP, but for my purposes I don't care if threading is enabled).

When I turn off explicitly setting c++ version and I load with pch set to none, I get

import cppyy
cppyy.cppexec("std::cerr << __cplusplus << std::endl;")

201402

So it is using c++ 2014 with 02 optimization? Shouldn't it be using the defaults in cppyy_backend?

When I explicitly set -std=c++17 -O2 and STDCXX=17 that same command returns 201703

(it looks like STDCXX is ignored when that is set but EXTRA_CLING_ARGS is not). Do I need to make sure anything is in my PATH or LD_LIBRARY_PATH?

Also the #include <optional> are in my header files anyways so I don't think I need to explicitly include optional (although I may be wrong).
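For reading the value printed above: __cplusplus encodes the year and month of the language standard, so 201402 means C++14 (it says nothing about the optimization level) and 201703 means C++17. A small lookup sketch, where the helper name cxx_standard is hypothetical:

```python
# Values of the __cplusplus macro defined by each published ISO C++ standard.
_CPLUSPLUS_VALUES = {
    199711: "C++98/03",
    201103: "C++11",
    201402: "C++14",
    201703: "C++17",
    202002: "C++20",
}

def cxx_standard(value):
    """Map a __cplusplus value (int or numeric string) to a standard name."""
    return _CPLUSPLUS_VALUES.get(int(value), "unknown")
```

Compilers can report intermediate values in draft modes, which this table deliberately reports as "unknown".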

wlav commented 9 months ago

Having that #include <optional> in the headers is definitely the same as explicitly including it with eg. cppyy.include().

I misremembered the process: the flags are picked up by rootcling when building the PCH, after which it's "baked in" and that's where Cling picks up the C++ version, so yes, looks like defaulting to the minimum (C++14) is the expected behavior if there is no PCH. Likewise, the override when specifying the standard in the extra flags is in rootcling when rebuilding the PCH.

Yes, if using Eigen, there can be spurious memory overwrites if instances of Eigen classes are passed into compiled functions that were compiled with different flags. If all Eigen code is JITed, then it's internally consistent. (The same is true when using different compiled C++ libraries, that use Eigen, together in a single program.)

(As for OpenMP, if enabled in the extra flags, there will be an OpenMP-specific PCH alongside the standard one.)

None of that is relevant for the original problem with random, though.

I tried to at least reproduce the problem with symbols not materializing using various combinations of EXTRA_CLING_ARGS and CLING_STANDARD_PCH, but I'm not able to (as said above, this is a long-standing problem). I can't reproduce the, or any, SIGILL either.

sophiehourihane commented 9 months ago

Yeah I am really stuck. I uninstalled all of the cppyy dependencies from my conda environment (easier said than done: cppyy_backend doesn't get uninstalled very nicely with pip). I got rid of all the leftovers by finding them with:

find $CONDA_PREFIX/lib -name '*ppyy*'
find $CONDA_PREFIX/lib -name '*cling*'
find $CONDA_PREFIX/lib -name '*root*'

find $CONDA_PREFIX/bin -name '*ppyy*'
find $CONDA_PREFIX/bin -name '*cling*'
find $CONDA_PREFIX/bin -name '*root*'

Then I reinstalled with pip in the order cppyy-cling cppyy-backend CPyCppyy cppyy

Running from the command line with the default pch file is totally fine, I can call random and I can create my cpp objects. When i try to set to another pch using CLING_STANDARD_PCH=/home/sophie.hourihane/cppyy_command_line.pch I get the runStaticInitializersOnce errors again, and I am unable to create my cpp objects (I get null pointer errors).

Running from jupyter using the default pch I can't get the rng to load. Running from jupyter with a new pch the rng works fine but I get the same runStaticInitializersOnce errors (and then null pointer segfault).

I guess in either case I cannot seem to make a new pch that works consistently. Do you have any suggestions?

wlav commented 9 months ago

Since there is no conda package for 3.0.0 yet, do I assume correctly you are installing 3.0.0 in a conda environment? What I'm digging for is that the conda compilers have specific full names and the pip package may go for the "g++" in $PATH to find the include headers. Maybe this could be a difference between command line and jupyter.

Can you run with strace? Just wondering whether multiple versions of stdc++.so are being loaded in the Jupyter case.

sophiehourihane commented 9 months ago

Yes I am pip installing cppyy 3.0.0 in a Conda environment. Can you explain what you mean by running with strace ?

wlav commented 9 months ago

In the command line case, run with:

$ strace python ...

and in the Jupyter case, similar:

$ strace jupyter notebook

It will give a massive amount of output, so best to pipe it to a file (e.g. with |& tee strace.log).

Then search that log file for instances of libstdc++.so being opened. It wouldn't surprise me if random has some global state that is duplicated b/c two different libstdc++.so files (one from the system and one from conda) are loaded, with cppyy bringing in one and Jupyter bringing in the other.
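A minimal sketch of that search, assuming the log contains standard strace lines such as openat(AT_FDCWD, "...", O_RDONLY|O_CLOEXEC) = 3. The helper name libstdcxx_opens is hypothetical:

```python
import re

# Captures the quoted path in open()/openat() calls that mention libstdc++.so
_OPEN_RE = re.compile(r'open(?:at)?\(.*?"([^"]*libstdc\+\+\.so[^"]*)"')

def libstdcxx_opens(log_lines):
    """Return the distinct libstdc++.so paths that were opened successfully."""
    paths = []
    for line in log_lines:
        match = _OPEN_RE.search(line)
        if not match:
            continue
        # strace reports failed attempts as "= -1 ENOENT ..."; skip those
        if re.search(r'=\s*-1\b', line):
            continue
        if match.group(1) not in paths:
            paths.append(match.group(1))
    return paths
```

Passing the lines of strace.log to this function and getting more than one path back would point at the duplicated-library scenario. Note that strace only follows child processes (such as a Jupyter kernel) when run with -f.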

sophiehourihane commented 9 months ago

OK I actually do not launch the jupyter notebooks myself, they are hosted by my collaboration. However within my jupyter notebook I did launch the same python script. From the command line it worked perfectly and crashed when I called it from jupyter.

import subprocess

# Specify the path to your Python script
python_script_path = "/home/sophie.hourihane/test_cppyy.py"

# Specify the path to the output file
output_file_path = "output.log"

# Use subprocess to call strace on the Python script and redirect output to a file
with open(output_file_path, "w") as output_file:
    process = subprocess.Popen(["strace", "python", python_script_path], stdout=output_file, stderr=subprocess.STDOUT)
    process.wait()  # Wait for the process to finish

# Print a message indicating where the output was saved
print(f"strace output saved to: {output_file_path}")

In neither output file does libstdc++.so appear to be referenced more than once.

From command line: openat(AT_FDCWD, "/home/sophie.hourihane/.conda/envs/bayeswave-cpp/lib/python3.10/lib-dynload/../../libstdc++.so", O_RDONLY|O_CLOEXEC) = 3

From jupyter openat(AT_FDCWD, "/home/sophie.hourihane/.conda/envs/bayeswave-cpp/lib/python3.10/lib-dynload/../../libstdc++.so", O_RDONLY|O_CLOEXEC) = 3

Later in the jupyter one I do get this error

write(2, "TypeError: Template method resol"..., 2163TypeError: Template method resolution failed:
  none of the 4 overloaded methods succeeded. Full details:
  mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>(std::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>&&) =>
    TypeError: could not convert argument 1
  mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>() =>
    TypeError: takes at most 0 arguments (1 given)
  mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>(const std::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>&) =>
    TypeError: could not convert argument 1
  mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>(std::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>::result_type __sd) =>
    IllegalInstruction: illegal instruction in C++; program state was reset
  mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>(std::mersenne_twister_engine<unsigned long,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>::result_type __sd) =>
    IllegalInstruction: illegal instruction in C++; program state was reset) = 2163

which matches the error in the cell

wlav commented 9 months ago

I'm at a loss ...

Let me send a link to the output of the github CI later. I still need to test all wheels (there are still issues left on ARM), but that build contains the latest release of Cling. Maybe we get lucky and whatever is ailing has been fixed.

sophiehourihane commented 9 months ago

I think I might have some idea of what the error is. It actually looks like when I ssh to another computer on the shared filesystem and run from the command line, I get the rng error as well (although not all remote computers give me the error...). I think the issue might be with what you had mentioned before about shared filesystems.

Are there any recommendations for installing cppyy on computers with shared filesystems other than 'don't do that'? Could I possibly make a pch for each computer? (I was having difficulty making any pch work correctly)

wlav commented 9 months ago

Per machine is easy enough if doing:

export CLING_STANDARD_PCH=$HOME/$HOST.pch

or some other host specific name.
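From a notebook, where exporting shell variables is less convenient, the same per-machine naming could be sketched in Python. The helper name per_host_pch is hypothetical, and the sketch assumes cppyy reads CLING_STANDARD_PCH when it is first imported:

```python
import os
import socket

def per_host_pch(home=None, host=None):
    """Build a PCH path unique to this machine (mirrors $HOME/$HOST.pch)."""
    home = home or os.path.expanduser("~")
    host = host or socket.gethostname()
    return os.path.join(home, host + ".pch")
```

Usage would be os.environ["CLING_STANDARD_PCH"] = per_host_pch() in a cell that runs before import cppyy.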

However, I'd recommend setting $EXTRA_CLING_ARGS to -O2 (and not setting $CLING_STANDARD_PCH).

sophiehourihane commented 9 months ago

Sorry, can you explain what you mean? If I set $EXTRA_CLING_ARGS and I do not set $CLING_STANDARD_PCH won't it just use the default pch?

wlav commented 9 months ago

Yes and no. The default name is modified to reflect some of the settings from EXTRA_CLING_ARGS:

if 'native' in cling_args:  pchname += 'native.'
if 'openmp' in  cling_args: pchname += 'omp.'
if 'cuda' in cling_args:    pchname += 'cuda.'

but only if CLING_STANDARD_PCH isn't set. (I realize that in the scenario sketched above, native is a misnomer, but it really only refers to that flag, so if only -O2, the name will be different.)

wlav commented 9 months ago

And FWIW, freshest wheels are here: https://github.com/wlav/cppyy-backend/suites/17046022753/artifacts/973818860

sophiehourihane commented 9 months ago

Unfortunately I am still having this issue so just changing the pch isn't really an option :-( I feel like i must be creating the pch incorrectly in the first place somehow.

Running from the command line with the default pch file is totally fine, I can call random and I can create my cpp objects. When i try to set to another pch using CLING_STANDARD_PCH=/home/sophie.hourihane/cppyy_command_line.pch I get the runStaticInitializersOnce errors again, and I am unable to create my cpp objects (I get null pointer errors).

Running from jupyter using the default pch I can't get the rng to load. Running from jupyter with a new pch the rng works fine but I get the same runStaticInitializersOnce errors (and then null pointer segfault).

wlav commented 9 months ago

On the initializers front, I'm hoping to be able to reproduce that with docker thanks to this: https://github.com/wlav/cppyy/issues/175

but I haven't been able to try it out yet.

sophiehourihane commented 9 months ago

Somewhat of an update: I made a new environment (bayeswave-cpp-pcdev14) on the actual computer that the jupyter server is hosted on (pcdev14) (an AMD Opteron 6376 if that means anything to you).

When I use that environment from the command line I get the runStaticInitializersOnce and nullPtr errors but I am able to create an rng. However, when I call the environment (bayeswave-cpp-pcdev14) on the original computer (pcdev1) I get no initializer errors but I do get *** Break *** illegal instruction when creating an rng.

Finally, when I use the environment (bayeswave-cpp) that I created on the original computer (pcdev1) from that same computer, I get neither error and I can create rngs and my own objects as I please.

The original computer is an AMD EPYC 7543 VM if that means anything to you.
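An illegal instruction can occur when code built for one CPU's instruction set runs on a CPU that lacks it (for example with -march=native, as in the EXTRA_CLING_ARGS tried earlier). As a hedged sketch, not a confirmed diagnosis, the feature flags of the two machines could be compared from the contents of /proc/cpuinfo on each. The helper names cpu_flags and missing_on are hypothetical:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" entry is enough; all cores normally list the same set
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def missing_on(target_text, build_text):
    """Flags present on the build machine but absent on the target machine."""
    return sorted(cpu_flags(build_text) - cpu_flags(target_text))
```

Reading /proc/cpuinfo on pcdev1 and pcdev14 and passing both texts to missing_on would show which instruction-set extensions one machine has that the other does not.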

Unfortunately I am not sure what a wheel is so I am not sure how to use it...