root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

[TMVA] Python module compatibility problems #15309

Open guitargeek opened 4 months ago

guitargeek commented 4 months ago

Description

There are several known compatibility problems of PyMVA and SOFIE with common Python modules that need to be fixed:

1. Collisions of openblas because of SOFIE vs. NumPy

SOFIE can load an openblas version that differs from the one bundled with NumPy, which can cause crashes, for example on AlmaLinux 9 (the TMVA_SOFIE_RSofieReader.C tutorial test is deactivated for this reason).

2. TensorFlow 2.16 compatibility for Python 3.12 support

PyMVA doesn't work with TensorFlow>=2.16 because of the change from Keras 2 to Keras 3 in TensorFlow (that's the motivation for this version check in requirements.txt). TensorFlow 2.16 support is important because it's the first version that supports Python 3.12.

3. Collisions of openblas because of tmva-cpu vs NumPy

The TMVA CPU backend (activated with tmva-cpu) has the same problem as stated in 1., because it also uses openblas. That's the reason why tmva-cpu is deactivated on alma9.

4. Collisions of std::regex because of cppyy vs {torch, xgboost}

The std::regex that comes with cppyy collides with the std::regex in XGBoost or PyTorch. This is more of a cppyy issue (or an issue of the other modules), but it mostly affects PyMVA tutorials, because that's where ROOT is imported together with machine learning libraries. When this is fixed, 872886b and 1bf3d5a can be reverted.
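
Regarding item 2, a runtime guard that mirrors a version pin might look like the minimal sketch below. The helper names (parse_major_minor, ships_keras3, installed_tensorflow_version) are hypothetical and not part of ROOT or PyMVA. The only fact assumed is the one stated above: TensorFlow 2.16 is where the switch from Keras 2 to Keras 3 happened.

```python
from importlib import metadata


def parse_major_minor(version):
    """Return (major, minor) as integers from a version string such as '2.16.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def ships_keras3(tf_version):
    """True if this TensorFlow version is 2.16 or newer (the Keras 3 switch)."""
    # Compare integer tuples, not strings, so that 2.9 sorts before 2.16.
    return parse_major_minor(tf_version) >= (2, 16)


def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if it is absent."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None
```

A tutorial could call installed_tensorflow_version() and skip itself when ships_keras3() returns True, instead of failing inside the Keras import.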

Related issues:

ellert commented 3 months ago

The default blas version in RHEL/Alma/Rocky 9 is flexiblas. All system libraries linked to blas use (or at least should use) it. This includes the numpy version provided by the system:

rpm -q python3-numpy

python3-numpy-1.20.1-5.el9.x86_64

rpm -q --requires python3-numpy | grep blas

libflexiblas.so.3()(64bit)

So if you are going to interact with system libraries that also use blas, you should use flexiblas.

Flexiblas is a wrapper library that makes it possible to change the blas implementation used without recompiling, just by changing the configuration. The default configuration uses openblas.

As long as you link root against flexiblas you should be fine. The root package in EPEL 9 does this:

rpm -q root-tmva

root-tmva-6.30.06-1.el9.x86_64

rpm -q --requires root-tmva | grep blas

libflexiblas.so.3()(64bit)

CMake's FindBLAS module gives flexiblas higher priority than openblas, so you don't have to use any special flags to use it, as long as flexiblas-devel is installed.

guitargeek commented 3 months ago

Thanks, that's a very valuable insight! Indeed, on alma9 the build picked up flexiblas at configuration time: https://github.com/root-project/root/actions/runs/8785067526/job/24104668181

  -- Found BLAS: /usr/lib64/libflexiblas.so  

Then the symbol clash probably came from mixing flexiblas from the system with the openblas bundled in the numpy package that's installed with pip.
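
One way to check this hypothesis on Linux is to list the BLAS-like shared libraries mapped into the Python process after importing both ROOT and numpy. The sketch below is only a rough diagnostic, not an official ROOT tool. It assumes /proc/self/maps is available, and it treats a "site-packages" path component as a sign of a pip-bundled library, which is a heuristic. All function names are hypothetical.

```python
import re
from pathlib import Path

# File names that look like a BLAS implementation or the flexiblas wrapper.
BLAS_NAME = re.compile(r"flexiblas|openblas", re.IGNORECASE)


def blas_libraries(maps_text):
    """Return the sorted, unique paths of BLAS-like libraries in maps-style text."""
    found = set()
    for line in maps_text.splitlines():
        # Layout: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        file_name = path.rsplit("/", 1)[-1]
        if BLAS_NAME.search(file_name):
            found.add(path)
    return sorted(found)


def bundled_with_python_package(path):
    """Heuristic: libraries under site-packages were shipped inside a wheel."""
    return "site-packages" in path


def loaded_blas_libraries():
    """Inspect the current process (Linux only)."""
    return blas_libraries(Path("/proc/self/maps").read_text())
```

Calling loaded_blas_libraries() after the imports and seeing both a system libflexiblas and an openblas under site-packages would be consistent with the explanation above. It would not prove that the crash comes from that mix.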