Open LecrisUT opened 4 days ago
I think we can probably avoid calling LAPACKE routines in the C code of phono3py. There are two kinds of calls to those routines:
Diagonalization (zheev) of dynamical matrices over many q-points: https://github.com/phonopy/phono3py/blob/59dc78e98238eba932612b1f05406f07e39c53b6/c/phonon.c#L228
For 1, calling the LAPACK routines via scipy is fine if the BLAS is multithreaded. In the old days, installing scipy was sometimes difficult, so I tried to avoid relying on it. Now I feel OK depending on scipy, so except for experimental usage (https://phonopy.github.io/phono3py/direct-solution.html#solver-choice-for-diagonalization), the use of LAPACKE may be avoided, at least for the release version.
For 2, I currently have no idea how to selectively call the multithreaded and single-threaded BLAS in one installation of phono3py. So I usually choose the multithreaded BLAS, and the zheev in the above case is called sequentially over q-points. This diagonalization can be replaced by numpy eigh in Python. I am not sure about the performance of calling numpy over many q-points (i.e., the overhead of Python and of wrapping LAPACK in numpy), but if this part turns out not to be a bottleneck of the phono3py calculation, we can avoid using LAPACKE.
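As a rough illustration of the numpy route, the sketch below diagonalizes a stack of Hermitian matrices, once with one numpy.linalg.eigh call per q-point and once with a single batched call on the whole stack. The matrices are random stand-ins for dynamical matrices, and the helper name random_hermitian_stack and the sizes are assumptions made only for this example; nothing here is taken from the phono3py code.

```python
import numpy as np


def random_hermitian_stack(n_q, n_band, seed=0):
    """Return n_q random Hermitian (n_band x n_band) matrices (stand-in data)."""
    rng = np.random.default_rng(seed)
    shape = (n_q, n_band, n_band)
    a = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    # (A + A^H) / 2 is Hermitian for every matrix in the stack.
    return (a + a.conj().transpose(0, 2, 1)) / 2


dynmats = random_hermitian_stack(n_q=50, n_band=6)

# One Python-level call per q-point, analogous to the sequential loop in C.
evals_loop = np.array([np.linalg.eigh(dm)[0] for dm in dynmats])

# A single call on the (n_q, n_band, n_band) stack; numpy loops internally.
evals_batched, evecs_batched = np.linalg.eigh(dynmats)

print(evals_batched.shape, evecs_batched.shape)
```

The batched form removes the per-q-point Python call overhead, which is the part of the cost in question here; whether it is fast enough would still have to be measured on real workloads.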
Overall, avoiding LAPACKE in the way described above is a good challenge. I simply had no reason to try it before.
@LecrisUT, I have made an option to remove the BLAS and LAPACKE calls in the C code (#301), so the libraries do not need to be installed or linked. While I haven't conducted an extensive performance evaluation, it seems not bad. I believe this version is suitable for both conda and pip wheel packages. CMakeLists.txt became even messier, but phono3py can be compiled with this option as follows:
% BUILD_WITHOUT_LAPACKE=ON pip install -e . -vvv
Ok, I'll look into the changes and rebase. BTW scipy is already an indirect dependency because phono3py -> phonopy -> scipy. So for now we should continue having both in parallel? For cibuildwheel, is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy one?
Yes, scipy is absolutely necessary for phono3py if we don't use LAPACKE.
For cibuildwheel is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy?
It is difficult to understand this sentence... The lapack approach is unnecessary.
I am just checking whether all the lapacke calls have a redirect to numpy/scipy in that recent PR, so that the user would not hit a runtime failure when it's not built with lapack support.
the user would not have a run failure when it's not built with lapack support.
I think so. The following is the configuration output of the test in which lapacke is not used (https://github.com/phonopy/phono3py/blob/develop/.github/workflows/phono3py-pytest-conda-nolapacke.yml):
loading initial cache file /tmp/tmpmyqadjbh/build/CMakeInit.txt
-- Build nanobind module of phono3py
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python: /home/runner/miniconda3/envs/test/bin/python3.12 (found suitable version "3.12.7", minimum required is "3.8") found components: Interpreter Development.Module Development.SABIModule
-- Build type: Release
-- CMAKE_SYSTEM_PREFIX_PATH: /home/runner/miniconda3/envs/test/bin/../x86_64-conda-linux-gnu/sysroot/usr;/usr/local;/usr;/;/home/runner/miniconda3/envs/test;/tmp/tmpmyqadjbh/wheel/platlib;/usr/X11R6;/usr/pkg;/opt;/home/runner/miniconda3/envs/test
-- /home/runner/miniconda3/envs/test
-- Find OpenMP library
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP libs: /home/runner/miniconda3/envs/test/lib/libgomp.so;/home/runner/miniconda3/envs/test/x86_64-conda-linux-gnu/sysroot/usr/lib/libpthread.so
-- OpenMP flags: -fopenmp
-- Configuring done (1.0s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmpmyqadjbh/build
The test with lapacke (https://github.com/phonopy/phono3py/blob/develop/.github/workflows/phono3py-pytest-conda.yml)
loading initial cache file /tmp/tmpq8dtrmuy/build/CMakeInit.txt
-- Build nanobind module of phono3py
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python: /home/runner/miniconda3/envs/test/bin/python3.12 (found suitable version "3.12.7", minimum required is "3.8") found components: Interpreter Development.Module Development.SABIModule
-- Build type: Release
-- CMAKE_SYSTEM_PREFIX_PATH: /home/runner/miniconda3/envs/test/bin/../x86_64-conda-linux-gnu/sysroot/usr;/usr/local;/usr;/;/home/runner/miniconda3/envs/test;/tmp/tmpq8dtrmuy/wheel/platlib;/usr/X11R6;/usr/pkg;/opt;/home/runner/miniconda3/envs/test
-- /home/runner/miniconda3/envs/test
-- Find OpenMP library
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP libs: /home/runner/miniconda3/envs/test/lib/libgomp.so;/home/runner/miniconda3/envs/test/x86_64-conda-linux-gnu/sysroot/usr/lib/libpthread.so
-- OpenMP flags: -fopenmp
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /home/runner/miniconda3/envs/test/lib/libopenblas.so
-- BLAS libs: /home/runner/miniconda3/envs/test/lib/libopenblas.so
-- BLAS flags:
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /home/runner/miniconda3/envs/test/lib/libopenblas.so;-lpthread;-lm;-ldl
-- LAPACK libs: /home/runner/miniconda3/envs/test/lib/libopenblas.so;-lpthread;-lm;-ldl
-- LAPACK flags:
-- OpenBLAS detected.
-- Set C-macro MULTITHREADED_BLAS to avoid nested OpenMP calls.
-- Configuring done (1.5s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmpq8dtrmuy/build
I think it is good to consider the new C++26 proposal for <linalg>, specifically section 9.2.
It seems unlikely that the LAPACK interface would be ported to the standard library, and when BLAS is ported to C++ std, I don't believe the BLAS provider projects would have much benefit. Even currently there are issues:
- lapacke.h is not packaged in all providers
- pkg-config support
- FindLAPACK.cmake does not actually add the relevant include to work properly
The paper recommends other C++ native libraries like Armadillo, Eigen3, etc. It might be useful to look into what support for these would look like. But in the meantime, let's discuss a plan of action.
Short-term
Continue to use lapacke.h, test for the presence of the header, and fail the build if it's not present. Users would have to define overrides themselves, like BLA_VENDOR, to select a compatible vendor such as OpenBLAS. There are various bugs that need to be addressed upstream:
- pkg-config files
- FindLAPACK.cmake should have an interface to override the include directory
With regard to the packaged wheels, these will contain bundled BLAS and LAPACK implementations compatible with cibuildwheel. How well this works, we will have to find out and work with the cibuildwheel folks. But hopefully the user can rely on the sdist builds to work in their arbitrary environments.
Future
For future support we should consider some other options:
- lapacke wrappers
- FetchContent in case the user does not have the package
If it's possible to get some performance benchmarking or profiling, that would be very helpful for guiding which approach is most suitable.