phonopy / phono3py

A simulation package of phonon-phonon interaction related properties
http://phonopy.github.io/phono3py/
BSD 3-Clause "New" or "Revised" License

Alternatives to `lapacke.h` #300

Open LecrisUT opened 4 days ago

LecrisUT commented 4 days ago

I think it is good to consider the new C++26 proposal for <linalg>, specifically section 9.2

Nevertheless, we have excluded LAPACK-like functionality from this proposal, for the following reasons:

  1. LAPACK is a Fortran library, unlike the BLAS, which is a multilanguage standard.
  2. We intend to support more general element types, beyond the four that LAPACK supports. It’s much more straightforward to make a C++ BLAS work for general element types, than to make LAPACK algorithms work generically.

First, unlike the BLAS, LAPACK is a Fortran library, not a standard. LAPACK was developed concurrently with the “level 3” BLAS functions, and the two projects share contributors. Nevertheless, only the BLAS and not LAPACK got standardized

...

For these reasons, we have left LAPACK-like functionality for future work. It would be natural for a future LAPACK-like C++ library to build on our proposal.

It seems unlikely that the LAPACK interface will be ported to the standard library, and once BLAS is part of the C++ standard library, I don't believe the BLAS provider projects would see much benefit. Even currently there are issues:

The paper recommends other C++ native libraries like Armadillo, Eigen3, etc. It might be useful to look into what support for these would look like. But in the meantime, let's discuss a plan of action.

Short-term

Continue to use lapacke.h: test for the presence of the header and fail the build if it is not present. Users would have to set overrides such as BLA_VENDOR themselves to select a compatible vendor such as OpenBLAS. There are various bugs that need to be addressed upstream:

With regard to the packaged wheels, these will contain bundled BLAS and LAPACK implementations compatible with cibuildwheel. How well this works remains to be seen, and we will have to work with the cibuildwheel folks. Hopefully users can rely on the sdist builds to work in their own environments.

Future

For future support we should consider some other options:

If it's possible to get some performance benchmarking or profiling, that would be very helpful for deciding which approach is most suitable.

atztogo commented 3 days ago

I think we can probably avoid calling LAPACKE routines in the C code of phono3py. These routines are called in two ways:

  1. Calling a function that blocks the calculation (diagonalization, pinv).
  2. Many small calls of a function that are expected to run in parallel. In this case, single-threaded BLAS should be used, because my implementation uses OpenMP to call the function concurrently. If I remember correctly, the only such usage is the following diagonalization (zheev) of dynamical matrices over many q-points: https://github.com/phonopy/phono3py/blob/59dc78e98238eba932612b1f05406f07e39c53b6/c/phonon.c#L228

For 1, calling the LAPACK routines via scipy is fine if the BLAS is multithreaded. In the old days, installing scipy was sometimes difficult, so I tried to avoid relying on it. But now I feel OK depending on scipy, so except for experimental usage (https://phonopy.github.io/phono3py/direct-solution.html#solver-choice-for-diagonalization), LAPACKE may be avoided, at least for the release version.
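
To illustrate case 1, here is a minimal sketch of a blocking diagonalization followed by a pseudo-inverse, done entirely from Python. It is not phono3py's actual code: the function name `pinv_symmetric`, the cutoff value, and the test matrix are assumptions made for illustration, and numpy is used here only to keep the sketch short (`scipy.linalg.eigh` could be used in the same way).

```python
import numpy as np


def pinv_symmetric(mat, cutoff=1e-8):
    """Pseudo-inverse of a real symmetric matrix via its eigendecomposition.

    Hypothetical helper: eigenvalues with |w| <= cutoff are treated as zero.
    """
    eigvals, eigvecs = np.linalg.eigh(mat)  # one blocking LAPACK call
    inv_vals = np.zeros_like(eigvals)
    keep = np.abs(eigvals) > cutoff
    inv_vals[keep] = 1.0 / eigvals[keep]
    # V diag(1/w) V^T, written as a column scaling followed by a matmul.
    return (eigvecs * inv_vals) @ eigvecs.T


# Rank-deficient symmetric test matrix with eigenvalues 3, 2, 1, 0.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
a = (q * np.array([3.0, 2.0, 1.0, 0.0])) @ q.T
a_pinv = pinv_symmetric(a)
```

Because the whole matrix is handled in a single call, a multithreaded BLAS behind numpy or scipy can be used without any nested-threading concern.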

For 2, I currently have no idea how to selectively call multithreaded and single-threaded BLAS in one installation of phono3py. So I usually choose multithreaded BLAS, and the zheev in the above case is called sequentially over q-points. This diagonalization can be replaced by numpy eigh in Python. I am not sure about the performance of calling numpy over many q-points (i.e., the overhead of Python and of numpy's LAPACK wrapper), but if this part turns out not to be a bottleneck of the phono3py calculation, we can avoid using LAPACKE.
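
As a rough sketch of case 2, `numpy.linalg.eigh` accepts a stack of matrices with shape `(..., M, M)`, so the per-q-point Python loop can be avoided. The random Hermitian matrices, the array sizes, and the helper name `random_hermitian_stack` below are assumptions standing in for real dynamical matrices; whether this is fast enough in phono3py would still need profiling.

```python
import numpy as np


def random_hermitian_stack(n_q, n_band, seed=0):
    """Stand-in for dynamical matrices: n_q Hermitian (n_band x n_band) matrices."""
    rng = np.random.default_rng(seed)
    shape = (n_q, n_band, n_band)
    a = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return (a + a.conj().transpose(0, 2, 1)) / 2


dynmats = random_hermitian_stack(n_q=50, n_band=6)

# One call diagonalizes all q-points; results have shapes (50, 6) and (50, 6, 6).
eigvals, eigvecs = np.linalg.eigh(dynmats)

# Equivalent per-q-point loop, kept only to compare against the batched call.
eigvals_loop = np.array([np.linalg.eigvalsh(dm) for dm in dynmats])
```

Timing the batched call against the loop on realistic mesh sizes would show how much of the cost is Python overhead rather than LAPACK itself.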

Overall, avoiding LAPACKE in the way described above is a good challenge. I simply had no reason to try it before.

atztogo commented 3 days ago

@LecrisUT, I have added an option to remove the BLAS and LAPACKE calls from the C code (#301), so these libraries no longer need to be installed or linked. I haven't conducted an extensive performance evaluation, but it seems not bad. I believe this version is suitable for both conda and pip wheel packages. CMakeLists.txt became even messier, but phono3py can be compiled with this option as follows:

% BUILD_WITHOUT_LAPACKE=ON pip install -e . -vvv

LecrisUT commented 3 days ago

Ok, I'll look into the changes and rebase. BTW scipy is already an indirect dependency because phono3py -> phonopy -> scipy. So for now we should continue having both in parallel? For cibuildwheel is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy?

atztogo commented 3 days ago

Yes, scipy is absolutely necessary for phono3py if we don't use LAPACKE.

> For cibuildwheel is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy?

It is difficult to understand this sentence... The lapack approach is unnecessary.

LecrisUT commented 3 days ago

> > For cibuildwheel is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy?
>
> It is difficult to understand this sentence... The lapack approach is unnecessary.

I am just checking whether all the lapacke calls are redirected to numpy/scipy in that recent PR, so that the user does not get a runtime failure when phono3py is built without lapack support.
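
The kind of redirect being asked about could look roughly like the sketch below. All names here (`diagonalize`, `compiled_solver`) are hypothetical and are not taken from phono3py or from the PR; the point is only that every call site has a numpy path when no compiled LAPACKE routine is available.

```python
import numpy as np


def diagonalize(dynmats, compiled_solver=None):
    """Diagonalize Hermitian matrices, falling back to numpy.

    `compiled_solver` is a hypothetical callable wrapping a LAPACKE-backed
    C routine; it is None when the build has no LAPACKE support.
    """
    if compiled_solver is not None:
        return compiled_solver(dynmats)
    # Fallback that works in any build: numpy's own LAPACK wrapper.
    return np.linalg.eigh(dynmats)


mats = np.array([[[2.0, 0.0], [0.0, 1.0]]])
eigvals, eigvecs = diagonalize(mats)
```

With a structure like this, a test run against a build without LAPACKE exercises the fallback branch at every call site.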

atztogo commented 2 days ago

> the user would not have a run failure when it's not built with lapack support.

I think so. The following is the output of the test where lapacke is not used (https://github.com/phonopy/phono3py/blob/develop/.github/workflows/phono3py-pytest-conda-nolapacke.yml):

  loading initial cache file /tmp/tmpmyqadjbh/build/CMakeInit.txt
  -- Build nanobind module of phono3py
  -- The C compiler identification is GNU 13.3.0
  -- The CXX compiler identification is GNU 13.3.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Python: /home/runner/miniconda3/envs/test/bin/python3.12 (found suitable version "3.12.7", minimum required is "3.8") found components: Interpreter Development.Module Development.SABIModule
  -- Build type: Release
  -- CMAKE_SYSTEM_PREFIX_PATH: /home/runner/miniconda3/envs/test/bin/../x86_64-conda-linux-gnu/sysroot/usr;/usr/local;/usr;/;/home/runner/miniconda3/envs/test;/tmp/tmpmyqadjbh/wheel/platlib;/usr/X11R6;/usr/pkg;/opt;/home/runner/miniconda3/envs/test
  -- /home/runner/miniconda3/envs/test
  -- Find OpenMP library
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- OpenMP libs: /home/runner/miniconda3/envs/test/lib/libgomp.so;/home/runner/miniconda3/envs/test/x86_64-conda-linux-gnu/sysroot/usr/lib/libpthread.so
  -- OpenMP flags: -fopenmp
  -- Configuring done (1.0s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpmyqadjbh/build

The test with lapacke (https://github.com/phonopy/phono3py/blob/develop/.github/workflows/phono3py-pytest-conda.yml):

  loading initial cache file /tmp/tmpq8dtrmuy/build/CMakeInit.txt
  -- Build nanobind module of phono3py
  -- The C compiler identification is GNU 13.3.0
  -- The CXX compiler identification is GNU 13.3.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Python: /home/runner/miniconda3/envs/test/bin/python3.12 (found suitable version "3.12.7", minimum required is "3.8") found components: Interpreter Development.Module Development.SABIModule
  -- Build type: Release
  -- CMAKE_SYSTEM_PREFIX_PATH: /home/runner/miniconda3/envs/test/bin/../x86_64-conda-linux-gnu/sysroot/usr;/usr/local;/usr;/;/home/runner/miniconda3/envs/test;/tmp/tmpq8dtrmuy/wheel/platlib;/usr/X11R6;/usr/pkg;/opt;/home/runner/miniconda3/envs/test
  -- /home/runner/miniconda3/envs/test
  -- Find OpenMP library
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- OpenMP libs: /home/runner/miniconda3/envs/test/lib/libgomp.so;/home/runner/miniconda3/envs/test/x86_64-conda-linux-gnu/sysroot/usr/lib/libpthread.so
  -- OpenMP flags: -fopenmp
  -- Looking for sgemm_
  -- Looking for sgemm_ - not found
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - found
  -- Found Threads: TRUE
  -- Looking for sgemm_
  -- Looking for sgemm_ - found
  -- Found BLAS: /home/runner/miniconda3/envs/test/lib/libopenblas.so
  -- BLAS libs: /home/runner/miniconda3/envs/test/lib/libopenblas.so
  -- BLAS flags:
  -- Looking for cheev_
  -- Looking for cheev_ - found
  -- Found LAPACK: /home/runner/miniconda3/envs/test/lib/libopenblas.so;-lpthread;-lm;-ldl
  -- LAPACK libs: /home/runner/miniconda3/envs/test/lib/libopenblas.so;-lpthread;-lm;-ldl
  -- LAPACK flags:
  -- OpenBLAS detected.
  -- Set C-macro MULTITHREADED_BLAS to avoid nested OpenMP calls.
  -- Configuring done (1.5s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpq8dtrmuy/build