pybamm-team / PyBaMM

Fast and flexible physics-based battery models in Python
https://www.pybamm.org/
BSD 3-Clause "New" or "Revised" License

Tracking issue: migrate to `scikit-build-core` #3564

Open agriyakhetarpal opened 9 months ago

agriyakhetarpal commented 9 months ago

Now that we have a declarative format for build metadata in pyproject.toml, and build isolation via PEP 517/518 gives us a more stable installation procedure, migrating to either the scikit-build-core or the meson-python build backend becomes easier. Both support editable installations and are steadily maturing for downstream users, with NumPy^1 and SciPy^2 having successfully transitioned this year.

The scikit-build-core build backend would still use CMake as the build system, as we do currently, whereas meson-python uses Meson (driven by Ninja), which would make the IDAKLU target's compilation procedure radically different from the current one.
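For reference, here is a minimal sketch (assumptions only, not a final configuration) of how each backend would be declared in pyproject.toml; the listed build requirements are illustrative:

```toml
# Option A: scikit-build-core – keeps driving the existing CMakeLists.txt
[build-system]
requires = ["scikit-build-core", "pybind11"]
build-backend = "scikit_build_core.build"

# Option B: meson-python – would need a new meson.build instead of CMakeLists.txt
# [build-system]
# requires = ["meson-python", "pybind11"]
# build-backend = "mesonpy"
```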

Some advantages of these new build-backends:

  1. Better support for compilers plus ease of use^3
  2. Reliable build caching for faster installation
  3. Both of them are faster at compilation and more modern (multithreaded builds, cross-compilation, do not use distutils or setuptools at all)
  4. Packagers for libraries in the Scientific Python ecosystem are slowly migrating and we should too
  5. Might enable faster IDAKLU compilation with other compilers such as MinGW on Windows when building from source (which we currently do only when building the wheel)?

Task list

These tasks will be useful to look at, but not all of them might end up being needed or valid – subject to further discussion:

kratman commented 9 months ago

Personally I would prefer to work with CMake over Ninja, but that could just be my bias from spending most of my career as a C++ dev

agriyakhetarpal commented 9 months ago

Yes, CMake is the easier of the two overall and requires less effort, since the compilation will still be controlled via the CMakeLists.txt file. I don't know enough about Meson to comment on it.

Nevertheless, this can be a potential GSoC project for someone who wants to tackle Python packaging, since the changes would revolve largely, if not entirely, around that area.

cringeyburger commented 6 months ago

Hello! I was going through the GSoC 2024 ideas list and came across this project. I am interested in learning about and working on the Python backend, and this project would significantly help develop my skills. Along the way, I can contribute to PyBaMM!

As a starting point, I would like to discuss whether we should go with meson-python or scikit-build-core.

I would prefer meson-python over scikit-build-core, mainly because of its popularity after SciPy (version 1.9.0) and NumPy (version 1.26) decided to migrate to it. The documentation, posts, blogs, and projects that have sprung up as a result would be helpful.

Also, meson-python is arguably the most popular way after setuptools to build C extensions.

I would also like to discuss some technical aspects of the project. Thank you!

agriyakhetarpal commented 6 months ago

Thanks for sharing, @cringeyburger. I would agree that meson-python is widely used, but NumPy and SciPy use a vendored version of it because they want features that do not yet exist in the main source code. The reason, I believe, is that the Meson developers tend to push back on anything hacky entering the codebase and prefer doing things in the conventional, robust way. Both projects are very large and use almost every feature available in Meson, most of which we may not require for our use case – which should work in our favour.

The Ninja build system should be able to handle the IDAKLU solver's compilation, but I am unsure whether it can do so for the prerequisites (SuiteSparse and SUNDIALS) – they might require a pure CMake build, or perhaps calling CMake from inside meson.build. I am aware that pybind11 works with meson-python and can be used as normal, so it should just be a matter of pointing the build procedure to the location of the compiled files for the prerequisites. I would also explore methods for caching so that editable installations can be sped up – Meson's in-place rebuilds on the import pybamm statement could then recompile just the sections of code that changed. My experiments with https://github.com/pybop-team/PyBOP/pull/176 revealed that around 30 seconds are consumed by the IDAKLU compilation, and ~1 min 30 seconds by SuiteSparse and SUNDIALS, on Unix-based systems.

However, Windows builds are significantly longer and take ~20 minutes overall without caching, which would be a problem for editable installations. It looks like https://github.com/microsoft/vcpkg/pull/36885 will be merged soon, which could provide some benefits (I'm unsure whether it is the SUNDIALS compilation or SuiteSparse's components that take longer on Windows). This might require bumping SUNDIALS to v6.7.0 (we currently use v6.5.0) – it doesn't look like there are many changes to the IDAS component that we use (here are the release notes) – and subsequently perhaps SuiteSparse from v6.0.3 to a later version (which would require changes to the C++ solver code).

We should be able to forgo setup.py entirely with Meson – it has been done for NumPy (https://github.com/numpy/numpy/pull/24519) and should be out with NumPy v2 (any time now). With scikit-build-core, we might not be able to do so completely, but that isn't an issue: we can remove the dependence on wheel.bdist_wheel (whose use has been discouraged for quite some time now) and rewrite the setup.py code to be cleaner. As mentioned above, this choice of backend will be much easier to integrate – everything will be governed by the CMake flags and the CMakeLists.txt we already have.
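As a rough sketch of what this could look like (assuming scikit-build-core is chosen; the package path and the define below are placeholders rather than existing PyBaMM options), the settings that setup.py currently handles could move into pyproject.toml and be forwarded to CMake:

```toml
# Sketch only – placeholder values, not a final configuration
[tool.scikit-build]
cmake.args = ["-DCMAKE_BUILD_TYPE=Release"]  # forwarded to the existing CMakeLists.txt
wheel.packages = ["pybamm"]                  # Python package directory shipped in the wheel

[tool.scikit-build.cmake.define]
# Hypothetical define: point CMake at a pre-built SUNDIALS instead of env-var logic in setup.py
SUNDIALS_ROOT = "/usr/local/sundials"
```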

Saransh-cpp commented 6 months ago

+1 for scikit-build-core

agriyakhetarpal commented 6 months ago

Bumping the priority to medium since this will now be worked on this year

agriyakhetarpal commented 6 months ago

I did a quick test and noticed that on macOS, vcpkg install sundials takes 14 seconds while vcpkg install suitesparse took 5.5 minutes, so perhaps using sundials[klu] once it is ready upstream will also resolve our Windows woes? Building on Windows is slow in general – our Windows builds painfully take 20 or so minutes, and that is after enabling multithreaded builds (without which it takes even longer). But I see that SuiteSparse compiles with MinGW in its own Windows CI, where the time is around 2 minutes; we compile from source using GCC on Unix-based platforms, where this hardly takes 30 seconds. I assume adding Ccache support will be highly beneficial for Windows.

cringeyburger commented 6 months ago

On a Windows x64 laptop, vcpkg install sundials took 19 seconds and vcpkg install suitesparse took a solid 16 minutes (including additional packages).

On WSL running Ubuntu (x64-linux), vcpkg install sundials took 26 seconds and vcpkg install suitesparse took 5 minutes (including additional packages).

Does vcpkg have parallelism and caching features? I think we should explore Ninja, for the following reasons:

As you have said, if that vcpkg PR is merged, then having sundials[klu] available can help speed up the process.

I am interested in learning more about Ccache, but from what I know, caching will definitely help speed up successive builds.

For large packages like ours, the incremental build feature of CMake utilised by scikit-build-core would work well with Ccache.

Assuming that the majority of our team is familiar with CMake, I too think that scikit-build-core is the better option.

NOTE: My network speed (measured using Google's speed test) is ~100 Mbps.

agriyakhetarpal commented 6 months ago

You mention that vcpkg install suitesparse took sixteen minutes with additional components – for Windows I suppose it comes with METIS being compiled. From SuiteSparse, we require just the KLU component for SUNDIALS to use for sparse matrix support, and this component relies on three other components: AMD, COLAMD, and BTF. Is there a way to disable certain components and build just these required SuiteSparse components, by passing extra flags to the install command? Getting down to 1.8 minutes across platforms would be a significant boost!

We should still use the GitHub releases for SuiteSparse on Linux and macOS though – they seem to be faster than vcpkg, i.e., through the nox -s pybamm-requires session. We could remove that and let everything proceed through a pip install <...> command – the user should just have a compiler toolchain present and the build procedure should do and abstract away the rest of the work (I have some experiments with this in a branch locally where I used setuptools to do so, but I never pushed them in favour of renovating the build script through this issue and subsequent GSoC project).

I am not sure about parallelism in vcpkg because that depends on the CMake generator being used; I went through the issue thread you shared, but I don't think we can completely switch to Ninja. There are indeed binary caching features, though, through ABI checksums (computed from a combination of the triplet files, the files present in the port being used, compiler configurations, etc.). We do cache these in CI, but our release process runs only once every four months, which means the cache is effectively never reused. I have seen that a cache hit reduces the build time on Windows to about 9 minutes.

I do not know much about Ccache either. It would be great to set that up, yes – we would want incremental builds through scikit-build-core in a way that SUNDIALS and SuiteSparse (KLU) are not rebuilt. This could tie in nicely with removing pybamm-requires.
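A possible way to wire that up with scikit-build-core (a sketch, assuming ccache is installed and on PATH; the build directory layout is just an example):

```toml
# Sketch only: a persistent build directory plus ccache, so SUNDIALS/SuiteSparse (KLU)
# objects and the IDAKLU solver are not recompiled from scratch on every (editable) install
[tool.scikit-build]
build-dir = "build/{wheel_tag}"  # keep CMake's build tree between installs
editable.rebuild = true          # rebuild only changed targets on "import pybamm"

[tool.scikit-build.cmake.define]
CMAKE_C_COMPILER_LAUNCHER = "ccache"
CMAKE_CXX_COMPILER_LAUNCHER = "ccache"
```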

P.S. I see that you have a Windows system configured already – could you test how long it takes to do vcpkg install casadi through PyBaMM's own registry that we have set up?

cringeyburger commented 6 months ago

Sorry for the late reply!

I tried a number of things to get a component-based installation out of vcpkg install suitesparse, but to no avail. An issue was opened regarding this, but the work is still in progress (it has been almost two years and unfortunately no significant progress has been made).

Here is the link to the issue.

As for vcpkg install casadi on Windows x64, it took me 2.8 minutes using PyBaMM's CasADi registry.

Here is the procedure I followed:

  1. Clone the registry into a "test" folder; I now have a casadi-vcpkg-registry folder inside my test folder.
  2. Change the cwd to the test folder.
  3. Now, I ran vcpkg install casadi --overlay-ports=casadi-vcpkg-registry/ports/casadi

After this, vcpkg started working on the task, but during post-build validation I encountered some warnings from vcpkg:

  1. Include files should not be duplicated into the /debug/include directory.
  2. The /lib/cmake folder should be merged with /debug/lib/cmake and moved to /share/casadi/cmake.
  3. The following cmake files were found outside /share/casadi. Please place cmake files in /share/casadi.
  4. There should be no absolute paths, such as the following, in an installed package.

(I will reproduce the names of the files if you want)

And the final statement: error: Found 5 post-build check problem(s). To submit these ports to curated catalogs, please first correct the portfile:

Despite the warnings, vcpkg successfully installed casadi.

Please let me know if there is any issue from my side!

agriyakhetarpal commented 6 months ago

2.8 minutes should be reasonable, thanks – this confirms that the SuiteSparse installation is the slowest of all. 2.8 minutes is of course significantly slower than installing CasADi from its Python wheel, which is a matter of seconds – we were using the wheel earlier but had to resort to using the registry afterwards due to upstream changes. We have an issue open (#3603) for restoring this behaviour, which no one has been able to get to yet.

It looks like the SuiteSparse issue will be dormant for a long time – I noticed that I had seen it before because I was already subscribed to it :)

The CasADi installation from vcpkg looks normal to me, especially since it succeeded – I have noted these logs in our Windows wheel builds before. But it would be good to look into how to clear out the warnings (they do not seem relevant to the CasADi installation itself and are probably more about the registry and its portfile).

cringeyburger commented 6 months ago

I tried fixing the warnings by changing the portfile, following the instructions given in the warnings when I built it.

  1. Added vcpkg-cmake-config as a dependency and added file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/include"). This resolved one warning.
  2. I tried to fix warnings 2 and 3 using vcpkg_cmake_config_fixup(), but it errors out because it is searching for a directory that doesn't exist. I tried with and without this command; the directory CONFIG_PATH share/casdi/cmake doesn't exist.
  3. Regarding warning 4, a quick web search suggests we may have to fix the absolute paths in the header files themselves.

agriyakhetarpal commented 6 months ago

Oh, by "good to look into how to clear out the warnings" I didn't mean doing so right away – I do not think it is the biggest of deals currently, and it can definitely be picked up at a later time (maybe CasADi can be added as a vcpkg package upstream too, where someone interested can take care of this). Nevertheless, thanks for attempting it!

The gist of the discussion here is that regardless of which build backend we end up choosing, reducing the Windows installation time would be great. We don't compile the IDAKLU solver in PR workflows because of that – and that is an area where reducing CI time is important. But we don't compile it when running the Windows scheduled tests either (#3664), where we don't need to worry about how long the tests take. Being able to speed things up would benefit us in other ways too, say, by identifying issues such as #3783 early on.

prady0t commented 6 months ago

I'm interested in this project too. I'm in the process of writing a proposal.