scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License
13.07k stars 5.18k forks source link

BLD: look into whether windows wheels need to be manually stripped #21015

Open andyfaff opened 4 months ago

andyfaff commented 4 months ago

Describe your issue.

see https://github.com/scipy/scipy/pull/21000. Currently windows wheels are stripped, which @rgommers doesn't think is necessary. However, the wheel size jumps by 15 Mb when the wheel isn't stripped.

rgommers commented 4 months ago

It looks like all extension modules get smaller with stripping:

``` ls -l strip_no/scipy/special/*.pyd -rw-rw-rw-@ 1 rgommers staff 167649 Jun 21 03:55 strip_no/scipy/special/_comb.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 265158 Jun 21 03:55 strip_no/scipy/special/_ellip_harm_2.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 800216 Jun 21 03:55 strip_no/scipy/special/_gufuncs.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 381713 Jun 21 03:55 strip_no/scipy/special/_specfun.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 1660064 Jun 21 03:55 strip_no/scipy/special/_special_ufuncs.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 515496 Jun 21 03:55 strip_no/scipy/special/_test_internal.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 2060858 Jun 21 03:55 strip_no/scipy/special/_ufuncs.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 4749672 Jun 21 03:55 strip_no/scipy/special/_ufuncs_cxx.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 3577036 Jun 21 03:55 strip_no/scipy/special/cython_special.cp312-win_amd64.pyd ls -l strip_yes/scipy/special/*.pyd -rw-rw-rw-@ 1 rgommers staff 50176 Jun 21 08:45 strip_yes/scipy/special/_comb.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 113152 Jun 21 08:45 strip_yes/scipy/special/_ellip_harm_2.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 334848 Jun 21 08:45 strip_yes/scipy/special/_gufuncs.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 202240 Jun 21 08:45 strip_yes/scipy/special/_specfun.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 1050624 Jun 21 08:45 strip_yes/scipy/special/_special_ufuncs.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 230400 Jun 21 08:45 strip_yes/scipy/special/_test_internal.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 1544192 Jun 21 08:45 strip_yes/scipy/special/_ufuncs.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 2309120 Jun 21 08:45 strip_yes/scipy/special/_ufuncs_cxx.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 3022336 Jun 21 08:45 strip_yes/scipy/special/cython_special.cp312-win_amd64.pyd ls -l strip_no/scipy/linalg/*.pyd -rw-rw-rw-@ 1 rgommers staff 858698 Jun 21 03:55 strip_no/scipy/linalg/_cythonized_array_utils.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 516579 Jun 21 03:55 strip_no/scipy/linalg/_decomp_lu_cython.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 481886 Jun 21 03:55 strip_no/scipy/linalg/_decomp_update.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 1328003 Jun 21 03:55 strip_no/scipy/linalg/_fblas.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 2705373 Jun 21 03:55 strip_no/scipy/linalg/_flapack.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 1768690 Jun 21 03:55 strip_no/scipy/linalg/_interpolative.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 785838 Jun 21 03:55 strip_no/scipy/linalg/_matfuncs_expm.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 523283 Jun 21 03:55 strip_no/scipy/linalg/_matfuncs_sqrtm_triu.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 542683 Jun 21 03:55 strip_no/scipy/linalg/_solve_toeplitz.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 586477 Jun 21 03:55 strip_no/scipy/linalg/cython_blas.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 1034961 Jun 21 03:55 strip_no/scipy/linalg/cython_lapack.cp312-win_amd64.pyd ls -l strip_yes/scipy/linalg/*.pyd -rw-rw-rw-@ 1 rgommers staff 565248 Jun 21 08:45 strip_yes/scipy/linalg/_cythonized_array_utils.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 248832 Jun 21 08:45 strip_yes/scipy/linalg/_decomp_lu_cython.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 337920 Jun 21 08:45 strip_yes/scipy/linalg/_decomp_update.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 712704 Jun 21 08:45 strip_yes/scipy/linalg/_fblas.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 2003456 Jun 21 08:45 strip_yes/scipy/linalg/_flapack.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 884224 Jun 21 08:45 strip_yes/scipy/linalg/_interpolative.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 479232 Jun 21 08:45 strip_yes/scipy/linalg/_matfuncs_expm.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 253440 Jun 21 08:45 strip_yes/scipy/linalg/_matfuncs_sqrtm_triu.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 271360 Jun 21 08:45 strip_yes/scipy/linalg/_solve_toeplitz.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 276992 Jun 21 08:45 strip_yes/scipy/linalg/cython_blas.cp312-win_amd64.pyd -rw-rw-rw-@ 1 rgommers staff 527360 Jun 21 08:45 strip_yes/scipy/linalg/cython_lapack.cp312-win_amd64.pyd ```

The code comment explains that stripping was implemented to not make delvewheel choke, which it otherwise does because MinGW injects content (not known what that content is):

# To avoid DLL hell, the file name of libopenblas that's being vendored with
# the wheel has to be name-mangled. delvewheel is unable to name-mangle PYD
# containing extra data at the end of the binary, which frequently occurs when
# building with mingw.
# We therefore find each PYD in the directory structure and strip them.

This came from https://github.com/andyfaff/scipy/pull/28, which was the initial work on moving from multibuild to cibuildwheel. The comments on that PR don't explain it, and the code did not come from multibuild (that simply stripped everything by design).

I haven't checked what the extra content is yet; it's problem symbol info - https://stackoverflow.com/questions/32211913/is-there-any-reason-why-not-to-strip-symbols-from-executable contains a discussion on MinGW defaults.

It's also not yet clear to me if the problem is SciPy-specific, or a generic incompatibility between the MinGW defaults and delvewheel.

carlkl commented 4 months ago

stripping of pyd-files and DLLs should be done with strip --strip-unneeded. There is no reason to not strip mingw-w64 generated release binaries. Only debug binaries shouldn't be stripped of course.

rgommers commented 4 months ago

Thanks for the input @carlkl. A separate strip invocation is very clumsy; it requires unpacking and repacking an already-created wheel, which is bad. It's not necessary for GCC on Linux, nor for any other compiler on any platform AFAIK. On Linux, gcc -s at compile time has the same effect as strip --strip-unneeded afterwards I believe. Do you know if this is true for MinGW as well?

carlkl commented 4 months ago

I typically use -O2 for compiling. Alternatively one could use -Os instead (more or less the same but without loop unrolling i.e.). For linking together with stripping via gcc invocation one has to use: gcc -Wl,-strip-all or gcc -Wl,-s (it is the same) alternatively gcc -Wl,-strip-debug or gcc -Wl,-S to remove only debug information but leave the symbols

You can also take a look here: Strip debug symbols of wheels

carlkl commented 4 months ago

I never used gcc -s, but it seems to have the same effect as gcc -Wl,-s on a first glance.

In case you want to keep symbols to allow stack traces even for a release version I recommend to use gcc -Wl,-strip-debug

rgommers commented 4 months ago

I don't think we want to keep symbols. I think gcc -Wl,-strip-all is desirable here for the Windows builds. We can insert that flag when we're detecting that we build with MinGW.

I haven't had time to check yet, but I'd expect that there's more than just symbol tables. If you fold out the > Details in my first comment above, you'll see that the size differences are extreme. Some .pyd files increase by 2x or more in size; and the wheels end up ~15 MB larger. That'd be a lot of symbol tables.

carlkl commented 4 months ago

One could get one step further;

Compiling with -fdata-sections -ffunction-sections and linking with: -Wl,-gc-sections for improved dead code (unused code) elimination.

With adding -flto in the compiling and linking phase this could be even more effective.

Adding -fno-ident is also useful to remove numerous useless gcc identifications strings within the binaries.