sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 480 forks source link

Do not require AVX when building with SAGE_FAT_BINARY #32434

Open vbraun opened 3 years ago

vbraun commented 3 years ago

Setting CFLAGS to -mno-avx -mno-avx2 -mno-bmi2 should help the produced binaries work on older processors.

Move from https://github.com/sagemath/binary-pkg/issues/31

CC: @culler @dimpase @embray @kiwifb @kliem @orlitzky @mkoeppe @slel @kwankyu

Component: build

Author: Samuel Lelièvre

Branch/Commit: u/soehms/adapt_fat_binary_32434 @ 3c61bf2

Reviewer: Sebastian Oehms, ...

Issue created by migration from https://trac.sagemath.org/ticket/32434

slel commented 3 years ago
comment:1

Thanks for opening this ticket. What files need changing?

Ran git grep SAGE_FAT_BINARY and git grep fat-binary; still clueless.

Probably a subset of these files:

$ git grep SAGE_FAT_BINARY | cut -f 1 -d : | sort | uniq
build/bin/sage-build-env-config.in
build/pkgs/argon2_cffi/spkg-install.in
build/pkgs/atlas/configuration.py
build/pkgs/atlas/patches/atlas-config
build/pkgs/curl/spkg-install.in
build/pkgs/ecm/SPKG.rst
build/pkgs/ecm/spkg-install.in
build/pkgs/fflas_ffpack/spkg-install.in
build/pkgs/gcc/spkg-configure.m4
build/pkgs/gf2x/spkg-install.in
build/pkgs/givaro/spkg-install.in
build/pkgs/gmp/spkg-install.in
build/pkgs/libffi/spkg-install.in
build/pkgs/linbox/spkg-install.in
build/pkgs/m4ri/spkg-install.in
build/pkgs/mpir/spkg-install.in
build/pkgs/ntl/spkg-install.in
build/pkgs/numpy/spkg-install.in
build/pkgs/openblas/spkg-install.in
build/pkgs/primecount/spkg-install.in
build/pkgs/r/spkg-install.in
configure.ac
docker/Dockerfile
src/doc/en/installation/source.rst
src/sage_setup/command/sage_build_ext.py
slel commented 3 years ago

Description changed:

--- 
+++ 
@@ -1,5 +1,4 @@
 Setting `CFLAGS to -mno-avx -mno-avx2 -mno-bmi2` should help the produced binaries work on older processors.
-

 Move from https://github.com/sagemath/binary-pkg/issues/31
orlitzky commented 3 years ago
comment:2

Warning: we shouldn't add anything to CFLAGS without first checking that the user's compiler supports it.

IMO a better long-term solution is to ensure that all of our SPKGs respect the user's CFLAGS, CXXFLAGS, et cetera, and don't add unnecessary flags to them on-the-fly. All Gentoo packages do that, so you can likely steal patches/workarounds from our package repository or from the sage-on-gentoo overlay.

slel commented 3 years ago
comment:3

Would this change in src/sage_setup/command/sage_build_ext.py help in any way?

 def check_flags(self):
     """
     Sanity check the compiler flags used to build the extensions
     """
     forbidden = None
     if os.environ.get("SAGE_FAT_BINARY") == "yes":
         # When building with SAGE_FAT_BINARY=yes, we should not
         # enable CPU features which do not exist on every CPU.
         # Such flags usually come from other libraries adding the
         # flags to the pkgconfig configuration.  So if you hit these
         # errors, the problem is most likely with some external
         # library and not with Sage.
         import re
-        forbidden = re.compile(r"-march=|-mpcu=|-msse3|-msse4|-mpopcnt|-mavx")
+        forbidden = re.compile(r"-march=|-mpcu=|-msse3|-msse4|-mpopcnt"
+                               r"|-mavx|-mavx2|-mbmi2")

     if forbidden is not None:
         errors = 0
         for ext in self.extensions:
             flags = ext.extra_compile_args
             for flag in flags:
                 if forbidden.match(flag):
                     log.error("%s uses forbidden flag '%s'", ext.name, flag)
                     errors += 1
         if errors:
             raise RuntimeError("forbidden flags used")
orlitzky commented 3 years ago
comment:4

Replying to @slel:

Would this change in src/sage_setup/command/sage_build_ext.py help in any way?

I think it strips those flags when building the sage library (which is the right thing to do in any case), but probably doesn't solve the issue in the original bug report because it's most likely coming from some SPKG and not the sage library itself.

OpenBLAS is a prime suspect, since 0.3.13 will use certain machine-specific instructions when no TARGET is specified: https://github.com/xianyi/OpenBLAS/issues/3056

On sage-support, Nathan said that the mac app has to set TARGET=CORE2 to support macs from 2013. I think to fix the same problem, we need to decide how old of a computer should be able to run the sage binaries, and set a TARGET appropriately when SAGE_FAT_BINARY is used. (Of course, this is all based on the guess that OpenBLAS is to blame. Can anyone reproduce the problem?) If no one else has an opinion, TARGET=CORE2 sounds fine to me. Building from source is always an option.

fchapoton commented 2 years ago
comment:5

bump to 9.6

williamstein commented 2 years ago
comment:7

I would just like this ticket to get prioritized more highly. It does impact users and is probably easy to fix. That's all.

mkoeppe commented 2 years ago
comment:8

Won't happen without people who need SAGE_FAT_BINARY doing the work and setting up tests.

williamstein commented 2 years ago
comment:9

It occurs to me that suggestions like use "CFLAGS to -mno-avx -mno-avx2 -mno-bmi2" aren't really in the spirit of "SAGE_FAT_BINARY". They are in the spirit of "make a binary that is very slow but works on a maximum amount of hardware". Maybe we can't have it both ways, but simply disabling AVX, etc. is probably going to result in a much slower Sage binary.

mkoeppe commented 2 years ago
comment:10

See #32363 "Rename configure option --enable-fat-binary to --enable-portable-binary"

mkoeppe commented 2 years ago
comment:11

openblas has dynamic_arch, numpy has --cpu-baseline. I'm not aware of a more general mechanism to do magic runtime specialization to arch variants.

mkoeppe commented 2 years ago
comment:12

Short of compiler/linker support for such a thing, I'd think the best that a binary vendor could do is to make several builds with separate --prefix (SAGE_LOCAL) and then dispatch at runtime. Comes at a cost of longer build times and a few gigabytes of space.

mkoeppe commented 2 years ago
comment:13

Said vendor could also throw in a "debug" build while at it.

soehms commented 2 years ago
comment:14

Replying to @orlitzky:

Replying to @slel:

Would this change in src/sage_setup/command/sage_build_ext.py help in any way?

I think it strips those flags when building the sage library (which is the right thing to do in any case), but probably doesn't solve the issue in the original bug report because it's most likely coming from some SPKG and not the sage library itself.

I can't say anything about that, but I confirm that Samuel's suggestion fixes an at least related issue that I had on the device I described in this sage-devel post (output of lshw):

     *-cpu
          Beschreibung: CPU
          Produkt: Intel(R) Atom(TM) CPU N455   @ 1.66GHz
          Hersteller: Intel Corp.
          Physische ID: 1d
          Bus-Informationen: cpu@0
          Version: 6.12.10
          Seriennummer: 0001-06CA-0000-0000-0000-0000
          Steckplatz: CPU
          Größe: 1662MHz
          Kapazität: 1666MHz
          Breite: 64 bits
          Takt: 667MHz
          Fähigkeiten: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm cpufreq
          Konfiguration: cores=1 enabledcores=1 id=0 threads=2

Here my observation:

  1. I still could build one of the first beta of 9.6 (beta3 IIRC) on this computer without giving any options to configure.
  2. The build of 9.7.beta0 breaks with SIGILL in sagemath_doc_html-none and on startup even after setting --enable-fat-binary.
  3. The build of 9.7.beta0 runs without any problems after doing the changes suggested by Samuel.

So, my opinion is that we should do this fix since it is surely an improvement. If it really would not help for the original issue then that could be a follow up ticket (or the other way around).

soehms commented 2 years ago

Branch: u/soehms/adapt_fat_binary_32434

soehms commented 2 years ago
comment:16

According to my previous comment I push a branch containing Samuel's suggestion.


New commits:

3c61bf232434: initial
soehms commented 2 years ago

Reviewer: Sebastian Oehms, ...

soehms commented 2 years ago

Author: Samuel Lelievre

soehms commented 2 years ago

Commit: 3c61bf2

soehms commented 2 years ago

Changed author from Samuel Lelievre to Samuel Lelièvre

mkoeppe commented 2 years ago
comment:18

Another affected CPU: see #33671 comment:243

mkoeppe commented 2 years ago

Changed reviewer from Sebastian Oehms, ... to Sebastian Oehms, ...

mkoeppe commented 2 years ago
comment:19

Note that the module that is modified here, sage_setup.command.sage_build_ext, is not used when configure --enable-editable is in use (the default since #32406)

Also, even with configure --disable-editable, it's not clear to me that it would enough to disable these flags in Sage library or whether it needs to be done on the level of the whole distribution. (For example by setting CFLAGS etc. at configure time as the ticket description says.)

soehms commented 1 year ago
comment:21

FWIW: I made another try on the device I described in comment14 after installing LinuxMint 21 (vanessa) which is build on Ubuntu 22.04 (jammy). I did that on a fresh git clone of Sage 9.8.beta3 merged with the branch of the ticket. On a first step, I wanted to test with --enable-fat-binary but without --disable-editable. Accidentally, I typed --enable-sage-fat-binary instead of --enable-fat-binary. I guess that's the same as giving no option. Unexpectedly, it ends up with a pretty well working Sage.

Furthermore I tried two binary installations of stable 9.7. Using the docker image sagemath/sagemath:9.7 everything works fine, too, while if I use conda I'm getting an illegal instruction error on startup of Sage, as it was according to comment 14 with 9.7.beta0 after building from source under LinuxMint 19.3.

Apart from the latter fact it seems that the issue has been fixed for my device. I have no idea what could have caused that: The upgrade of the OS or changes in Sage?