manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
166 stars 51 forks source link

Please don't "arch=native" #322

Open VictorEijkhout opened 3 months ago

VictorEijkhout commented 3 months ago
  1. I get this on installation:
/tmp/cch8TPRQ.s: Assembler messages:
/tmp/cch8TPRQ.s:28178: Error: no such instruction: `vmovw 161(%rbx),%xmm11'
/tmp/cch8TPRQ.s:28180: Error: no such instruction: `vmovw .LC38(%rip),%xmm12'
/tmp/cch8TPRQ.s:28188: Error: no such instruction: `vmovw %xmm13,161(%rbx)'
/tmp/cch8TPRQ.s:28189: Error: no such instruction: `vmovw %xmm13,%ecx'
  1. This goes away if I remove the arch=native flag. (Totally bizarre. How can you generate instructions that don't exist if you compile for native arch?)

  2. Using the native flag makes it impossible to compile for supercomputer clusters where the build node is often different from the compute node.

lgarrison commented 3 months ago

Is this on a Sapphire Rapids machine by any chance? I wonder if the compiler is generating AVX512-FP16 instructions that the assembler doesn't know what to do with.

In any case, yes, apologies for the -march=native difficulties. We're thinking about/working on a version with true multiple SIMD dispatch capabilities, and in the meantime, we've opted to use -march=native by default and document that we're doing so.

VictorEijkhout commented 3 months ago

You're right, I'm compiling on a Sapphire Rapids. (Stampede3 at TACC in case you're wondering: https://docs.tacc.utexas.edu/hpc/stampede3/#system-login )

manodeep commented 3 months ago

Using -march=native (and consequently disabling cross-compiling by default) is an unfortunate side-effect of making the typical user-experience smooth.

One option could be to add an "ISA_CFLAGS" variable (within common.mk that inherits from the env). System admins/cross-compilers can tweak the ISA setting by setting that ISA_CFLAGS on the command-line and then compiling. (And if the ISA_CFLAGS is undefined, then the usual -march=native/-mcpu=apple-m1/-xhost etc are used, as relevant)

What ISA(s) would you be compiling for?

VictorEijkhout commented 3 months ago

At the moment I have a bunch of Intels, AMD, and Grace/Grace.

Why do you do everything through the common.mk? I'm surprised to find a package these days that doesn't use at least autotools or preferably cmake.

manodeep commented 3 months ago

What are the ISAs for the Intels - Corrfunc uses at most the AVX512F + VL intrinsics? Separately, are the Grace ones AARCH64?

Regarding common.mk - it's mostly historical really. Back in 2012 when I started writing the Corrfunc predecessor, I didn't know autotools (and I still don't) and so went with the common.mk solution. Over the years, we have added a lot of checks into common.mk and now it is not worth the effort to migrate to a different solution.

If you are comfortable with autotools and would be willing/able to contribute a patch, that would be quite welcome!

johannesulf commented 1 month ago

By the way, I just encountered the same issue on my local cluster at American University. Removing -march=native solved the problem.

manodeep commented 1 month ago

@johannesulf If you remove the -march=native, then you will only have the fallback kernel and lose the performance boost from the SIMD kernels. Might be best to at least add the minimum ISA, say, -mavx2 (ideally the ISA of the runtime CPU) in place of the -march=native flag.