Open etiennemlb opened 7 months ago
@etiennemlb - are you able to use JIT compilation, which would then pick up the arch of the compute nodes? Or do you need to prebuild the ops for some reason?
Clearly, I could, and will use the "JIT" method.
Note that I used a prebuilt DS, and Megatron was failing somewhere when it looked for the prebuilt ops. It crashed hard: no warning, no error, just a -4 return code.
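For anyone hitting the same silent failure, here is a minimal sketch of how such a return code can be decoded. It assumes the -4 was reported by a Python launcher following the `subprocess` convention, where a negative return code -N means the child was terminated by signal N (POSIX only). The helper name `describe_returncode` is hypothetical and not part of DeepSpeed or Megatron.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable message.

    Assumption: negative values follow the Python `subprocess` convention,
    i.e. -N means "terminated by signal N".
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"unknown signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_returncode(-4))
```

On Linux, signal 4 is SIGILL (illegal instruction), which would be consistent with running ops built for a CPU with more SIMD extensions than the one executing them.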
If this -march=native has stood the test of time, it probably means I'm an outlier. But at the very least, I'd like a warning telling me I'm running on an architecture that has less "affordance" when it comes to SIMD extensions, and preferably an injection point to specify which arch I'd like to build for.
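To illustrate the kind of warning meant here, a rough sketch follows. It assumes an x86 Linux node where `/proc/cpuinfo` exposes a `flags` line (ARM kernels use `Features` instead), and it assumes the list of extensions the ops were built with is known from somewhere. Both helper names are hypothetical, not existing DeepSpeed functions.

```python
import warnings


def cpu_flags(cpuinfo_text: str) -> set:
    """Extract the CPU feature flags from the text of /proc/cpuinfo.

    Assumption: x86 Linux layout, with a line of the form "flags : a b c".
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def warn_if_missing(required, available) -> list:
    """Warn about required SIMD extensions the running CPU does not offer."""
    missing = sorted(set(required) - set(available))
    if missing:
        warnings.warn(
            "CPU lacks SIMD extensions the ops may have been built with: "
            + ", ".join(missing)
        )
    return missing


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    # Hypothetical requirement list, for illustration only.
    warn_if_missing(["avx2", "avx512f"], flags)
```

A check like this at op load time would turn the hard crash into a readable message on the compute node.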
Compiling on compute nodes is also seen as bad practice, and software that relies on -march=native is seen as a footgun because it breaks easily and complicates things on heterogeneous clusters.
Is your feature request related to a problem? Please describe. I work on a supercomputer that has login nodes and compute nodes. This architecture is typical in the HPC world. Login nodes are where you land with ssh, and often where you compile, prepare your environment and launch compute tasks (via SLURM). There is no guarantee that the login node offers the same GPUs (if any) and CPUs as the ones you'll find on the compute nodes.
How can one specify the CPU architecture for which we want to build the CPU ops? There already is PYTORCH_ROCM_ARCH for the GPUs; I'd say we need something for the CPU too, instead of -march=native.
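As a sketch of what such an injection point could look like, the snippet below reads an environment variable and falls back to the current behaviour when it is unset. The variable name `DS_CPU_ARCH` and the function `cpu_arch_flag` are hypothetical, chosen only for illustration; they are not an existing DeepSpeed setting.

```python
import os


def cpu_arch_flag(env=None, var_name="DS_CPU_ARCH"):
    """Return the -march flag to pass to the compiler for the CPU ops.

    Assumption: `var_name` is a hypothetical environment variable holding a
    value accepted by the compiler's -march option (e.g. "znver3").
    Falls back to -march=native when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    arch = env.get(var_name, "").strip()
    return f"-march={arch}" if arch else "-march=native"


if __name__ == "__main__":
    print(cpu_arch_flag())
```

With something like this, building on a login node for the compute nodes would be a matter of exporting the target arch before the build, mirroring what PYTORCH_ROCM_ARCH does for the GPU side.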