microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/

[REQUEST] How can one specify the CPU architecture to target? #5451

Open etiennemlb opened 7 months ago

etiennemlb commented 7 months ago

Is your feature request related to a problem? Please describe. I work on a supercomputer where there are login nodes and compute nodes. This architecture is typical in the HPC world. Login nodes are where you land with ssh, and often where you compile, prepare your environment, and launch compute tasks (via SLURM). There is no guarantee that a login node offers the same GPUs (if any) and CPUs as the ones you'll find on the compute nodes.

How can one specify the CPU architecture with which we want to build the CPU ops? There already is PYTORCH_ROCM_ARCH for the GPUs; I'd say we need something similar for the CPU too, instead of march=native.
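
To make the idea concrete, here is a minimal sketch of the kind of hook I have in mind. DS_CPU_ARCH is just a placeholder name, not an existing DeepSpeed variable:

```python
import os

# Sketch only: an env-var override analogous to PYTORCH_ROCM_ARCH,
# falling back to the current -march=native behaviour.
def cpu_arch_flags():
    requested = os.environ.get("DS_CPU_ARCH")  # e.g. "x86-64-v3" or "znver3"
    if requested:
        # Target the compute-node CPU explicitly, even when building
        # on a login node with a different CPU.
        return [f"-march={requested}"]
    # Current behaviour: optimize for whichever CPU runs the build,
    # which on a login node may not match the compute nodes.
    return ["-march=native"]
```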

loadams commented 7 months ago

@etiennemlb - are you able to use JIT compilation, which would then use the arch on the compute nodes? Or do you need to prebuild the ops for some reason?
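
For reference, with the default install (i.e. without prebuilding via DS_BUILD_OPS=1) the ops compile lazily on first use, so something along these lines, run from a job on a compute node, should pick up that node's CPU:

```python
# Run this from a job on a compute node so the JIT build sees that CPU.
from deepspeed.ops.op_builder import CPUAdamBuilder

# load() JIT-compiles the extension via torch.utils.cpp_extension on first
# use and caches the result (TORCH_EXTENSIONS_DIR controls the cache path).
cpu_adam_module = CPUAdamBuilder().load()
```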

etiennemlb commented 7 months ago

Clearly, I could, and will, use the "JIT" method.

Note that I used a prebuilt DeepSpeed, and Megatron was failing somewhere when it looked for the prebuilt ops. And it was crashing "hard": no warning, no error, just a -4 return code.
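
For what it's worth, if that -4 came from a Python launcher it would map to signal 4, i.e. SIGILL, which is consistent with natively built SIMD code running on a CPU that lacks those instructions:

```python
import signal

# Python's subprocess convention: a negative return code -N means the
# child process was killed by signal N.
print(signal.Signals(4).name)  # SIGILL (illegal instruction)
```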

If this march=native has stood the test of time, it probably means I'm an outlier, but at the least I'd like a warning telling me I'm running on an architecture that supports fewer of the SIMD extensions the ops were built with. And preferably, an injection point to specify which arch I'd like to build for.
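
As a sketch of the warning I have in mind (nothing like this exists in DeepSpeed today; cpuinfo here is the py-cpuinfo package, and recording the build-time flags is part of the sketch):

```python
import warnings

import cpuinfo  # py-cpuinfo


def warn_if_missing_isa(build_time_flags):
    """Warn when the running CPU lacks ISA extensions the ops were built with.

    build_time_flags would have to be recorded when the ops are compiled,
    e.g. {"avx2", "avx512f"}.
    """
    runtime_flags = set(cpuinfo.get_cpu_info().get("flags", []))
    missing = set(build_time_flags) - runtime_flags
    if missing:
        warnings.warn(
            f"DeepSpeed CPU ops were built assuming {sorted(missing)}, "
            "which this CPU does not report; expect illegal-instruction "
            "crashes instead of a clean error."
        )
```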

Compiling on compute nodes is also seen as bad practice, and software that relies on march=native is seen as a footgun, because it breaks easily and complicates things on heterogeneous clusters.