zackangelo opened this issue:
When running a model across several processes using NCCL, the debug formatter output will print the same ID for two GPUs:
GPU 0: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)
GPU 1: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)
This is confusing when reading logs and trying to figure out which GPU is doing what.
This is because candle uses an atomic counter per-PID to assign a device ID: https://github.com/huggingface/candle/blob/00d8a0c178f588b6454c02e66b709917628c2bae/candle-core/src/cuda_backend/device.rs#L35-L39
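For illustration, here's a minimal sketch of that pattern, assuming a process-local static counter like the linked code uses (the names here are reconstructed for the example, not copied from candle):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceId(usize);

impl DeviceId {
    fn new() -> Self {
        // The counter is a process-local static: every process starts at 1,
        // so two NCCL ranks that each create one device both get DeviceId(1).
        static COUNTER: AtomicUsize = AtomicUsize::new(1);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }
}

fn main() {
    // Within a single process the IDs are distinct...
    println!("{:?} {:?}", DeviceId::new(), DeviceId::new()); // DeviceId(1) DeviceId(2)
    // ...but a second process running the same code also starts at DeviceId(1),
    // which is why the logs above show the same ID for two different GPUs.
}
```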
Would it be a problem to include the CUDA device ordinal in the debug formatter? If not, I'll open a PR.
Yeah, feel free to make a PR that changes it to something like CudaDevice(ordinal:id).
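A rough sketch of what a CudaDevice(ordinal:id) formatter could look like (the struct layout and field names below are hypothetical, not candle's actual definitions):

```rust
use std::fmt;

struct DeviceId(usize);

struct CudaDevice {
    id: DeviceId,   // process-local counter value, as assigned today
    ordinal: usize, // CUDA device ordinal, stable across processes
}

impl fmt::Debug for CudaDevice {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Prints e.g. CudaDevice(0:1): ordinal 0, process-local id 1,
        // so logs from different ranks become distinguishable.
        write!(f, "CudaDevice({}:{})", self.ordinal, self.id.0)
    }
}

fn main() {
    let dev = CudaDevice { id: DeviceId(1), ordinal: 0 };
    println!("{dev:?}"); // CudaDevice(0:1)
}
```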