huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Debug formatter for `Tensor` is confusing with > 1 GPU #2619

Open zackangelo opened 2 hours ago

zackangelo commented 2 hours ago

When running a model across several processes using NCCL, the debug formatter for `Tensor` prints the same device ID for different GPUs:

GPU 0: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)
GPU 1: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)

It's confusing when looking at logs and trying to figure out which GPU is doing what.

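This is because candle assigns device IDs from an atomic counter that is local to each process (PID), so every process starts counting from the same value regardless of which GPU it drives: https://github.com/huggingface/candle/blob/00d8a0c178f588b6454c02e66b709917628c2bae/candle-core/src/cuda_backend/device.rs#L35-L39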
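To illustrate the behaviour described above, here is a minimal sketch of a per-process ID counter. It is not copied from candle: the `DeviceId` type here is a hypothetical stand-in, and the starting value of 1 is an assumption inferred from the `DeviceId(1)` shown in the logs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a per-process device ID (not candle's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceId(usize);

impl DeviceId {
    /// Returns a fresh ID that is unique within the current process only.
    pub fn new() -> Self {
        // The counter is a process-local static, so every new process
        // starts again from the same initial value (assumed to be 1 here).
        static COUNTER: AtomicUsize = AtomicUsize::new(1);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }
}
```

Within one process, successive calls yield distinct IDs. Two separate processes that each create a single device would both get the first value, which matches the duplicated ID in the logs above.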

Would it be a problem to include the CUDA device ordinal in the debug formatter? If not I'll open a PR.

LaurentMazare commented 2 hours ago

Yeah, feel free to make a PR that would change it to something like `CudaDevice(ordinal:id)`.
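For reference, a rough sketch of what a formatter in the suggested `CudaDevice(ordinal:id)` shape could look like. The struct and its field names (`ordinal`, `id`) are hypothetical and chosen for illustration; they are not candle's actual definitions, and a real PR may settle on a different layout.

```rust
use std::fmt;

/// Hypothetical stand-in for a CUDA device handle (field names are assumptions).
pub struct CudaDevice {
    /// CUDA device ordinal, i.e. which physical GPU this handle refers to.
    pub ordinal: usize,
    /// ID taken from the per-process counter.
    pub id: usize,
}

impl fmt::Debug for CudaDevice {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render as `CudaDevice(ordinal:id)` so logs from different
        // processes can be told apart by the ordinal.
        write!(f, "CudaDevice({}:{})", self.ordinal, self.id)
    }
}
```

With this shape, the two log lines from the report would be distinguishable, for example `CudaDevice(0:1)` on one process and `CudaDevice(1:1)` on the other.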