pytorch: safetensors library hardcodes using CUDA if only device index is provided

dvrogozh commented 1 month ago

Related to:

https://github.com/huggingface/transformers/issues/31941

The safetensors library hardcodes a CUDA device when only a device index is provided. This causes runtime errors when running Hugging Face models with pipeline(device_map="auto"), as noted in https://github.com/huggingface/transformers/issues/31941 (see that issue for repro steps). The hardcoding happens here: https://github.com/huggingface/safetensors/blob/079781fd0dc455ba0fe851e2b4507c33d0c0d407/bindings/python/src/lib.rs#L296-L297

A possible solution might be to return the device that torch.device(N) returns. Note, however, that this will work for non-CUDA devices only once the following PyTorch change is merged:

https://github.com/pytorch/pytorch/pull/129119 This change modifies the behavior of torch.device(N) so that it returns the current accelerator device instead of a CUDA device. Hugging Face seems to anticipate this change, according to https://github.com/huggingface/accelerate/pull/2874#issuecomment-2204803710.
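To illustrate the difference between the two behaviors, here is a minimal pure-Python sketch. It is a model only, not the actual safetensors code: the function names `resolve_hardcoded` and `resolve_delegated` and the `current_accelerator` parameter are hypothetical. In a real fix, the accelerator type would presumably come from torch.device(N) itself rather than from an argument.

```python
from typing import Union


def resolve_hardcoded(device: Union[int, str]) -> str:
    """Model of the reported behavior: a bare index is always treated as CUDA."""
    if isinstance(device, int):
        return f"cuda:{device}"
    return device


def resolve_delegated(device: Union[int, str], current_accelerator: str = "cuda") -> str:
    """Model of the proposed behavior: a bare index maps to the current accelerator.

    `current_accelerator` is a stand-in (an assumption for this sketch) for
    whatever device type torch.device(N) would report on the running system.
    """
    if isinstance(device, int):
        return f"{current_accelerator}:{device}"
    return device


if __name__ == "__main__":
    # On a hypothetical XPU-only machine, the hardcoded path picks the wrong backend.
    print(resolve_hardcoded(0))
    print(resolve_delegated(0, current_accelerator="xpu"))
    # Fully qualified device strings pass through unchanged in both cases.
    print(resolve_delegated("cpu", current_accelerator="xpu"))
```

With the hardcoded variant, index 0 resolves to a CUDA device even on a system that has none, which matches the kind of failure described in the transformers issue. The delegated variant leaves the choice of backend to the caller.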

CC: @faaany @muellerzr @SunMarc @guangyey

dvrogozh commented 1 month ago

FYI, https://github.com/pytorch/pytorch/pull/129119 got merged, so the solution I outlined should now be possible.

dvrogozh commented 1 month ago

I have implemented a fix for this issue as I see it. Please help review https://github.com/huggingface/safetensors/pull/500.

Narsil commented 2 weeks ago

Closed by #509

huggingface/safetensors issue #499