huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Accelerator unit tests are failing due to `TypeError: load_model() got an unexpected keyword argument 'device'` #2939

Closed: byi8220 closed this issue 3 months ago

byi8220 commented 3 months ago

Several unit tests are failing with error messages similar to `FAILED tests/test_accelerator.py::AcceleratorTester::test_save_load_model_use_safetensors - TypeError: load_model() got an unexpected keyword argument 'device'`. The error appears to originate from https://github.com/huggingface/accelerate/blob/main/src/accelerate/checkpointing.py#L208

With the package versions below, running `make test` or `pytest tests/test_accelerator.py` on the main branch fails on my machine:

$ conda list | grep torch
ffmpeg                    4.3                  hf484d3e_0    pytorch
libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
pytorch                   2.3.1           py3.11_cuda12.1_cudnn8.9.2_0    pytorch
pytorch-cuda              12.1                 ha16c6d3_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                2.3.1               py311_cu121    pytorch
torchdata                 0.7.1.dev20240703+cpu          pypi_0    pypi
torchtriton               2.3.1                     py311    pytorch
torchvision               0.18.1              py311_cu121    pytorch

byi8220 commented 3 months ago

Whoops, this might be my fault for having an out-of-date version of safetensors. It seems `load_model` only accepts the `device` argument as of release v0.4.3 (https://github.com/huggingface/safetensors/releases). I was on v0.4.2, and after updating, the tests work as intended.
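
For anyone hitting the same TypeError, one way to confirm the cause is to inspect the function signature directly instead of guessing from the version number. The following is a minimal sketch: `accepts_keyword` is a hypothetical helper, and `load_model_old` / `load_model_new` are stand-ins written for illustration, not the actual safetensors code.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs catch-all also accepts arbitrary keyword names.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for the signature before and after the change.
def load_model_old(model, filename, strict=True):
    pass


def load_model_new(model, filename, strict=True, device="cpu"):
    pass


print(accepts_keyword(load_model_old, "device"))
print(accepts_keyword(load_model_new, "device"))
```

With safetensors installed, the same check can be pointed at the real function via `from safetensors.torch import load_model` followed by `accepts_keyword(load_model, "device")`.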

This also implies that accelerate depends on the version constraint safetensors>=0.4.3. That is quite annoying, since as of writing this comment the latest version available in conda is v0.4.2.
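
Until the constraint is enforced at install time, a quick local check of the installed version against the minimum can save some debugging. This is a rough sketch using only the standard library. `parse_release`, `meets_minimum`, and `check_installed` are hypothetical helper names, and the parsing deliberately ignores pre-release and dev suffixes, so `packaging.version` would be the more robust choice where it is available.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Parse the leading numeric components of a version string.

    Suffixes such as 'rc1' or '.dev0' are ignored (a simplification).
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_release(installed), parse_release(minimum)
    # Pad with zeros so that '0.4' compares equal to '0.4.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(package, minimum):
    """Return True/False for an installed package, or None if it is missing."""
    try:
        return meets_minimum(version(package), minimum)
    except PackageNotFoundError:
        return None


# Result depends on the local environment, so no expected output is shown.
print(check_installed("safetensors", "0.4.3"))
```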

Should setup.py now require an up-to-date safetensors? If so, here is a one-line change: https://github.com/huggingface/accelerate/pull/2957