huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Add ignore_unexpected_keys arg to load_checkpoint_in_model() #2880

Closed Qubitium closed 3 months ago

Qubitium commented 3 months ago

What does this PR do?

Avoid flooding the console with warnings when loading a quantized model whose layers have been replaced by QuantLinear, as in the GPTQModel library (a fork of AutoGPTQ). AutoGPTQ has the same problem.

This is not a bug fix but an end-user usability fix. For example, when GPTQModel.from_quantized() loads a quantized Qwen2MoE model, which has a massive number of layers and experts, hundreds to thousands of lines of unexpected_keys warnings are pushed to the console/log.

To fix this, I added an ignore_unexpected_keys argument to the loader method (see the sketch below). I'm not sure this was the best way to do it; let me know if there is a better approach.
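
For context, here is a minimal sketch of the gating this flag adds, assuming a much-simplified loader; the real load_checkpoint_in_model() in accelerate.utils.modeling takes many more arguments (device_map, dtype, offload options) and gathers unexpected keys across checkpoint shards:

```python
import logging

import torch
import torch.nn as nn

logger = logging.getLogger(__name__)


def load_checkpoint_sketch(model, state_dict, ignore_unexpected_keys=False):
    """Simplified stand-in for accelerate's load_checkpoint_in_model()."""
    model_keys = set(model.state_dict().keys())
    unexpected_keys = [k for k in state_dict if k not in model_keys]
    if unexpected_keys and not ignore_unexpected_keys:
        # This is the kind of warning that repeats for every replaced layer
        # of a quantized MoE model, flooding the console.
        logger.warning(
            "Some weights of the checkpoint were not used: %s", unexpected_keys
        )
    # Load only the tensors the model skeleton actually declares.
    model.load_state_dict(
        {k: v for k, v in state_dict.items() if k in model_keys}, strict=False
    )


# Demo: an extra quantization tensor (a qweight/qzeros/scales-style key)
# produces no warning when the flag is set.
model = nn.Linear(4, 4)
ckpt = dict(model.state_dict(), **{"qzeros": torch.zeros(1)})
load_checkpoint_sketch(model, ckpt, ignore_unexpected_keys=True)
```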

This needs to be fixed because users think it is a bug. It is not, but the warning verbosity is so extreme that it becomes a bug from the user's perspective. Imagine being a user presented with several hundred lines of warnings in your terminal.

For a quantized model, the warnings should not be there in the first place and should only be printed in debug mode. This toggle allows that manual control.
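
Hypothetically, if this PR were merged, a quantization library could forward the flag like so; the checkpoint path below is a placeholder, and ignore_unexpected_keys is the proposed argument, not part of accelerate's actual API:

```python
import torch.nn as nn
from accelerate.utils import load_checkpoint_in_model

model = nn.Linear(8, 8)  # stand-in for the skeleton with QuantLinear layers

load_checkpoint_in_model(
    model,
    checkpoint="qwen2-moe-gptq/model.safetensors",  # placeholder path
    device_map={"": "cpu"},
    ignore_unexpected_keys=True,  # proposed flag from this PR (never merged)
)
```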

TEST

@muellerzr @BenjaminBossan @SunMarc

Qubitium commented 3 months ago

@SunMarc

Sorry. It turns out a bug on our end was triggering these warnings; the warning flow itself is correct. Closing this as unneeded.