huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Gradient checkpointing warning #32576

Closed BigDataMLexplorer closed 1 month ago

BigDataMLexplorer commented 2 months ago

System Info

Who can help?

@ArthurZucker @muellerzr @sunma

Information

Tasks

Reproduction

Hi, I need help with the gradient checkpointing settings for fine-tuning an LLM. I want to use it to reduce GPU memory usage. The System Info section lists my system details and library versions.

I am doing a text classification task using the AutoModelForSequenceClassification class with the Llama 3 8B model. I load the model, prepare it for k-bit training, apply LoRA via LoraConfig and get_peft_model, and set gradient_checkpointing=True in the Hugging Face Trainer, roughly as sketched below.
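
Illustrative sketch of the setup (the model id, LoRA hyperparameters, and dataset here are placeholders, not my exact values):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForSequenceClassification,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
)

# Load the base model quantized so the k-bit preparation step applies.
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    num_labels=2,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = prepare_model_for_kbit_training(model)

# Wrap the model with LoRA adapters for sequence classification.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,  # the setting that triggers the warnings below
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: the tokenized classification dataset
)
trainer.train()
```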

Without gradient_checkpointing=True the training takes 9:40 hours and reaches about 84% accuracy. With gradient_checkpointing=True it takes about 4:47 hours but reaches only about 70% accuracy. When I set gradient_checkpointing=True in the Trainer, I also get these warnings:

env/lib/python3.9/site-packages/torch/utils/checkpoint.py:464:
UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed
explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed.
use_reentrant=False is recommended, but if you need to preserve the current default
behavior, you can pass use_reentrant=True. Refer to docs for more details on the
differences between the two variants.
  warnings.warn(

env/lib/python3.9/site-packages/torch/utils/checkpoint.py:91:
UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(

Thanks for any help.

Expected behavior


amyeroberts commented 2 months ago

Hi @BigDataMLexplorer, thanks for reporting. Could you try the solution suggested here: https://github.com/huggingface/transformers/issues/26969#issuecomment-1807831645
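
For reference (a sketch of one common workaround, not a quote of the linked comment): the "None of the inputs have requires_grad=True" warning often shows up when PEFT adapters are combined with gradient checkpointing on a frozen quantized base model, and making the input embeddings require gradients before wrapping the model usually resolves it:

```python
from peft import get_peft_model, prepare_model_for_kbit_training

# `model` and `lora_config` as in the setup sketch above.
model = prepare_model_for_kbit_training(model)
model.enable_input_require_grads()           # embedding outputs require grad -> checkpointed activations get gradients
model = get_peft_model(model, lora_config)   # then wrap with the LoRA config as before
```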

ArthurZucker commented 2 months ago

Hey, this is also a bit weird, as we have this code: https://github.com/huggingface/transformers/blob/e683c378fff90cb6c986e1f80684bc3e5ed3cda5/src/transformers/modeling_utils.py#L2362-L2365

which should always default to re-entrant checkpointing, but still lets you set it to False:


model.gradient_checkpointing_enable({"use_reentrant": False})
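
For completeness, the same setting can also be routed through the Trainer, assuming a transformers version recent enough that TrainingArguments accepts gradient_checkpointing_kwargs:

```python
from transformers import TrainingArguments

# Sketch: let the Trainer enable non-reentrant checkpointing itself instead of
# calling gradient_checkpointing_enable manually.
training_args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```
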
github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.