huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

'CLIPEncoder' object has no attribute '_gradient_checkpointing_func' #30264

Closed · TideDra closed this 4 months ago

TideDra commented 4 months ago

Who can help?

@amyeroberts
Reproduction

When gradient checkpointing is enabled, CLIPEncoder calls self._gradient_checkpointing_func (https://github.com/huggingface/transformers/blob/51bcadc10a569847b93a30dbe3a077037ae63bad/src/transformers/models/clip/modeling_clip.py#L628). However, this attribute is defined on PreTrainedModel, and CLIPEncoder is not a subclass of PreTrainedModel. The bug can be reproduced with the following code:

from transformers import CLIPVisionModel

vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
# The attribute is never set on the encoder, so this access fails.
print(vision_tower.vision_model.encoder._gradient_checkpointing_func)

which raises AttributeError: 'CLIPEncoder' object has no attribute '_gradient_checkpointing_func'.

Expected behavior

As a workaround, I currently replace self._gradient_checkpointing_func directly with torch.utils.checkpoint.checkpoint, and training with gradient checkpointing then works. However, this is not an ideal solution, since _gradient_checkpointing_func is needed to handle gradient_checkpointing_kwargs.
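For reference, a minimal sketch of that workaround (my own reconstruction, assuming the transformers version linked above, where CLIPEncoder gates the checkpointed path on its gradient_checkpointing attribute):

import torch.utils.checkpoint
from transformers import CLIPVisionModel

vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
encoder = vision_tower.vision_model.encoder

# Turn on the flag the encoder's forward pass checks, then patch in the plain
# PyTorch checkpoint function by hand. gradient_checkpointing_kwargs (e.g.
# use_reentrant) are not applied this way, which is the drawback noted above.
encoder.gradient_checkpointing = True
encoder._gradient_checkpointing_func = torch.utils.checkpoint.checkpoint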

amyeroberts commented 4 months ago

Hi @TideDra, thanks for opening this issue!

You first have to enable gradient checkpointing for the model:

In [1]: from transformers import CLIPVisionModel
   ...: vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
   ...: vision_tower.gradient_checkpointing_enable({"use_reentrant": True})
   ...: vision_tower.vision_model.encoder._gradient_checkpointing_func
Out[1]: functools.partial(<function checkpoint at 0x15accd240>, use_reentrant=True)
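As a quick smoke test (my own addition, not from the thread; the 1x3x336x336 input matches this checkpoint's 336-pixel resolution), note that the checkpointed path is only taken in training mode:

import torch
from transformers import CLIPVisionModel

vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
vision_tower.gradient_checkpointing_enable({"use_reentrant": True})
vision_tower.train()  # the encoder only calls _gradient_checkpointing_func when self.training is True

pixel_values = torch.randn(1, 3, 336, 336)
outputs = vision_tower(pixel_values=pixel_values)
# Backward recomputes the checkpointed encoder layers instead of storing their activations.
outputs.pooler_output.sum().backward()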

TideDra commented 4 months ago

Thanks for your help!