huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

AssertionError when converting openai clip's weight to hf #22739

Closed wingvortex closed 1 year ago

wingvortex commented 1 year ago

System Info

Who can help?

@amyeroberts

Information

Tasks

Reproduction

Hi, I'm trying to convert Hugging Face's CLIP weights back to OpenAI's format because I need to adapt a fine-tuned model, but there seems to be no such script available. Luckily I found one that converts CLIP from OpenAI to Hugging Face here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/convert_clip_original_pytorch_to_hf.py

So I started with this script. But when I run it:

python convert_clip_original_pytorch_to_hf.py --checkpoint_path 'path/to/ViT-B-32.pt' --pytorch_dump_folder_path './'

I got the following error:

Traceback (most recent call last):
  File "/home/test/convert_clip_original_pytorch_to_hf.py", line 148, in <module>
    convert_clip_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.config_path)
  File "/home/anaconda3/envs/hf/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/test/convert_clip_original_pytorch_to_hf.py", line 136, in convert_clip_checkpoint
    assert torch.allclose(hf_logits_per_text, pt_logits_per_text, atol=1e-3)
AssertionError
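For context, the check that fails compares the HF and original logits element-wise. torch.allclose treats two tensors as equal when |a - b| <= atol + rtol * |b| holds for every element (rtol defaults to 1e-5). A minimal pure-Python sketch of that tolerance rule (the logit values below are made up for illustration):

```python
# Sketch of the tolerance rule used by torch.allclose:
# two values match when |a - b| <= atol + rtol * |b|.
def allclose(xs, ys, atol=1e-3, rtol=1e-5):
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(xs, ys))

hf_logits = [25.1234, 19.8765]          # hypothetical HF logits
pt_logits = [25.1240, 19.8770]          # hypothetical original logits
print(allclose(hf_logits, pt_logits))   # True: within atol=1e-3
print(allclose(hf_logits, [25.2, 19.9]))  # False: off by ~0.08
```

So the AssertionError here means at least one logit produced by the converted HF model drifted more than roughly 1e-3 from the original OpenAI model's output.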

Expected behavior

An HF CLIP weight file should be generated from the original OpenAI one by running this script.

amyeroberts commented 1 year ago

Hi @wingvortex, thanks for raising this issue.

The traceback in the issue description doesn't contain the error message - could you share that please? Scratch that: I see it's in the torch.allclose assert.

For the checkpoint being converted, could you confirm which of the ViT-B-32 checkpoints is being used, e.g. ('ViT-B-32', 'laion400m_e31')?

wingvortex commented 1 year ago

@amyeroberts It's a checkpoint downloaded by clip.load('ViT-B/32')

amyeroberts commented 1 year ago

@wingvortex Thanks again for reporting and the additional info. I managed to track it down to an indexing error in the conversion script, which should be resolved when #22776 is merged.

Yufang-Liu commented 8 months ago

When I upgrade transformers to 4.39, the script still gets the assertion error when converting 'ViT-B-32'. Version 4.26.1 is OK.

amyeroberts commented 8 months ago

Hi @Yufang-Liu, could you share a reproducible snippet showing how you're calling the conversion script, as well as the checkpoint being converted and the full traceback of the error raised?

Yufang-Liu commented 8 months ago

With transformers==4.39.0, running python .\convert_clip_original_pytorch_to_hf.py --checkpoint_path ViT-B/32 --pytorch_dump_folder_path ./ gives the error:

  File "C:\Users\Yufang Liu\Downloads\convert_clip_original_pytorch_to_hf.py", line 148, in <module>
    convert_clip_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.config_path)
  File "D:\anaconda3\envs\pt\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Yufang Liu\Downloads\convert_clip_original_pytorch_to_hf.py", line 135, in convert_clip_checkpoint
    assert torch.allclose(hf_logits_per_image, pt_logits_per_image, atol=1e-3)
AssertionError

With transformers==4.30.0, the error is gone. I also tried 4.32.0: same error.
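Before filing the new issue, it can help to see how large the mismatch actually is, since the bare assert hides the magnitude. A hedged debugging sketch (variable names mirror the conversion script, but plain lists stand in for the logit tensors here):

```python
# Report the worst element-wise deviation instead of asserting outright,
# to judge whether the mismatch is rounding noise or a real regression.
def max_abs_diff(xs, ys):
    return max(abs(a - b) for a, b in zip(xs, ys))

hf_logits_per_image = [25.12, 19.88]   # placeholder values for illustration
pt_logits_per_image = [25.12, 19.93]

diff = max_abs_diff(hf_logits_per_image, pt_logits_per_image)
print(f"max abs diff: {diff:.6f}")
if diff > 1e-3:
    print("logits diverge beyond atol=1e-3 -- likely a conversion regression")
```

A tiny deviation (just above 1e-3) would point at numerical drift between transformers versions, while a large one suggests weights are being mapped incorrectly, as in the original indexing bug.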

amyeroberts commented 8 months ago

Ah, OK. This is a different issue to the one originally raised, and it looks like a model regression. Could you open a new issue including these details? This will help us better track the problem and see when it has been resolved.

Yufang-Liu commented 8 months ago

ok