huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

CLIP config inconsistency issue #29687

Closed: zhiqiangdon closed this issue 5 months ago

zhiqiangdon commented 7 months ago

System Info

Python Version: 3.9.18
Operating System: Linux
Platform Machine: x86_64
Platform Version: #91~18.04.1-Ubuntu SMP Sun Aug 14 01:24:43 UTC 2022
PyTorch Version: 2.0.1+cu117
transformers version: 4.38.2

Who can help?

No response

Reproduction

from transformers import AutoConfig
config1 = AutoConfig.from_pretrained("openai/clip-vit-large-patch14-336")
config2 = AutoConfig.from_pretrained("openai/clip-vit-base-patch32")
print(config1)
print(config2)

Expected behavior

The printed contents of config1 and config2 differ substantially. For example, config2 has no image size information at all.

Config1:

CLIPConfig {
  "_name_or_path": "openai/clip-vit-large-patch14-336",
  "architectures": [
    "CLIPModel"
  ],
  "initializer_factor": 1.0,
  "logit_scale_init_value": 2.6592,
  "model_type": "clip",
  "projection_dim": 768,
  "text_config": {
    "dropout": 0.0,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "model_type": "clip_text_model",
    "num_attention_heads": 12,
    "projection_dim": 768
  },
  "torch_dtype": "float32",
  "transformers_version": "4.38.2",
  "vision_config": {
    "dropout": 0.0,
    "hidden_size": 1024,
    "image_size": 336,
    "intermediate_size": 4096,
    "model_type": "clip_vision_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "patch_size": 14,
    "projection_dim": 768
  }
}

Config2:

CLIPConfig {
  "_name_or_path": "openai/clip-vit-base-patch32",
  "architectures": [
    "CLIPModel"
  ],
  "initializer_factor": 1.0,
  "logit_scale_init_value": 2.6592,
  "model_type": "clip",
  "projection_dim": 512,
  "text_config": {
    "bos_token_id": 0,
    "dropout": 0.0,
    "eos_token_id": 2,
    "model_type": "clip_text_model"
  },
  "transformers_version": "4.38.2",
  "vision_config": {
    "dropout": 0.0,
    "model_type": "clip_vision_model"
  }
}

In transformers 4.31.0, the CLIP base and CLIP large configs contained more extensive information. In 4.38.2, the printed information is incomplete and inconsistent between the two checkpoints.

amyeroberts commented 7 months ago

Hi @zhiqiangdon,

When configs are saved out, only the parameters that differ from the default config values are serialized. So in the case of config 1, image_size is saved because it's 336, which differs from the default of 224 that config 2 uses. The loaded config objects are still complete: any key missing from the JSON is filled in with its default value at load time.
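To illustrate the behavior described above, here is a minimal, self-contained sketch of diff-based serialization (a simplified stand-in for the library's `to_diff_dict` logic, not the actual transformers implementation; the helper name and default values are hypothetical):

```python
# Hypothetical sketch: serialize only the keys whose values differ
# from the defaults, mirroring how transformers writes config.json.
DEFAULTS = {"image_size": 224, "patch_size": 32, "hidden_size": 768}

def to_diff_dict(config: dict, defaults: dict = DEFAULTS) -> dict:
    """Return only the entries that differ from the default values."""
    return {k: v for k, v in config.items() if defaults.get(k) != v}

# clip-vit-large-patch14-336: every value differs from the defaults,
# so all of them appear in the saved config.
large = {"image_size": 336, "patch_size": 14, "hidden_size": 1024}

# clip-vit-base-patch32: every value matches the defaults,
# so the saved config is empty (the keys are restored on load).
base = {"image_size": 224, "patch_size": 32, "hidden_size": 768}

print(to_diff_dict(large))  # {'image_size': 336, 'patch_size': 14, 'hidden_size': 1024}
print(to_diff_dict(base))   # {}
```

On the real library, calling `config.to_dict()` (rather than printing the config, which uses the diff representation) should show the fully populated values for both checkpoints.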

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.