Hey @g8a9
It's not possible to load FlaxHybridCLIP into FlaxCLIP, since the module structure is different: the hybrid model can use any pre-trained text and vision models.

The hybrid CLIP model will soon be officially supported in transformers (see #13511); we are now calling it VisionTextDualEncoder. It will be available in both torch and flax. Stay tuned!
Nice to know, thanks!
Do you think that once the VisionTextDualEncoder is out, we will be able to load our checkpoint with it?
(actually, my final goal is to have access to our two fine-tuned encoders, ViT and BERT, in pytorch)
> Do you think that once the VisionTextDualEncoder is out, we will be able to load our checkpoint with it?
The module structure is pretty much similar, so yes! If not I'll share a script to convert the old hybrid clip weights to this new class.
Hey @g8a9
clip-italian (or any hybrid clip model) can now be loaded using the new VisionTextDualEncoderModel. Converting from flax to pt should also work.
```python
from transformers import FlaxVisionTextDualEncoderModel, VisionTextDualEncoderModel

# Load the Flax hybrid CLIP checkpoint into the new dual-encoder class.
# `logit_scale` can be initialized using the `config.logit_scale_init_value` attribute.
model = FlaxVisionTextDualEncoderModel.from_pretrained("clip-italian/clip-italian", logit_scale_init_value=1)
model.save_pretrained("clip-italian")

# Convert the saved Flax weights to PyTorch.
model_pt = VisionTextDualEncoderModel.from_pretrained("clip-italian", from_flax=True)
```
Let me know if this works for you and if you see any discrepancies in the results. I would like to use clip-italian to feature this new model class :)
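Since the final goal mentioned above was to get at the two fine-tuned encoders in PyTorch, here is a minimal sketch of how they could be pulled out once the conversion works. It assumes the converted VisionTextDualEncoderModel exposes its encoders as the `vision_model` and `text_model` submodules, and the output directory names are made up for illustration:

```python
from transformers import VisionTextDualEncoderModel

# Load the PyTorch checkpoint converted from the Flax hybrid CLIP weights (see above).
model_pt = VisionTextDualEncoderModel.from_pretrained("clip-italian", from_flax=True)

# The dual encoder wraps the two fine-tuned encoders as regular PreTrainedModel
# submodules, so each one can be saved (and later reloaded) on its own.
model_pt.vision_model.save_pretrained("clip-italian-vit")   # fine-tuned ViT encoder
model_pt.text_model.save_pretrained("clip-italian-bert")    # fine-tuned Italian BERT encoder
```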
Do you think you could push the PT checkpoint and the processor (tokenizer/feature-extractor) for clip-italian? Going to use this model in doc examples :)
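For reference, a rough sketch of how the PT checkpoint and the processor could be put together and saved locally before pushing. The CLIP ViT feature-extractor checkpoint and the output directory are assumptions, not taken from this thread:

```python
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Pair the vision feature extractor with the Italian BERT tokenizer used for training.
feature_extractor = AutoFeatureExtractor.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-uncased")
processor = VisionTextDualEncoderProcessor(feature_extractor, tokenizer)

# Save the converted PyTorch model and the processor into one directory,
# which can then be uploaded to the Hub.
model_pt = VisionTextDualEncoderModel.from_pretrained("clip-italian", from_flax=True)
model_pt.save_pretrained("clip-italian-pt")
processor.save_pretrained("clip-italian-pt")
```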
This worked for me. Thanks for the solution.
Environment info
transformers version: 4.12.2

Who can help
@patil-suraj @patrickvonplaten
Information
During the last Flax/JAX Community Week we trained a fine-tuned version of CLIP for the Italian language. We used the provided script, so we trained a FlaxHybridCLIP model with OpenAI's ViT and "dbmdz/bert-base-italian-xxl-uncased" BERT as encoders.

Now, I'm trying to use that model with the transformers' official API classes, either FlaxCLIPModel or CLIPModel (my final goal would be to port it to pytorch and publish it to the hub). However, I am having a hard time loading our weights into either of the two.
I tried different workarounds (see below) but none of them seems to be working.
To reproduce
I assume these imports
Steps to reproduce the behavior:
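The original imports and reproduction snippet are not preserved here. As a purely hypothetical sketch, a call of this kind (checkpoint name taken from the comments above, everything else assumed) would produce the warning shown in the output below:

```python
# Hypothetical reconstruction, not the original snippet from the issue.
from transformers import FlaxCLIPModel

# Loading the FlaxHybridCLIP checkpoint into the official CLIP class triggers the
# "model of type hybrid-clip to instantiate a model of type clip" warning and
# eventually fails with a ValueError.
model = FlaxCLIPModel.from_pretrained("clip-italian/clip-italian")
```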
Output
```
You are using a model of type hybrid-clip to instantiate a model of type clip. This is not supported for all configurations of models and can yield errors.
INFO:absl:Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
INFO:absl:Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
INFO:absl:Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

ValueError                                Traceback (most recent call last)
```