Closed · regainOWO closed this 2 months ago
Hi! Thanks for your great contributions! I want to know how the Meta-Transformer-B16 and Meta-Transformer-L14 model files are trained. I found that they contain only the Transformer block weights.

They're simply CLIP models pretrained on the LAION-2B dataset; we only use the transformer blocks, combined with the modality-specific tokenizers proposed in our paper, across these modalities.
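For reference, here is a minimal sketch of how such an encoder-only checkpoint could be loaded. The checkpoint filename and the ViT-B/16 hyperparameters (12 blocks, 768-dim embeddings, 12 heads) are assumptions, and `timm`'s `Block` is used as a stand-in for the transformer block implementation:

```python
import torch
import torch.nn as nn
from timm.models.vision_transformer import Block  # standard ViT transformer block

# Assumed checkpoint name: a state dict holding only the transformer blocks.
ckpt = torch.load("Meta-Transformer_base_patch16_encoder.pth")

# ViT-B/16-style stack: 12 blocks, 768-dim embeddings, 12 attention heads.
encoder = nn.Sequential(*[
    Block(dim=768, num_heads=12, mlp_ratio=4.0, qkv_bias=True,
          norm_layer=nn.LayerNorm, act_layer=nn.GELU)
    for _ in range(12)
])
encoder.load_state_dict(ckpt, strict=True)  # keys must match the Block naming

# A modality-specific tokenizer maps raw input to a token sequence of shape
# (batch, num_tokens, 768) before it is passed through the shared encoder.
tokens = torch.randn(1, 196, 768)  # e.g. 14x14 image patches
features = encoder(tokens)
```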
Thanks for your reply! What about Image_Meta-Transformer-B16? Is it obtained by training Meta-Transformer-B16 as a ViT on the ImageNet-1K dataset?

It's a Meta-Transformer weight finetuned on the image datasets.