Jiangbo-Shi opened this issue 1 month ago
Hi,
For your needs you can try the PMB version of QuiltNet here: https://huggingface.co/wisdomik/QuiltNet-B-16-PMB. "PMB" refers to PubMedBERT, a BERT model with a 256-token context length that was pre-trained on PMC-15M and fine-tuned alongside the image tower on Quilt-1M.
Thank you very much for your quick reply. Regarding the ViT-B-32|GPT-77 version of QuiltNet, how do you handle captions that exceed 77 tokens? Did you apply truncation?
Dear authors, thanks for your great work. The maximum text context length of the CLIP text encoder is 77 tokens, but several captions in Quilt-1M are longer than that. How can we use the CLIP text encoder to extract features for those captions?
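For reference, CLIP-style tokenizers usually handle over-long captions by truncating to the context window while keeping the start- and end-of-text markers (this is what e.g. `open_clip.tokenize` does when truncation is enabled). A minimal sketch of that behavior, with a mocked tokenizer rather than the actual QuiltNet pipeline (the token IDs for SOT/EOT mirror CLIP's BPE vocabulary, but everything else here is illustrative):

```python
# Sketch: truncating a tokenized caption to CLIP's 77-token context window.
# The tokenizer is mocked; only the truncation/padding logic is shown.

CONTEXT_LENGTH = 77
SOT, EOT = 49406, 49407  # start/end-of-text IDs in CLIP's BPE vocabulary


def truncate_tokens(token_ids, context_length=CONTEXT_LENGTH):
    """Wrap token IDs with SOT/EOT and fit them into the context window.

    If the caption is too long, keep the first (context_length - 1)
    positions and overwrite the final one with EOT, as CLIP-style
    tokenizers do when truncation is enabled; then zero-pad to a
    fixed length so batches have uniform shape.
    """
    seq = [SOT] + list(token_ids) + [EOT]
    if len(seq) > context_length:
        seq = seq[:context_length]
        seq[-1] = EOT  # the sequence must still end with EOT
    return seq + [0] * (context_length - len(seq))


# A pretend 120-token caption gets cut down to exactly 77 positions.
ids = truncate_tokens(list(range(1, 121)))
assert len(ids) == 77 and ids[0] == SOT and ids[-1] == EOT
```

The cost of this approach is that everything after roughly the first 75 caption tokens is simply discarded, which is why a text tower with a longer context (such as the PubMedBERT variant) can be preferable for long pathology captions.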