wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
https://quilt1m.github.io/
MIT License

Text Context Length #27

Open Jiangbo-Shi opened 1 month ago

Jiangbo-Shi commented 1 month ago

Dear authors, thanks for your great work. The CLIP text encoder's maximum context length is 77 tokens, but several captions in Quilt-1M exceed that length. How can we use the CLIP text encoder to extract features for those captions?

wisdomikezogwo commented 1 month ago

Hi,

For your needs you can try the PMB version of QuiltNet here: https://huggingface.co/wisdomik/QuiltNet-B-16-PMB. Here PMB refers to PubMedBERT, a BERT text encoder with a 256-token context length, pre-trained on PMC-15M and fine-tuned alongside the image tower on Quilt-1M.

Jiangbo-Shi commented 1 month ago

Thank you very much for your quick reply. Regarding the ViT-B-32|GPT-77 version of QuiltNet, how do you handle captions that exceed the 77-token limit? Did you apply truncation?
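For context on what truncation would look like: open_clip's default `tokenize()` clips an over-length caption to the context window and keeps an end-of-text token in the final slot. Below is a minimal, self-contained sketch of that truncation scheme; the word-level tokenizer and token IDs are toy stand-ins for the real BPE vocabulary, and this is an illustration of the standard behavior, not a statement of what the authors did:

```python
# Illustrative sketch of CLIP-style context-window truncation.
# SOT/EOT/PAD and the word-level tokenizer are toy stand-ins for
# the real BPE vocabulary; only the truncation logic is the point.

SOT, EOT, PAD = 1, 2, 0  # toy special-token IDs

def toy_tokenize(caption: str) -> list[int]:
    # Stand-in for BPE: one "token" per whitespace-separated word.
    return [hash(w) % 50000 + 10 for w in caption.split()]

def encode(caption: str, context_length: int = 77) -> list[int]:
    tokens = [SOT] + toy_tokenize(caption) + [EOT]
    if len(tokens) > context_length:
        # Clip to the window and keep an end-of-text marker in the
        # final slot, mirroring open_clip's handling of long captions.
        tokens = tokens[:context_length]
        tokens[-1] = EOT
    # Pad to a fixed length so the text encoder sees a fixed shape.
    return tokens + [PAD] * (context_length - len(tokens))

long_caption = " ".join(["histopathology"] * 100)  # 100 words > 77 slots
ids = encode(long_caption)
print(len(ids), ids[0], ids[-1])  # fixed length 77, SOT first, EOT last
```

The trade-off is that everything past the 77th token is simply discarded, which is why a longer-context text tower (like the 256-token PubMedBERT variant above) can be preferable for Quilt-1M's longer captions.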