wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
https://quilt1m.github.io/
MIT License

Image-to-text generation #17

Closed: anabiasuhail closed 7 months ago

anabiasuhail commented 10 months ago

Could you please advise how I can use Quilt-1M for image-to-text generation? That is, I input an image and the model generates text. Do I need to use models like LLaVA or BLIP, load the Quilt-1M weights into them, and use them to generate text descriptions? The API mentioned on Hugging Face is only for zero-shot classification, and I could not find the text retrieval code in the GitHub repo. I also tried BLIP but ran into compatibility issues. Thanks.

wisdomikezogwo commented 10 months ago

Hi,

To use QuiltNet for retrieval, I'd suggest you use https://github.com/LAION-AI/CLIP_benchmark, which we also leveraged in the evaluation.
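
For intuition, here is a minimal sketch of what image-to-text retrieval boils down to once you have embeddings. It is not the CLIP_benchmark code itself: the function names `rank_texts` and `recall_at_k` are hypothetical, and the toy 3-dimensional vectors and placeholder captions stand in for real outputs of QuiltNet's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_texts(image_emb, text_embs):
    """Return text indices sorted by descending cosine similarity to the image."""
    img = l2_normalize(image_emb)
    scores = []
    for emb in text_embs:
        txt = l2_normalize(emb)
        scores.append(sum(a * b for a, b in zip(img, txt)))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(image_embs, text_embs, k):
    """Fraction of images whose paired text (same index) lands in the top k."""
    hits = 0
    for i, img in enumerate(image_embs):
        if i in rank_texts(img, text_embs)[:k]:
            hits += 1
    return hits / len(image_embs)


# Toy embeddings (assumption: real ones would come from the model's encoders).
image_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
text_embs = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 0.7]]
captions = ["caption A", "caption B", "caption C"]  # placeholder texts

best = rank_texts(image_embs[0], text_embs)[0]
print("Best caption for image 0:", captions[best])
print("Recall@1:", recall_at_k(image_embs, text_embs, 1))
```

Note that retrieval only picks the closest caption from a fixed pool of candidate texts; it does not generate new text.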

Also, for text-generation tasks, we recently released a new work called Quilt-LLaVA, in which we essentially use the image tower of QuiltNet when training Quilt-LLaVA. Fortunately, the model should be available either tonight or tomorrow night. With Quilt-LLaVA, you can conduct research with an LMM tuned for histopathology. Please read the paper when you have some time.