[Question] How to get the image embeddings and text embeddings from the model during inference

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

https://llava.hliu.cc

Apache License 2.0

18.28k stars 1.99k forks

Open Slinene opened 6 months ago

Slinene commented 6 months ago

I noticed that model.generate returns the output directly, but how can I get the image embeddings and the text embeddings?

zty0510 commented 2 months ago

I've run into the same problem. Did you find a solution?
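One possible approach, offered as a minimal sketch and not a confirmed answer from the maintainers: skip model.generate and call the two embedding paths directly. The sketch assumes that the loaded LLaVA model exposes encode_images(images) (vision tower followed by the multimodal projector) and get_model().embed_tokens(ids) (the language model's token embedding table), and that the image placeholder in input_ids is the constant IMAGE_TOKEN_INDEX = -200 from llava/constants.py. Please verify all three against your checkout of the repo. The helper name extract_embeddings is hypothetical.

```python
# Assumption: mirrors IMAGE_TOKEN_INDEX in llava/constants.py; check your version.
IMAGE_TOKEN_INDEX = -200


def extract_embeddings(model, input_ids, images, image_token_index=IMAGE_TOKEN_INDEX):
    """Return (image_embeds, text_embeds) without running generation.

    model      -- a loaded LLaVA model (assumed to provide encode_images and
                  get_model().embed_tokens)
    input_ids  -- 1-D tensor of token ids for a single prompt; it may contain
                  the image placeholder id
    images     -- batched, preprocessed image tensor on the model's device/dtype
    """
    # The placeholder id is negative, so it is not a valid row of the
    # embedding table and has to be dropped before the lookup.
    keep = input_ids != image_token_index
    text_ids = input_ids[keep]

    # Text side: plain token-embedding lookup in the language model.
    text_embeds = model.get_model().embed_tokens(text_ids)

    # Image side: vision features projected into the language model's
    # hidden size (assumed behaviour of encode_images).
    image_embeds = model.encode_images(images)

    return image_embeds, text_embeds
```

In practice you would probably wrap the call in torch.inference_mode() and pass the same input_ids and image tensor you would otherwise hand to model.generate. As far as I can tell, generation itself splices the two sets of embeddings together in prepare_inputs_labels_for_multimodal, so that method is the place to look if you need the merged sequence instead of the separate parts.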