haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
18.28k stars, 1.99k forks

[Question] How to get the image embeddings and text embeddings from model during inference #951

Status: Open (opened by Slinene 6 months ago)

Slinene commented 6 months ago

Question

I noticed that `model.generate` returns the output directly, but how can I get the image embeddings and text embeddings during inference?

zty0510 commented 2 months ago

I ran into the same problem. Did you find a solution?