vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0
5.78k stars 471 forks

embeddings instead of text output #139

Open pasha76 opened 1 month ago

pasha76 commented 1 month ago

What is the best way to extract embeddings as output instead of text? We are planning to fine-tune the model and extract embeddings instead of a text caption. Any suggestions?

vikhyat commented 1 month ago

Are you looking for image embeddings, or text embeddings?

7AtAri commented 1 month ago

I used output_hidden_states=True; you can then access the text embeddings through output.hidden_states. You get the image embeddings by calling the vision encoder.

At least this is what I figured out.
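
For anyone who wants a single vector per input rather than per-token states: in Hugging Face transformers models generally, hidden_states is a tuple with one entry per layer (plus the embedding output), each holding one vector per token. One common way to reduce that to a single embedding is masked mean pooling over the last layer. The sketch below is only an illustration of that pooling step, not moondream's own API. It uses plain Python lists as a stand-in for the tensors so it runs without the model, and the helper name mean_pool and the toy values are hypothetical.

```python
from typing import List, Sequence


def mean_pool(token_states: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the per-token vectors at positions where mask is 1."""
    if len(token_states) != len(mask):
        raise ValueError("token_states and mask must have the same length")
    kept = [vec for vec, m in zip(token_states, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy stand-in for output.hidden_states (assumption: one sequence, no batch axis).
# Each layer is a list of seq_len token vectors of size hidden_dim.
hidden_states = (
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # embedding-layer output
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # last layer
)
attention_mask = [1, 1, 0]  # third position is padding and is ignored

embedding = mean_pool(hidden_states[-1], attention_mask)
print(embedding)  # [2.0, 3.0]
```

With real tensors the same idea would be a masked sum over the sequence axis divided by the number of unmasked tokens. Whether the last layer or an earlier one gives better embeddings for a fine-tuned model is something to check empirically.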