pasha76 opened this issue 1 month ago
Are you looking for image embeddings, or text embeddings?
I passed `output_hidden_states=True` to the model; you can then access the text embeddings through `output.hidden_states`. You get the image embeddings by calling the vision encoder directly.
At least this is what I figured out.
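One way to turn `output.hidden_states` into a single vector per input is mean pooling. This is a minimal sketch, assuming the hidden states come as a tuple with one entry per layer, each shaped (batch, seq_len, hidden_dim), and that an attention mask marks real tokens with 1 and padding with 0. Mean pooling over the last layer is one common choice, not necessarily what was done above. Nested Python lists stand in for tensors so the example runs without loading any model, and the function name `mean_pool_last_layer` is hypothetical.

```python
from typing import List, Sequence

# Assumed layout: [batch][seq_len][hidden_dim]; real models return tensors.
Matrix = List[List[float]]


def mean_pool_last_layer(hidden_states: Sequence[List[Matrix]],
                         attention_mask: List[List[int]]) -> Matrix:
    """Hypothetical helper: average the last layer's token vectors,
    ignoring padding positions (mask value 0)."""
    last = hidden_states[-1]  # final layer: [batch][seq_len][hidden_dim]
    pooled: Matrix = []
    for tokens, mask in zip(last, attention_mask):
        # Keep only the token vectors whose mask entry is 1.
        kept = [vec for vec, m in zip(tokens, mask) if m == 1]
        if not kept:
            raise ValueError("sequence has no unmasked tokens")
        dim = len(kept[0])
        # Element-wise mean across the kept tokens.
        pooled.append([sum(vec[i] for vec in kept) / len(kept)
                       for i in range(dim)])
    return pooled


if __name__ == "__main__":
    layer0 = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
    layer1 = [[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]  # last token is padding
    print(mean_pool_last_layer((layer0, layer1), [[1, 1, 0]]))  # [[2.0, 3.0]]
```

With PyTorch tensors, the same pooling is typically written as `(h * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)`, where `h = output.hidden_states[-1]`.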
What is the best way to get embeddings as output instead of text? We are planning to fine-tune the model and extract embeddings instead of text captions. Any suggestions?