How can I get the image and text embeddings to use in another task, and what size are these embeddings? Here is what I know:
Here is the vision output shape: torch.Size([1, 576, 4096])
Here is the text output shape: torch.Size([1, 128, 4096])
I read the dimensions of both the vision and text embeddings from the model configuration: the vision embedding is set to 4096 by hidden_size, and the text embedding is set to 1024 by mm_hidden_size.
However, the last dimension of the text output shape (4096) does not match the mm_hidden_size value (1024). Also, 576 x 4096 seems very large for a single image.
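To put numbers on "very large", here is a small sanity check on the shapes above. It is only a sketch: the shapes are the ones I printed, but the 336-pixel input and 14-pixel patch size are assumptions on my part (a ViT-style vision encoder), not something I have confirmed from the configuration. The mean-pooling step is likewise just one possible way to reduce the output to a single vector per input, not necessarily what the model intends.

```python
# Shapes printed above, laid out as (batch, tokens, hidden).
vision_shape = (1, 576, 4096)
text_shape = (1, 128, 4096)


def num_values(shape):
    """Total number of scalar values in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


vision_values = num_values(vision_shape)
text_values = num_values(text_shape)

# ASSUMPTION (not confirmed by the config): a 336x336 input image split
# into 14x14 patches, which would yield one token per patch.
image_size, patch_size = 336, 14
patches = (image_size // patch_size) ** 2

# Averaging over the token axis would leave one vector per input,
# i.e. the shape (batch, hidden).
pooled_vision_shape = (vision_shape[0], vision_shape[2])
pooled_text_shape = (text_shape[0], text_shape[2])

print("values in vision output:", vision_values)
print("values in text output:", text_values)
print("patch tokens under the assumed sizes:", patches)
print("pooled shapes:", pooled_vision_shape, pooled_text_shape)
```

If the patch assumption holds, the 576 would be the number of image patches rather than part of the embedding width, so each patch carries its own 4096-dimensional vector.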