Open zmtbnv opened 11 months ago
BLIP-2 allows extracting unimodal features like this:
features_image = model.extract_features(sample, mode="image")
features_text = model.extract_features(sample, mode="text")
print(features_image.image_embeds.shape)  # torch.Size([1, 32, 768])
print(features_text.text_embeds.shape)  # torch.Size([1, 12, 768])
Is it possible to do the same with LLaVA?
Same question.
Any answers?
Same question. Have you got a solution? Thanks
+1. Did anyone find a solution that is reasonably straightforward?