Open · youssefadr opened this issue 1 year ago
Yes, so ideally you can add `get_image_feature` and `get_text_feature` to the `Blip2ForConditionalGeneration` class. For that you can refer to the original implementation.
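For reference, here is a rough sketch of what the image side could look like (a minimal sketch, not the merged implementation: the method name and the choice to return the Q-Former query embeddings are assumptions based on the LAVIS code; the text side would additionally need the Q-Former's text branch and the ITC projection heads, which the conditional-generation checkpoints do not ship):

```python
import torch
from transformers import Blip2ForConditionalGeneration

class Blip2WithFeatures(Blip2ForConditionalGeneration):
    @torch.no_grad()
    def get_image_feature(self, pixel_values):
        # Encode the image with the frozen ViT, then compress the patch
        # embeddings into a fixed set of query tokens via the Q-Former.
        image_embeds = self.vision_model(pixel_values=pixel_values).last_hidden_state
        attention_mask = torch.ones(
            image_embeds.size()[:-1], dtype=torch.long, device=image_embeds.device
        )
        query_tokens = self.query_tokens.expand(image_embeds.shape[0], -1, -1)
        query_output = self.qformer(
            query_embeds=query_tokens,
            encoder_hidden_states=image_embeds,
            encoder_attention_mask=attention_mask,
        )
        # Shape: (batch_size, num_query_tokens, qformer_hidden_size)
        return query_output.last_hidden_state
```

Note that for LAVIS-style normalized ITC embeddings you would also need the `vision_proj`/`text_proj` heads from the original `Blip2Qformer` checkpoint.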
@youssefadr let me know if you need any help in this PR, I am also in need of adding multimodal feature extraction from the Blip2Qformer
Hello, thanks for your message, I will tackle it this week 👍
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Sorry, I have been caught up in work. Will finalize the PR today!
> Yes, so ideally you can add `get_image_feature` and `get_text_feature` to the `Blip2ForConditionalGeneration` class. For that you can refer to the original implementation.
Hi, I want to know if this has been done? I am trying to use `get_image_feature` but I am getting this error: `AttributeError: 'Blip2ForConditionalGeneration' object has no attribute 'get_image_feature'`, and I cannot use `Blip2Model` because I have to use `load_in_8bit`, which I am using with `Blip2ForConditionalGeneration`.
Hi, no this feature hasn't been added yet.
Thank you for your prompt response!
I have the following questions and would appreciate your input:
Q1: Is there any way to extract the features of an image using BLIP-2 from the Hugging Face checkpoints with `load_in_8bit`? (See the sketch below.)
Q2: Does the feature extraction in this notebook https://github.com/salesforce/LAVIS/blob/main/examples/blip2_feature_extraction.ipynb work in the same way as `get_image_feature`?
Q3: If I want to extract or convert an image into a vector so I can use it with another model, do you have any recommendation for the best way to do this, other than using the CLIP model? It did not give me good results.
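On Q1, a possible interim workaround while the feature is missing is to call the submodules directly, the same steps a `get_image_feature` method would perform (an untested sketch; `example.jpg` is a placeholder):

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map="auto"
)

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(model.device, torch.float16)

with torch.no_grad():
    vision_out = model.vision_model(pixel_values=inputs.pixel_values)

pooled = vision_out.pooler_output  # (1, hidden_size): one vector per image
```

For Q-Former query-token features instead of the raw ViT vector, push `vision_out.last_hidden_state` through `model.qformer` as in the sketch earlier in the thread.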
@youssefadr hi, let me know if you need any help here, I would love to give it a try and push things forward. It would actually be my first contribution, but I'm quite familiar with the BLIP-2 model.
Feature request
I would like to add support for the zero-shot classification task using BLIP-2, computing text-image similarities with normalized embeddings obtained from the BLIP-2 feature extractor.
The idea is to enable calling the zero-shot classification pipeline with BLIP-2 by implementing the `get_image_features` and `get_text_features` methods. I would love more guidance, if possible, on the criteria for accepting the PR.
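To make the intended flow concrete, here is a hedged sketch of the zero-shot step (the two methods do not exist yet, so the shapes below are stand-in assumptions, mirroring how LAVIS scores image-text contrastive similarity):

```python
import torch
import torch.nn.functional as F

# Stand-ins for the proposed methods' outputs; shapes are assumptions:
num_query_tokens, num_labels, dim = 32, 3, 256
image_feats = torch.randn(num_query_tokens, dim)  # model.get_image_features(...)
text_feats = torch.randn(num_labels, dim)         # model.get_text_features(...)

image_feats = F.normalize(image_feats, dim=-1)
text_feats = F.normalize(text_feats, dim=-1)

# BLIP-2 keeps one embedding per query token, so score every label prompt
# against each query token and keep the best match, as LAVIS does for ITC.
sims = image_feats @ text_feats.T               # (num_query_tokens, num_labels)
probs = sims.max(dim=0).values.softmax(dim=-1)  # zero-shot class probabilities
```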
Motivation
This is related to the discussion on this issue on the Hub, and the comment left by @NielsRogge here: https://huggingface.co/Salesforce/blip2-opt-2.7b/discussions/3#64cbe5e487ec96aa473a1f54.
Your contribution
I would like to submit a PR to contribute this feature.