muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
https://muzairkhattak.github.io/multimodal-prompt-learning/
MIT License
619 stars 43 forks

Error in TextEncoder class of maple.py #6

Closed muhammad-shahid0749 closed 1 year ago

muhammad-shahid0749 commented 1 year ago

Hello, I hope you are doing great, and thank you for your excellent work. I am trying to run your project, but I am facing an error while running the script.

In your maple.py, the forward method of the TextEncoder class throws an error after the call to self.transformer(combined).

```python
combined = [x, compound_prompts_deeper_text, 0]  # third argument is the counter which denotes depth of prompts
outputs = self.transformer(combined)  # this line generates the error
```

Since a list is being passed to the self.transformer(combined) function, the following error is generated. What could be a possible solution? I have also tried converting the combined variable to a tensor, but that didn't work either.

Error:

```
File "/content/MAPLE/clip/model.py", line 158, in forward
    orig_type = x.dtype
AttributeError: 'list' object has no attribute 'dtype'
```

muzairkhattak commented 1 year ago

Hi @muhammad-shahid0749,

Thank you for showing interest in our work.

> As you are passing "list" in self.transformer(combined) function so the following error is being generated. What could be the possible solution, I have also tried converting the combined variable to tensor but it also didn't work.

Yes, a list is passed into the forward function, but the required tensor is then retrieved from the list, as shown here: https://github.com/muzairkhattak/multimodal-prompt-learning/blob/59a57c3f7631a4af49351666286d47e6c9da7910/clip/model.py#L290

The code at line 158 should receive a tensor in the first place. For verification, I just re-checked the code and ran it again, and it is working fine for me.
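For intuition, here is a minimal, hypothetical sketch of this pattern: a transformer block accepts the list, unpacks the tensor from it, and repacks the list for the next block. The class name `PromptAwareBlock` and all variable names below are invented for illustration; the actual implementation lives in `clip/model.py`:

```python
import torch
import torch.nn as nn

class PromptAwareBlock(nn.Module):
    """Illustrative block: accepts [x, deeper_prompts, depth_counter] and
    returns the same triple so blocks can be chained in nn.Sequential."""
    def __init__(self, dim: int):
        super().__init__()
        self.ln = nn.LayerNorm(dim)

    def forward(self, inputs):
        x, deeper_prompts, counter = inputs      # retrieve the tensor from the list
        orig_type = x.dtype                      # works: x is a tensor, not a list
        x = self.ln(x).to(orig_type)
        return [x, deeper_prompts, counter + 1]  # pass the list on to the next block

blocks = nn.Sequential(PromptAwareBlock(8), PromptAwareBlock(8))
x = torch.randn(4, 8)
out, _, depth = blocks([x, [], 0])
print(out.shape, depth)  # torch.Size([4, 8]) 2
```

The key point is that the `AttributeError` only occurs if `x.dtype` is read *before* the list has been unpacked, which is why the code path matters.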

Regarding the problem you are facing, can you please specify the following?

Thank you and kind regards.

muhammad-shahid0749 commented 1 year ago

Thank you @muzairkhattak for your detailed response.

I am trying to combine your model with CLIP's zero-shot predictions to increase zero-shot CLIP accuracy. After obtaining the improved zero-shot accuracy, I plan to use the zero-shot results in my model.

muzairkhattak commented 1 year ago

Hi @muhammad-shahid0749 ,

If I have understood correctly, you want to perform model ensembling and use the predictions of MaPLe and vanilla CLIP together to increase overall accuracy. Could you kindly confirm this?

If yes, then you may need to initialize a new model class for vanilla zero-shot CLIP in addition to the currently present custom MaPLe CLIP, as shown here: https://github.com/muzairkhattak/multimodal-prompt-learning/blob/59a57c3f7631a4af49351666286d47e6c9da7910/trainers/maple.py#L179

Once you have both models, you can run inference on your images with each model and combine their predictions into the final classification scores. Let me know if you have any queries. Thank you.
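A minimal sketch of such an ensembling step, assuming both models produce per-class logits for a batch of images. The function name `ensemble_probs` and the `alpha` weighting are illustrative choices, not part of the repository:

```python
import torch

@torch.no_grad()
def ensemble_probs(logits_maple: torch.Tensor,
                   logits_clip: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    """Blend per-class probabilities from two models.
    alpha weights the MaPLe model; (1 - alpha) weights vanilla CLIP."""
    p_maple = logits_maple.softmax(dim=-1)
    p_clip = logits_clip.softmax(dim=-1)
    return alpha * p_maple + (1 - alpha) * p_clip

# Toy example with random logits for a batch of 2 images and 5 classes.
probs = ensemble_probs(torch.randn(2, 5), torch.randn(2, 5))
print(probs.sum(dim=-1))  # each row still sums to 1
```

Averaging probabilities (rather than raw logits) keeps the two models' outputs on a comparable scale; `alpha` can then be tuned on a held-out split.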

muzairkhattak commented 1 year ago

I am closing this issue for now.

Feel free to open it in case the issue still exists.

Thank you.