PeterVennerstrom closed this issue 2 months ago.
Hi, IP-CLIP performs a repeat operation on the cls token. In the default setting, the Proposal Generator generates 100 masks, so the cls token is repeated 99 extra times (one copy per mask). That is, mask.shape[-1] - xatten.shape[0] == 99 at line 285.
There are only 3 masks in your code, so mask.shape[-1] - xatten.shape[0] == 2, which raises the AssertionError. You can comment out this assertion to avoid the error. This will not affect the correctness of the code.
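To make the arithmetic concrete, here is a minimal Python sketch that models only the token counts, not the attention itself. It assumes that the sequence checked at line 285 holds one cls token plus the patch tokens, and that the mask covers one cls copy per mask plus the same patch tokens. It also assumes 900 patch tokens, chosen so that the 3-mask case reproduces the reported AssertionError: (903, 901). All names in it are hypothetical and are not taken from third_party/CLIP/clip/model.py.

```python
from typing import Tuple

# Hypothetical sketch: models token counts only. None of these names come
# from third_party/CLIP/clip/model.py.


def sequence_lengths(num_patches: int, num_masks: int) -> Tuple[int, int]:
    """Return (mask_len, seq_len) for one image.

    Assumes the input sequence is 1 cls token + patch tokens, and the
    attention mask covers one cls copy per mask proposal + the same patches.
    """
    seq_len = 1 + num_patches           # plays the role of xatten.shape[0]
    mask_len = num_masks + num_patches  # plays the role of mask.shape[-1]
    return mask_len, seq_len


def check_cls_offset(mask_len: int, seq_len: int, num_masks: int) -> None:
    """Generalized form of a hard-coded `== 99` check.

    The offset equals the number of extra cls copies (num_masks - 1), so it
    is 99 only when there are exactly 100 mask proposals.
    """
    assert mask_len - seq_len == num_masks - 1, (mask_len, seq_len)


# Default setting: 100 proposals -> offset 99.
mask_len, seq_len = sequence_lengths(num_patches=900, num_masks=100)
check_cls_offset(mask_len, seq_len, num_masks=100)

# Three user-supplied masks -> offset 2, which a fixed "== 99" check rejects.
mask_len, seq_len = sequence_lengths(num_patches=900, num_masks=3)
print((mask_len, seq_len))  # (903, 901)
check_cls_offset(mask_len, seq_len, num_masks=3)
```

Commenting the assertion out works, as noted above. A check parameterized by the number of masks, like the hypothetical `check_cls_offset`, is one alternative that keeps the sanity check while accepting any proposal count. This is only a suggestion, not a change made in the repository.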
I get an assertion error at line 285 of third_party/CLIP/clip/model.py. The code expects an offset of 99 between mask.shape[-1] and xatten.shape[0], but I'm getting:
AssertionError: (903, 901)
Here is my slightly adjusted code based on demo code shared in another issue:
Here is a link to the image and masks used.
With the assertion commented out, the code runs and returns reasonable results for the image and masks.
Appreciate your thoughts on the assertion error. Thanks!