Closed SuleBai closed 1 year ago
For issues 1 and 2, you can refer to this table (from the revised manuscript under review):
For open-vocabulary segmentation, you just need to take the argmax over the output of the new path. The evaluation code will be released after acceptance.
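For readers following along, here is a minimal sketch of what "argmax over the output of the new path" could look like. It is not the released evaluation code. It assumes the new path yields one feature vector per image patch (with the [cls] token already dropped) and that each class name has a text embedding; the function name `segment_by_argmax` and the toy inputs are hypothetical.

```python
import numpy as np

def segment_by_argmax(patch_features, text_features, grid_hw):
    """Assign each patch the class whose text embedding is most similar.

    patch_features: (N, D) patch tokens, assumed to come from the new path
                    with the [cls] token removed.
    text_features:  (C, D) one embedding per class name.
    grid_hw:        (h, w) patch grid, with h * w == N.
    """
    # L2-normalize so the dot product is a cosine similarity.
    p = patch_features / np.linalg.norm(patch_features, axis=-1, keepdims=True)
    t = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    similarity = p @ t.T                  # (N, C) patch-to-class similarity
    labels = similarity.argmax(axis=-1)   # (N,) best class per patch
    return labels.reshape(grid_hw)        # (h, w) low-resolution label map

# Toy example: 4 patches on a 2x2 grid, 2 classes, 2-dim features.
patches = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.1], [0.1, 3.0]])
texts = np.array([[1.0, 0.0], [0.0, 1.0]])
label_map = segment_by_argmax(patches, texts, (2, 2))
print(label_map)
```

In practice the per-patch similarity map would usually be upsampled to the image resolution before taking the argmax; that step is left out here to keep the sketch short.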
Hi, thanks for your great work.
I am interested in the details of open-vocabulary segmentation and have a few questions regarding this task.
In the architecture surgery, I'm wondering whether the prediction for segmentation comes from the original path or the new path. Additionally, which features are used in the feature surgery? The paper says "Note that Eq. 9 is specifically designed for the explainability task", but I think segmentation should use this too? This part of the [code](https://github.com/xmed-lab/CLIP_Surgery/blob/e346359d67e8fc4fe301467914151316d3982661/clip/clip_surgery_model.py#L349C36-L349C36) also confused me.
Why do you preserve the [cls] token in the original_path? If my understanding is right, the [cls] token in the original_path is not influenced by the new_path. So for the multi-label recognition task, would the architecture surgery be useless? Could you give more details? It would also be of great help if you could release the code for open-vocabulary segmentation.
Thanks again for your work!