Closed justlike-prog closed 3 weeks ago
Not sure if it would work, but have you by any chance looked at using captions like "this is a photo of a " + ", ".join(subset),
where subset iterates over all subsets of your current classes? So then you'd have 2^4 classes instead of 4.
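A minimal sketch of that caption-construction idea, assuming the four classes mentioned later in this thread (any label set works the same way):

```python
from itertools import combinations

def subset_captions(classes):
    """Build one caption per non-empty subset of the label set,
    so 4 classes yield 2**4 - 1 = 15 captions (add an 'empty'
    caption if you want the full 2**4)."""
    captions = []
    for k in range(1, len(classes) + 1):
        for subset in combinations(classes, k):
            captions.append("this is a photo of a " + ", ".join(subset))
    return captions

classes = ["baby", "child", "teen", "adult"]
caps = subset_captions(classes)
print(len(caps))  # 15 captions for 4 classes
```

Each image is then matched against all subset captions at once, which is what turns the 4-class setup into a 2^4-way one.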
I am attempting this now, training on captions with multiple labels and then querying with single labels, and it works quite badly compared to any normal multi-label classifier.
{'f1': 0.08291136675917679, 'precision': 0.07481833065257353, 'recall': 0.10588978264912757}
If I figure this out I will let you know.
Take a look at this paper: "DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations"
I struggled with this problem for a while and this approach is working for me.
@AmericanPresidentJimmyCarter did you find a way to improve the multi-labelling performance?
No, I just trained multilabel classifiers instead and those worked.
@travellingsasa
You can do some sort of anti-text or placeholder text to do multi-label classification. For example, if your objective is checking whether the color "red" is present in an image of a dress, then use:
["a red dress", "a dress"]
That will give you a probability distribution, and you take the zero index.
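A sketch of that anti-text trick, using made-up cosine similarities in place of real CLIP image/text scores. The temperature of 100 mirrors the logit scale the released CLIP models apply before the softmax:

```python
import numpy as np

def clip_softmax(cosine_sims, temperature=100.0):
    """Turn per-prompt cosine similarities into a probability
    distribution, CLIP-style: scale by a temperature (the models'
    learned logit scale is ~100), then softmax."""
    z = np.asarray(cosine_sims, dtype=np.float64) * temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Dummy similarities standing in for an image scored against
# ["a red dress", "a dress"]; the values are invented.
sims = [0.28, 0.24]
probs = clip_softmax(sims)
red_prob = probs[0]  # index 0 = "a red dress"
```

The probability at index 0 is then thresholded to decide whether "red" is present.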
How does that work? If the image contains neither your result will be essentially random. I think it only works if you have a multi-label classifier to identify a dress in the first place.
The concrete use case is as follows. I have the classes baby, child, teen, adult. My idea was to use the similarity between text and image features (for the text features I used the prompt 'there is at least one (c) in the photo', with c being one of the 4 classes).
I went through quite a lot of examples, but I am running into the issue that the similarity scores often vary a lot for a fixed class, and classes that tend to appear together (like baby and child) end up with very similar thresholds. For similarity scores I use the cosine similarity multiplied by 2.5 to stretch the score toward the interval [0, 1], as is done in the CLIPScore paper.
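For reference, a minimal sketch of that rescaling as the CLIPScore paper defines it, w * max(cos, 0) with w = 2.5; the embeddings below are invented just to exercise the function. Note the result is only guaranteed to lie in [0, 2.5]; it lands near [0, 1] in practice because CLIP cosine similarities are rarely large:

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore-style rescaled similarity: w * max(cos, 0).
    Inputs are raw CLIP image/text embeddings (any dimension)."""
    a = np.asarray(image_emb, dtype=np.float64)
    b = np.asarray(text_emb, dtype=np.float64)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return w * max(cos, 0.0)

# Made-up embedding vectors, not real CLIP features.
img = np.array([0.2, 0.9, 0.1])
txt = np.array([0.1, 0.8, 0.3])
score = clip_score(img, txt)
```

Because the scaling is just a fixed multiplier, it cannot fix per-class score drift on its own, which is consistent with the thresholding problem described above.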
Setting a threshold in that sense doesn't seem possible.
Does anyone have an idea for this? I feel quite stuck on how to proceed.