rishabh063 closed this issue 3 months ago
Hi,
SigLIP applies sigmoid instead of softmax, that's where the name comes from ;)
Isn't that just for training? At inference, when getting probabilities, wouldn't softmax represent them better?
Alternatively, you could remove that comment, or just make sure the values sum to 100%.
It was annoying to see the probabilities not adding up to 100%.
@rishabh063 Would you like to open a PR to make this change?
With the softmax one?
Applying a softmax to a model trained with sigmoid is technically not allowed - see this thread for some info from the authors.
cc @merveenoyan who had a trick to normalize the probabilities. But with sigmoid, you need to interpret them for each image-text pair independently (just like in multi-label classification)
@rishabh063 I think updating the comment - just to indicate the values won't add up to 1 - would make most sense
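To illustrate the difference, here's a small sketch contrasting the two activations (the logit values are made up for illustration; this is not the model's actual output):

```python
import math

# Hypothetical image-text logits from a SigLIP-style model (values assumed)
logits = [2.0, -1.0, 0.5]

# Sigmoid: each image-text pair is scored independently,
# as in multi-label classification; the scores need not sum to 1
sigmoid_probs = [1 / (1 + math.exp(-z)) for z in logits]

# Softmax (what CLIP uses): the scores compete and always sum to 1
exps = [math.exp(z) for z in logits]
softmax_probs = [e / sum(exps) for e in exps]

print(sum(sigmoid_probs))  # generally != 1
print(sum(softmax_probs))  # always == 1 (up to float error)
```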
The thread shared by @NielsRogge has an example by @merveenoyan of re-scaling these values.
I guess that's the best approach. One of you should make the PR and include a short explanation as well.
Most people are familiar with CLIP but not with SigLIP.
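For reference, a minimal sketch of what such a re-scaling could look like. This is an assumed, naive normalization (not necessarily the exact trick from the thread), and the result is a presentation convenience rather than a calibrated probability distribution:

```python
import math

def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

# Hypothetical image-text logits (values made up for illustration)
logits = [2.0, -1.0, 0.5]

probs = [sigmoid(z) for z in logits]

# Naive re-scaling so the values sum to 1; the relative ordering of the
# pairs is preserved, but the values are not calibrated probabilities
total = sum(probs)
rescaled = [p / total for p in probs]

print(rescaled)  # sums to 1, same ranking as the raw sigmoid scores
```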
@rishabh063 Yes, that's true. You'll notice that later in the discussion @rwightman points out that this is unlikely to be properly calibrated. Rather than make an assumption about the data/task (c.f. this comment), it's better just to flag to users that the outputs might not add up to 1.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Documentation link: https://huggingface.co/docs/transformers/en/model_doc/siglip
Reproduction
Sample code :-
Expected behavior
this line should be replaced
with