Open LilyDaytoy opened 10 months ago
Hi, thanks for the question! Can you reproduce the evaluation results? The logits are fed to a normalization layer during training, so it can be hard to tell whether they make sense just by looking at the raw values: https://github.com/microsoft/FIBER/blob/ca0f36bd7e1ad0ac02af2550042b1f259adaf5f9/coarse_grained/fiber/modules/objectives.py#L61C24-L61C24
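To illustrate the point about normalization: a minimal sketch (not the FIBER code, just hypothetical logit values) showing that raw ITM logits can be negative or near zero, and that normalizing them with a softmax over a candidate set yields proper probabilities without changing the ranking:

```python
import math

# Hypothetical raw ITM logits for one image scored against several candidate
# captions. Raw values can be very negative or near zero; only their relative
# order matters once they are normalized over the candidate set.
raw_scores = [-6.65, 0.048, -4.5, -2.4]

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax(raw_scores)

# Softmax is monotonic, so the ranking induced by the probabilities is
# identical to the ranking of the raw logits: a "very negative" logit is
# not inherently a wrong score.
best = max(range(len(probs)), key=probs.__getitem__)
```

The key takeaway is that the absolute scale of the logits is meaningless on its own; comparisons are only valid relative to the other candidates scored for the same query.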
Hi! Thanks for this wonderful work! I tried to evaluate on the Flickr30k test set using your coarse-grained ITM approach. I obtained the score matrix for all image-text pairs from this function: https://github.com/microsoft/FIBER/blob/ca0f36bd7e1ad0ac02af2550042b1f259adaf5f9/coarse_grained/fiber/modules/objectives.py#L389 But I found that the score computed for a matched image-text pair can be very negative, e.g. score = -6.652344, or very small, e.g. score = 0.048279.
This seems quite weird. For example:
caption1 = a black boy in orange and white trucks on playing in the sand
caption2 = the white dog is running in the shallow water
[image attached]
This image is clearly matched with caption1, but the score for caption1 is -4.5 and the score for caption2 is -2.4, which makes the image match caption2 better, since score2 is less negative.
I would like to ask: did I compute the scores incorrectly, or are these values normal? Or do I need to do some further processing on the score matrix?
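For what it's worth, the negative scores by themselves should not break retrieval evaluation, since metrics like Recall@K depend only on the ranking within each row of the score matrix. Below is a hedged sketch of how such a metric is typically computed from a score matrix; this is NOT the FIBER evaluation code, and the helper name, toy matrix, and ground-truth layout are all illustrative assumptions:

```python
import numpy as np

# Hypothetical helper: Recall@K for text retrieval from an image-by-caption
# score matrix. scores[i, j] is the ITM score of image i against caption j,
# and gt_index[i] is the column of image i's ground-truth caption.
def recall_at_k(scores: np.ndarray, gt_index: np.ndarray, k: int) -> float:
    # Rank captions for each image by descending score. The absolute scale
    # of the scores (negative or not) is irrelevant to this ranking.
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == gt_index[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 images, 3 captions, ground truth on the diagonal. Every
# score is negative, yet each image still ranks its own caption first.
scores = np.array([
    [-2.0, -6.0, -9.0],
    [-8.0, -1.5, -7.0],
    [-5.0, -4.0, -0.5],
])
gt = np.array([0, 1, 2])
r1 = recall_at_k(scores, gt, 1)
```

If a metric computed this way reproduces the paper's numbers, the raw score values are fine as-is; if not, the discrepancy is more likely in how the score matrix is assembled than in the sign of individual scores.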