microsoft / FIBER

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
MIT License

Very negative values for matched image-text pairs from the coarse-grained ITM head #19

Open LilyDaytoy opened 10 months ago

LilyDaytoy commented 10 months ago

Hi! Thanks for this wonderful work! I tried to evaluate on the Flickr30k test set using your coarse-grained ITM approach. I got the score matrix for all image-text pairs from this function: https://github.com/microsoft/FIBER/blob/ca0f36bd7e1ad0ac02af2550042b1f259adaf5f9/coarse_grained/fiber/modules/objectives.py#L389 But I found that the score computed for a matched image-text pair is very negative, for example: score = -6.652344

[image attachment]

or very small: score = 0.048279

[image attachment]

This seems quite weird. For example: caption1 = "a black boy in orange and white trucks on playing in the sand", caption2 = "the white dog is running in the shallow water", and the image is

[image attachment]

This image clearly matches caption1, but the score for caption1 is -4.5 and the score for caption2 is -2.4, so the image ends up matching caption2 more, since score2 is less negative?

I would like to ask: is it because I computed the score wrongly, or is this a normal score? Or do I need to do some further processing on the score matrix?
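For reference, the retrieval evaluation only depends on the per-row ranking of the score matrix, not on the sign or magnitude of the raw scores. A minimal sketch (toy 3x3 matrix, assuming ground-truth pairs sit on the diagonal, which is not necessarily how FIBER's evaluation indexes them):

```python
import torch

# Hypothetical [num_images x num_texts] score matrix with ground-truth
# pairs on the diagonal. All scores are negative, yet ranking still works.
scores = torch.tensor([
    [-4.5, -6.0, -7.2],
    [-5.1, -2.4, -3.3],
    [-8.0, -6.6, -1.9],
])

# Text retrieval Recall@1: fraction of images whose top-ranked text is correct.
top1 = scores.argmax(dim=1)
recall_at_1 = (top1 == torch.arange(scores.size(0))).float().mean()
print(recall_at_1.item())  # 1.0 for this toy matrix
```

So a negative score for a matched pair is not by itself a problem; what matters is whether the matched caption outranks the unmatched ones, which in the example above it does not.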

zdou0830 commented 10 months ago

Hi, thanks for the question! Can you reproduce the evaluation results? The logits are fed to a normalization layer during training, so it can be hard to tell whether they make sense just by looking at the raw values: https://github.com/microsoft/FIBER/blob/ca0f36bd7e1ad0ac02af2550042b1f259adaf5f9/coarse_grained/fiber/modules/objectives.py#L61C24-L61C24
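As a toy illustration of why a raw logit is hard to interpret in isolation (assuming a standard two-class match/no-match ITM head with a softmax normalization, which may differ from FIBER's exact head):

```python
import torch
import torch.nn.functional as F

# Hypothetical two-class ITM logits (index 0 = not matched, index 1 = matched)
# for two image-text pairs. A very negative "matched" logit can still yield a
# high match probability if the "not matched" logit is even more negative.
logits = torch.tensor([
    [-9.0, -4.5],   # pair 1
    [-1.0, -2.4],   # pair 2
])

probs = F.softmax(logits, dim=-1)  # normalize per pair
match_prob = probs[:, 1]
print(match_prob)  # pair 1 wins despite the more negative raw "matched" logit
```

Under this view, only the difference between the two logits of a pair carries meaning, so comparing single raw values across pairs can be misleading.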