salaniz / pycocoevalcap

Python 3 support for the MS COCO caption evaluation tools

CIDEr score is 0 while all other metrics are normal #18

Open mlching opened 1 year ago

mlching commented 1 year ago

I'm currently using the pycocoevalcap package to evaluate the performance of my image captioning model. I've noticed that the CIDEr score is consistently 0 for all of my model's generated captions, while all other metrics (BLEU, METEOR, SPICE and ROUGE) are normal.

I have tried to run the evaluation on each image separately, but the situation remains the same. The CIDEr score is always 0.

I'm not sure what could be causing this issue, as the other metrics seem to be working correctly. Can anyone help me figure out why the CIDEr score is not being computed correctly?

Thanks in advance for your help!

suraj-nair-tri commented 10 months ago

Were you able to resolve this issue? I am experiencing the same problem.

mlching commented 10 months ago

No, I haven't been able to resolve the issue either. I'm still experiencing the same problem.

salaniz commented 8 months ago

Could you provide a minimal code example to reproduce this issue? Do you get normal values if you try to run the example from this repository: example/coco_eval_example.py?
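
For reference, that script follows the standard evaluation flow, roughly along these lines (the paths below are placeholders; see example/coco_eval_example.py for the exact details):

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# placeholder paths: ground-truth annotations and your generated captions
annotation_file = 'annotations/captions_val2014.json'
results_file = 'captions_val2014_results.json'

# load ground truth and results, then score every metric over the full image set
coco = COCO(annotation_file)
coco_result = coco.loadRes(results_file)

coco_eval = COCOEvalCap(coco, coco_result)
coco_eval.params['image_id'] = coco_result.getImgIds()
coco_eval.evaluate()

# print the aggregated scores (BLEU, METEOR, ROUGE_L, CIDEr, SPICE)
for metric, score in coco_eval.eval.items():
    print(f'{metric}: {score:.3f}')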

theodpzz commented 7 months ago

Hi @salaniz

I have the same problem as @mlching, although I do get normal values for the CIDEr metric when running the example from your repository.

Here is a minimal example of what I'm running:

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()

# toy dataset for the example: a single image, identical reference and prediction
reference = "The cat is black ."
prediction = "The cat is black ."

dict_reference = {'0': [reference]}
dict_prediction = {'0': [prediction]}

# compute BLEU score
scores, _ = scorers["bleu"].compute_score(dict_reference, dict_prediction) # IT RETURNS 1.0

# compute CIDEr score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 0.0

Thanks in advance for your help!

theodpzz commented 7 months ago

@salaniz CIDEr returns 0.0 when the reference inputs are all the same.

The score is 0.0 in the first example because the reference captions are identical across images:

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()

# toy dataset: both images share the same reference caption
reference1, reference2 = "the cat is black", "the cat is black"
prediction1, prediction2 = "the cat is black", "the eyes are green"

dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}

# compute CIDER score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 0.0 
print(f'CIDEr: {scores}')

With distinct references across the two images, the same code returns 10.0:

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()

# toy dataset: each image has its own distinct reference caption
reference1, reference2 = "the cat is black", "the eyes are green"
prediction1, prediction2 = "the cat is black", "the eyes are green"

dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}

# compute CIDER score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 10.0 
print(f'CIDEr: {scores}')

But I haven't dug into the code any further, so I can't tell why.

theophilegervet commented 7 months ago

Actually, this metric uses IDF, so it needs to be computed across the whole dataset at once: each n-gram is weighted by log(N / df), where N is the number of images and df is the number of images whose references contain that n-gram. When every reference is identical (or there is only a single image), df = N for every n-gram, so all weights are log(1) = 0 and the score collapses to 0.
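
A tiny illustration of that effect (not the library's code; unigram-only, and the variable names are made up for this sketch):

from collections import Counter
from math import log

# CIDEr weights each n-gram by idf = log(N / df), where N is the number of
# images and df is the number of images whose references contain the n-gram.
references = {
    391895: ["the cat is black"],
    522418: ["the cat is black"],  # same reference everywhere -> df == N
}

n_images = len(references)
doc_freq = Counter()
for caps in references.values():
    unigrams = set()
    for cap in caps:
        unigrams.update(cap.split())  # unigrams only, for simplicity
    doc_freq.update(unigrams)

for ngram, df in doc_freq.items():
    print(ngram, log(n_images / df))  # log(2 / 2) = 0 for every n-gram

# All IDF weights are 0, so the TF-IDF vectors are zero and the cosine
# similarity between candidate and reference is 0, hence CIDEr = 0.

With distinct references per image (as in the second example above), the matching n-grams get nonzero weights; this implementation also scales the final score by 10, which is why the perfect match reports 10.0.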