ttanida / rgrg

Code for the CVPR paper "Interactive and Explainable Region-guided Radiology Report Generation"
MIT License

Warning: Empty candidate sentence detected; setting raw BERTscores to 0. #18

Open IQraQasim10 opened 1 year ago

IQraQasim10 commented 1 year ago

Hi, while running test_set_evaluation.py, I am getting the warning message "Warning: Empty candidate sentence detected; setting raw BERTscores to 0." Do you recommend continuing to generate reports and BERTscores despite this warning? Here is a snippet of the warning message:

[image: screenshot of the warning message]

IQraQasim10 commented 1 year ago

[image: screenshot of generated_reports.txt]

This is how the reports are generated in generated_reports.txt.

ttanida commented 1 year ago

Hi,

ideally, you shouldn't get either the "Empty candidate sentence detected" or the "Empty reference sentence detected" warning.

In the first case, it means the model generated empty sentences (i.e. empty strings) for each region, which were concatenated into an empty report. In other words, the model is not trained well.

In the second case, it means the reference report was empty, which shouldn't happen since we only used non-empty findings sections as reference reports. In other words, there are somehow empty reference reports in your test set.

Looking at your 2nd screenshot, it seems the model you evaluated was not trained well (or maybe not trained enough yet). It's normal for the model to generate gibberish at the beginning of training, but it should generate coherent sentences towards the end of training.
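
To tell the two cases apart before computing scores, the blank entries can be located up front. The following is a minimal sketch, not code from this repository: find_empty_pairs and the sample sentences are hypothetical, and it assumes the generated and reference reports are available as two equally long lists of strings.

```python
from typing import List, Tuple


def find_empty_pairs(
    candidates: List[str], references: List[str]
) -> Tuple[List[int], List[int]]:
    """Return the indices of blank candidates and of blank references.

    A string counts as blank if it is empty or contains only whitespace.
    """
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    empty_cands = [i for i, c in enumerate(candidates) if not c.strip()]
    empty_refs = [i for i, r in enumerate(references) if not r.strip()]
    return empty_cands, empty_refs


# Hypothetical example data, not taken from the test set.
generated = ["The lungs are clear.", "", "   "]
reference = [
    "Lungs are clear bilaterally.",
    "No pleural effusion.",
    "Heart size is normal.",
]

empty_cands, empty_refs = find_empty_pairs(generated, reference)
print(f"blank candidates at {empty_cands}, blank references at {empty_refs}")
```

Blank candidates point to the first case (the model produced empty reports), while blank references point to the second case (empty reference reports in the test set).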

ahll19 commented 7 months ago

I get the same issue when trying to run the score method from the BERTScorer class.

I am using the "microsoft/deberta-xlarge-mnli" model, on two sentences generated by ChatGPT:

"Bohemian Rhapsody by Queen is an epic rock opera masterpiece that defies traditional song structures, blending ballad, opera, and hard rock elements into a compelling narrative about a young man grappling with the consequences of his actions."

"Bohemian Rhapsody is a genre-defying musical journey that weaves together operatic vocals, intricate harmonies, and dynamic instrumentation, exploring themes of introspection, rebellion, and acceptance."

Specifically, I get this stack trace:

Warning: Empty candidate sentence detected; setting raw BERTscores to 0.
Warning: Empty reference sentence detected; setting raw BERTScores to 0.
Warning: Empty candidate sentence detected; setting raw BERTscores to 0.
Warning: Empty reference sentence detected; setting raw BERTScores to 0.
Warning: Empty candidate sentence detected; setting raw BERTscores to 0.
Warning: Empty reference sentence detected; setting raw BERTScores to 0.
Traceback (most recent call last):
  File "/home/anders/Git/P10/llm_tools/evaluation/evaluation2.py", line 71, in <module>
    P, R, F1 = backend.score(queen1, queen2)
  File "/home/anders/.pyenv/versions/P10/lib/python3.10/site-packages/bert_score/scorer.py", line 220, in score
    all_preds = bert_cos_score_idf(
  File "/home/anders/.pyenv/versions/P10/lib/python3.10/site-packages/bert_score/utils.py", line 659, in bert_cos_score_idf
    P, R, F1 = greedy_cos_idf(*ref_stats, *hyp_stats, all_layers)
  File "/home/anders/.pyenv/versions/P10/lib/python3.10/site-packages/bert_score/utils.py", line 517, in greedy_cos_idf
    sim = torch.bmm(hyp_embedding, ref_embedding.transpose(1, 2))
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [50, 1024] but got: [10, 1024].

Has there been any work on this issue yet?

Addendum: when using the exact same setup but calling the plot_example function instead, it works.
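
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if queen1 and queen2 are plain strings, score would iterate over them character by character, because it expects lists of sentences. The space characters would then look like empty sentences, and two strings of different lengths would produce mismatched batch sizes. plot_example, by contrast, takes a single candidate string and a single reference string, which would be consistent with the addendum. The sketch below shows the wrapping step only; as_sentence_list is a hypothetical helper, the sentences are shortened versions of the ones above, and the bert_score call is left as a comment so the block runs without the library installed.

```python
from typing import List, Union


def as_sentence_list(text: Union[str, List[str]]) -> List[str]:
    """Wrap a single string in a list; return a copy of an existing list."""
    if isinstance(text, str):
        return [text]
    return list(text)


# Shortened versions of the two sentences, for illustration only.
queen1 = "Bohemian Rhapsody by Queen is an epic rock opera masterpiece."
queen2 = "Bohemian Rhapsody is a genre-defying musical journey."

cands = as_sentence_list(queen1)
refs = as_sentence_list(queen2)

# Iterating over a bare string yields single characters; every space among
# them would be treated as a blank "sentence".
blank_chars = sum(1 for ch in queen1 if not ch.strip())
print(len(cands), len(refs), blank_chars)

# With bert_score installed, the call would then be (not executed here):
# P, R, F1 = backend.score(cands, refs)
```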