IQraQasim10 opened 1 year ago
This is how the reports are generated in generated_reports.txt
Hi,
ideally, you shouldn't get either the "Empty candidate sentence detected" or the "Empty reference sentence detected" warning.
Because in the first case, it means the model generated empty sentences (i.e. empty strings) for each region, which were concatenated into an empty report. I.e. the model is not trained well.
And in the second case, it means the reference report was empty, which shouldn't happen since we only used non-empty findings sections as reference reports. I.e. somehow there are empty reference reports in your test set.
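The empty-reference case described above can be checked directly. As an illustrative sketch (not code from the repository), one can scan the reference reports for empty or whitespace-only entries before scoring:

```python
# Hypothetical check: find reference reports that are empty or
# whitespace-only, which would trigger the warning above.
reports = [
    "The lungs are clear.",
    "",      # empty report
    "   ",   # whitespace-only report
    "No acute cardiopulmonary findings.",
]
empty_indices = [i for i, r in enumerate(reports) if not r.strip()]
print(empty_indices)  # -> [1, 2]
```

Any index reported here points to a reference report that would be scored as 0.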
Looking at your 2nd screenshot, it seems like the model you evaluated was not trained well (or maybe not trained enough yet). It's normal for the model to generate gibberish at the beginning of training, but it should generate coherent sentences toward the end of its training.
I get the same issue when trying to run the `score` method from the `BERTScorer` class.
I am using the `microsoft/deberta-xlarge-mnli` model on two sentences generated by ChatGPT:
"Bohemian Rhapsody by Queen is an epic rock opera masterpiece that defies traditional song structures, blending ballad, opera, and hard rock elements into a compelling narrative about a young man grappling with the consequences of his actions."
"Bohemian Rhapsody is a genre-defying musical journey that weaves together operatic vocals, intricate harmonies, and dynamic instrumentation, exploring themes of introspection, rebellion, and acceptance."
Specifically, I get this stack trace:
Warning: Empty candidate sentence detected; setting raw BERTscores to 0.
Warning: Empty reference sentence detected; setting raw BERTScores to 0.
Warning: Empty candidate sentence detected; setting raw BERTscores to 0.
Warning: Empty reference sentence detected; setting raw BERTScores to 0.
Warning: Empty candidate sentence detected; setting raw BERTscores to 0.
Warning: Empty reference sentence detected; setting raw BERTScores to 0.
Traceback (most recent call last):
File "/home/anders/Git/P10/llm_tools/evaluation/evaluation2.py", line 71, in <module>
P, R, F1 = backend.score(queen1, queen2)
File "/home/anders/.pyenv/versions/P10/lib/python3.10/site-packages/bert_score/scorer.py", line 220, in score
all_preds = bert_cos_score_idf(
File "/home/anders/.pyenv/versions/P10/lib/python3.10/site-packages/bert_score/utils.py", line 659, in bert_cos_score_idf
P, R, F1 = greedy_cos_idf(*ref_stats, *hyp_stats, all_layers)
File "/home/anders/.pyenv/versions/P10/lib/python3.10/site-packages/bert_score/utils.py", line 517, in greedy_cos_idf
sim = torch.bmm(hyp_embedding, ref_embedding.transpose(1, 2))
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [50, 1024] but got: [10, 1024].
Has there been any work on this issue yet?
Addendum: using the exact same setup but calling the `plot_example` function instead works fine.
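One plausible cause, assuming `score` was called with bare strings rather than lists: `BERTScorer.score` expects a list of candidate strings and a list of reference strings, and iterating over a Python `str` yields individual characters, so each character is treated as its own "sentence" (whitespace characters strip to empty, which would match the warnings). A minimal sketch of the mismatch, using the variable names from the traceback:

```python
queen1 = "Bohemian Rhapsody by Queen is an epic rock opera masterpiece."
queen2 = "Bohemian Rhapsody is a genre-defying musical journey."

# Iterating a bare str yields characters, not sentences:
print(list(queen1)[:4])  # -> ['B', 'o', 'h', 'e']

# Wrapping each string in a list gives one sentence per input,
# i.e. (hedged suggestion, not a confirmed fix):
#   P, R, F1 = backend.score([queen1], [queen2])
print(len([queen1]), len([queen2]))  # -> 1 1
```

This would also be consistent with `plot_example` working, since it compares a single candidate string against a single reference string.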
Hi, while running test_set_evaluation.py, I am getting the warning message
Warning: Empty candidate sentence detected; setting raw BERTscores to 0.
Do you recommend continuing to generate reports and BERTScores? Here is the snippet of the warning message