In the evaluation script `evaluate_multiwoz.py`, at line 320, the use of `aggregated_belief` has been commented out and `beliefs` is used instead. This makes `pred_beliefs` a list of dictionaries rather than a single dictionary (as it would be with `aggregated_belief`).
As a result, the later check at line 332 fails, which yields empty venues for the dialogues and significantly decreases the Matches and Success rates.
Switching back to `aggregated_belief` (and adjusting lines 335 and 336 accordingly) significantly increases the Matches and Success rates.
What is the reason behind the choice of using `pred_beliefs = dial['beliefs']` instead of `pred_beliefs = dial['aggregated_belief']`?
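
To illustrate the type mismatch, here is a minimal sketch. It does not reproduce the script's code: the belief structure, the `aggregate` helper, and the `has_domain` check are all hypothetical stand-ins, and I am assuming the check at line 332 is some form of dictionary-style lookup.

```python
# Hypothetical per-turn belief states; the real structure in the repo may differ.
turn_beliefs = [
    {"restaurant": {"area": "centre"}},
    {"restaurant": {"area": "centre", "food": "italian"}},
]


def aggregate(beliefs):
    """Merge per-turn beliefs into one dict; later turns override earlier ones."""
    merged = {}
    for turn in beliefs:
        for domain, slots in turn.items():
            merged.setdefault(domain, {}).update(slots)
    return merged


def has_domain(pred_beliefs, domain):
    """Stand-in for a dict-style check (an assumption, not the script's code)."""
    return domain in pred_beliefs


# A list of dicts never contains the domain key itself, so the check fails.
list_result = has_domain(turn_beliefs, "restaurant")
# A single aggregated dict does contain the key, so the check passes.
dict_result = has_domain(aggregate(turn_beliefs), "restaurant")

print(list_result, dict_result)
```

With a list, the membership test compares the domain name against whole dictionaries and finds no match, which would be consistent with the empty venues I am seeing.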