Use of Aggregated Belief

In the evaluation script evaluate_multiwoz.py at line 320, the use of aggregated_belief has been commented out and beliefs is used instead. This makes pred_beliefs a list of dictionaries rather than a single dictionary (as it would be with aggregated_belief).

As a result, the later check at line 332 fails, which yields empty venues for the dialogues and significantly decreases the Matches and Success rates.
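
To illustrate the kind of mismatch I mean, here is a minimal sketch. It assumes the check at line 332 is a membership test on the domain name, which I have not confirmed against the script. The dial contents and the domain_found helper are hypothetical and only stand in for the real data and code.

```python
# Hypothetical dialogue entry; the real structure in evaluate_multiwoz.py may differ.
dial = {
    # Per-turn beliefs: a list of dictionaries.
    "beliefs": [
        {"restaurant": {"area": "centre"}},
        {"restaurant": {"area": "centre", "food": "italian"}},
    ],
    # Aggregated belief: a single dictionary keyed by domain.
    "aggregated_belief": {"restaurant": {"area": "centre", "food": "italian"}},
}


def domain_found(pred_beliefs, domain):
    """Assumed form of the check: is the domain name contained in pred_beliefs?"""
    return domain in pred_beliefs


# With a list, `in` compares the string against each dictionary, so it never matches.
found_with_list = domain_found(dial["beliefs"], "restaurant")

# With a dictionary, `in` tests the keys, so the domain is found.
found_with_dict = domain_found(dial["aggregated_belief"], "restaurant")

print(found_with_list, found_with_dict)
```

If the real check behaves like this, every domain would be treated as missing when pred_beliefs is a list, which would explain the empty venues.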

Switching back to aggregated_belief (and adjusting lines 335 and 336 accordingly) significantly increases the Matches and Success rates.

What is the reason behind the choice of using pred_beliefs = dial['beliefs'] instead of pred_beliefs = dial['aggregated_belief']?

salesforce / simpletod

Use of Aggregated Belief #29