thu-coai / ConvLab-2

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems
Apache License 2.0
455 stars 133 forks source link

[BUG] Agenda updates incorrectly cur_domain #207

Closed ghost closed 3 years ago

ghost commented 3 years ago

In the Agenda of MultiWOZ user simulator, self.cur_domain is updated in the following lines

https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L772-L783

self.cur_domain is set by user's action at first, then updated by system' action. But in line 779, the condition means that self.cur_domain is updated by user's action only at the beginning, thereafter updated only by system's action. I think this doesn't make sense.

In contrast, MultiWOZ evaluator updates self.cur_domain by both user's and system's action in each turn.

https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/evaluator/multiwoz_eval.py#L115-L129

https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/evaluator/multiwoz_eval.py#L86-L113

zqwerty commented 3 years ago

thanks! we will take a look

ghost commented 3 years ago

Is there any progress?

aaa123git commented 3 years ago

Hi, @RaphaelCS. zqwerty and I have thought about your advice and submitted a PR.

We agree that self.cur_domain should be updated by user's action and system's action. However, the evaluation results don't change after updating code. I guess the reason is as follows.

There are mainly 2 steps in UserPolicyAgendaMultiWoz.predict.

Pay attention to the following facts.

We can infer that only if sys_action at previous turn is in domain A, and user action at previous turn is in domain B, and sys_action at current turn only contains booking action, does updating self.cur_domain by user's action make sense. It's really a rare condition.

ghost commented 3 years ago

Yes, the condition is really rare but that is exactly what happened to me!

Actually, when using my own policy to interact with the simulator, my policy consistently replies booking action (that's a fault of my policy) in some situations. Thus, the dialogue falls in a loop.

But what let me pay attention to self.cur_domain is the extremely large reward. Even though the total success rate isn't that high, the evaluated rewards are too much. It achieves nearly 170, which obviously doesn't make sense. Finally I found that self.cur_domain doesn't update correctly (just in the rare case), and the evaluator consistently give 5 as reward because my policy had successfully finished one domain.

Anyway, thanks for updating!