[BUG] Agenda updates incorrectly cur_domain

ghost commented 3 years ago

In the Agenda of MultiWOZ user simulator, self.cur_domain is updated in the following lines

https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L772-L783

self.cur_domain is set by user's action at first, then updated by system' action. But in line 779, the condition means that self.cur_domain is updated by user's action only at the beginning, thereafter updated only by system's action. I think this doesn't make sense.

In contrast, MultiWOZ evaluator updates self.cur_domain by both user's and system's action in each turn.

https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/evaluator/multiwoz_eval.py#L115-L129

https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/evaluator/multiwoz_eval.py#L86-L113

zqwerty commented 3 years ago

thanks! we will take a look

ghost commented 3 years ago

Is there any progress?

aaa123git commented 3 years ago

Hi, @RaphaelCS. zqwerty and I have thought about your advice and submitted a PR.

We agree that self.cur_domain should be updated by user's action and system's action. However, the evaluation results don't change after updating code. I guess the reason is as follows.

There are mainly 2 steps in UserPolicyAgendaMultiWoz.predict.

step 1: call Agenda.update. https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L110 There are mainly 2 sub-steps in Agenda.update. step 1.1: update self.cur_domain by system's actions https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L444 step 1.2: update items in agenda https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L446-L468
step 2: get the output actions https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L118 There are also 2 sub-steps in Agenda.get_action step 2.1: pop user's actions https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L582-L587 step 2.2: (after updating code, ) update self.cur_domain by user's actions https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L589

Pay attention to the following facts.

self.cur_domain may affect step 1.2, but doesn't affect step 2.1
self.cur_domain indicates the domain of Booking action. Without self.cur_domain, user simulator doesn't know which domain the system is booking.
When sys_action only contains booking action, updating self.cur_domain by user's action makes sense. If sys_action contains domain information (e.g., sys_action = {"train-inform": ...}, self.cur_domain is overwritten by the newest domain in sys_action.
When the domains of sys_action and user_action are different, updating self.cur_domain by user's action makes sense.

We can infer that only if sys_action at previous turn is in domain A, and user action at previous turn is in domain B, and sys_action at current turn only contains booking action, does updating self.cur_domain by user's action make sense. It's really a rare condition.

ghost commented 3 years ago

Yes, the condition is really rare but that is exactly what happened to me!

Actually, when using my own policy to interact with the simulator, my policy consistently replies booking action (that's a fault of my policy) in some situations. Thus, the dialogue falls in a loop.

But what let me pay attention to self.cur_domain is the extremely large reward. Even though the total success rate isn't that high, the evaluated rewards are too much. It achieves nearly 170, which obviously doesn't make sense. Finally I found that self.cur_domain doesn't update correctly (just in the rare case), and the evaluator consistently give 5 as reward because my policy had successfully finished one domain.

Anyway, thanks for updating!

thu-coai / ConvLab-2

[BUG] Agenda updates incorrectly cur_domain #207