Closed ghost closed 3 years ago
thanks! we will take a look
Is there any progress?
Hi, @RaphaelCS. zqwerty and I have thought about your advice and submitted a PR.
We agree that self.cur_domain
should be updated by user's action and system's action. However, the evaluation results don't change after updating code. I guess the reason is as follows.
There are mainly 2 steps in UserPolicyAgendaMultiWoz.predict
.
Agenda.update
.
https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L110
There are mainly 2 sub-steps in Agenda.update
.
step 1.1: update self.cur_domain
by system's actions
https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L444
step 1.2: update items in agenda
https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L446-L468Agenda.get_action
step 2.1: pop user's actions
https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L582-L587
step 2.2: (after updating code, ) update self.cur_domain
by user's actions
https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L589Pay attention to the following facts.
self.cur_domain
may affect step 1.2, but doesn't affect step 2.1self.cur_domain
indicates the domain of Booking
action. Without self.cur_domain
, user simulator doesn't know which domain the system is booking.sys_action
only contains booking
action, updating self.cur_domain
by user's action makes sense. If sys_action
contains domain information (e.g., sys_action
= {"train-inform": ...}, self.cur_domain
is overwritten by the newest domain in sys_action
.sys_action
and user_action
are different, updating self.cur_domain
by user's action makes sense.We can infer that only if sys_action
at previous turn is in domain A, and user action
at previous turn is in domain B, and sys_action
at current turn only contains booking
action, does updating self.cur_domain
by user's action make sense. It's really a rare condition.
Yes, the condition is really rare but that is exactly what happened to me!
Actually, when using my own policy to interact with the simulator, my policy consistently replies booking
action (that's a fault of my policy) in some situations. Thus, the dialogue falls in a loop.
But what let me pay attention to self.cur_domain
is the extremely large reward.
Even though the total success rate isn't that high, the evaluated rewards are too much. It achieves nearly 170, which obviously doesn't make sense. Finally I found that self.cur_domain
doesn't update correctly (just in the rare case), and the evaluator consistently give 5 as reward because my policy had successfully finished one domain.
Anyway, thanks for updating!
In the
Agenda
of MultiWOZ user simulator,self.cur_domain
is updated in the following lineshttps://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/policy/rule/multiwoz/policy_agenda_multiwoz.py#L772-L783
self.cur_domain
is set by user's action at first, then updated by system' action. But in line 779, the condition means thatself.cur_domain
is updated by user's action only at the beginning, thereafter updated only by system's action. I think this doesn't make sense.In contrast, MultiWOZ evaluator updates
self.cur_domain
by both user's and system's action in each turn.https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/evaluator/multiwoz_eval.py#L115-L129
https://github.com/thu-coai/ConvLab-2/blob/aedd10659765f575b0d0a1cc767054488bc13567/convlab2/evaluator/multiwoz_eval.py#L86-L113