[BUG] GDPL could not train.

sherlock1987 commented 4 years ago

Describe the bug: When I train the GDPL model, having loaded the MLE-pretrained model, the loss and the evaluation results always stay around 0.26. The output is shown below; could you help me out? GDPL is pretty good, and I plan to use it as my baseline model.

To Reproduce

Go to policy/gdpl/train.py and add the argument --load_model with the path of the MLE model. You will then see the results: the loss becomes bigger and bigger. The output looks like this:

WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: hotel domain
WARNING:root:illegal booking slot: time, slot: taxi domain
DEBUG:root:<> epoch 0, loss_real:-0.5383382267836068, loss_gen:-1.5583195904683735
INFO:root:<> epoch 0: saved network to mdl
DEBUG:root:<> weight -3.7587242126464844
DEBUG:root:<> log pi -11.807324409484863
/home/raliegh/视频/convlab2_github_code_theirs/ConvLab-2/convlab2/policy/gdpl/gdpl.py:183: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  torch.nn.utils.clip_grad_norm(self.policy.parameters(), 10)
DEBUG:root:<
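To see how loss_real and loss_gen move from epoch to epoch, the DEBUG lines can be pulled out of a saved log. This is only a minimal sketch: it assumes the log lines keep the format shown in the output above, and the helper name parse_losses is hypothetical, not part of ConvLab-2.

```python
import re

# Pattern for a signed decimal number, optionally in scientific notation.
_NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"

# Assumed line format (taken from the output above):
#   DEBUG:root:<> epoch 0, loss_real:-0.5383382267836068, loss_gen:-1.5583195904683735
LOSS_RE = re.compile(rf"epoch (\d+), loss_real:({_NUM}), loss_gen:({_NUM})")


def parse_losses(log_text):
    """Return (epoch, loss_real, loss_gen) tuples for every matching log line.

    Hypothetical helper for inspecting a training log; lines that do not
    match the assumed format are ignored.
    """
    return [
        (int(epoch), float(loss_real), float(loss_gen))
        for epoch, loss_real, loss_gen in LOSS_RE.findall(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "WARNING:root:illegal booking slot: time, slot: taxi domain\n"
        "DEBUG:root:<> epoch 0, loss_real:-0.5383382267836068, "
        "loss_gen:-1.5583195904683735\n"
        "INFO:root:<> epoch 0: saved network to mdl\n"
    )
    for epoch, loss_real, loss_gen in parse_losses(sample):
        print(epoch, loss_real, loss_gen)
```

Running this over the full training log gives one tuple per epoch, which makes it easier to see at which point the losses start to grow.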

liangrz15 commented 4 years ago

Hi, at the moment the GDPL model shows a slight improvement over the pretrained MLE model in the first few epochs, but the performance drops later. We will solve this problem as soon as possible.

sherlock1987 commented 4 years ago

Thanks Bro

sherlock1987 commented 4 years ago

Is there any clue? We could fix this problem together. I believe the reward estimator has some problems, since the loss function is based on that estimator.

sherlock1987 commented 4 years ago

Hey, has anyone started looking at this?

liangrz15 commented 4 years ago

Hey, has anyone started looking at this?

Yes, I am working on it.

sherlock1987 commented 4 years ago

Cool!

zqwerty commented 4 years ago

move to #54

thu-coai / ConvLab-2

[BUG] GDPL could not train. #20