Hi, thanks for the code. It's amazing. I have one question.
In the paper, you mention two strategies for computing intermediate-step rewards.
Strategy one is Monte Carlo search, the same as in SeqGAN; I can find that implementation in your code.
For strategy two, you train a discriminator that can assign rewards to partially decoded sequences (you note this does not work as well as the MC approach). Did you upload this part of the code? I can't find it in the implementation (should it be in the `discriminator` folder?), or am I missing something?
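To make sure I'm looking for the right thing, here is a minimal sketch of what I understood strategy two to be. All class, function, and parameter names below are my own guesses for illustration, not from your repo or the paper:

```python
# Hypothetical sketch of strategy two: a discriminator that scores
# partially decoded sequences directly, instead of averaging over
# Monte Carlo rollouts. Names here are my own, not from the repo.
import torch
import torch.nn as nn

class PartialSeqDiscriminator(nn.Module):
    """Estimates P(human | y_1..y_t) for a partial hypothesis."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, prefix_tokens):
        # prefix_tokens: (batch, t) -- the first t decoded tokens
        emb = self.embed(prefix_tokens)
        _, h = self.rnn(emb)                   # h: (1, batch, hidden_dim)
        return torch.sigmoid(self.out(h[-1]))  # reward per partial sequence

# At decoding step t, the reward for the partial hypothesis would be
# D(y_1..y_t) from a single forward pass, rather than the mean
# discriminator score over N completed Monte Carlo rollouts.
```

Is that roughly what the paper's second strategy does, and if so, is the training code for it included?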
Thanks!