macournoyer / neuralconvo

Neural conversational model in Torch

E #41

Closed dimeldo closed 8 years ago

macournoyer commented 8 years ago

Very interesting! Thanks for sharing.

macournoyer commented 8 years ago

I'm closing this. But thanks again for the link! I've read the paper multiple times and this is definitely something I want to implement soon.

Based on the paper, the network must first be trained w/ something like what we have here (this repo), and then RL is used to fine-tune it for long-term rewards.

nabihach commented 7 years ago

I tried to implement this using dpnn's ReinforceCategorical module and the VRClassReward criterion, but I had trouble setting up the parameters and the inputs/outputs of the Seq2Seq layers correctly. If somebody has done this successfully, I'd love to see their code. Otherwise, if somebody simply wants to collaborate, let me know! Thanks.
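
For anyone following along, here is a rough, framework-free sketch of the idea behind that combination: sample an action from a categorical distribution, score it with a 0/1 reward, and scale the log-probability gradient by the reward minus a baseline (the variance-reduction part). It is not dpnn code and not the paper's method. The function names, the moving-average baseline, and the toy one-step "pick the target class" task are all assumptions made for illustration, and it leaves out the Seq2Seq wiring entirely.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def reinforce_grad(logits, action, reward, baseline):
    """REINFORCE estimate: (reward - baseline) * d log pi(action) / d logits."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [advantage * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(probs)]


def train(num_steps=2000, lr=0.1, target=2, num_actions=4, seed=0):
    """Toy task (hypothetical): reward 1 when the sampled action equals target."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    baseline = 0.0  # moving average of past rewards, an assumed baseline choice
    for _ in range(num_steps):
        action = sample_categorical(softmax(logits), rng)
        reward = 1.0 if action == target else 0.0
        grad = reinforce_grad(logits, action, reward, baseline)
        # Gradient ascent on expected reward.
        logits = [w + lr * g for w, g in zip(logits, grad)]
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

In a dialogue setting the action would be a sampled token at each decoding step and the reward would come from whatever long-term signal is chosen, so this only shows the update rule, not how to plug it into the decoder.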