ShangwuYao opened this issue 6 years ago
It seems a lot changed. You may try a smaller lambda to see whether it converges better.
Thank you so much for helping. We also don't know whether you used "freeze_mode" or not; is it actually helpful?
freeze_mode is never used.
I ran into the same problem: I implemented only the RHN in PyTorch, and on the PTB dataset the perplexity is very large, with the same configuration as this.
@wlj6816 Please use the older TensorFlow versions, as I specified.
@wlj6816 @ShangwuYao I am also trying to implement this method in PyTorch, but I have run into many bottlenecks. Has anyone managed to do it successfully? I would love to exchange ideas. Thank you very much.
Hi, we reproduced the algorithm (with dim_glasso and structure_glasso) in PyTorch using almost exactly the same parameters as yours, but ended up with huge perplexity (around 600 after 5 epochs, with no improvement in further training). Could you share some ideas on what might be going wrong?
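For context, a group-lasso term of the kind dim_glasso/structure_glasso refer to is just lambda times a sum of L2 norms over parameter groups. This is a minimal sketch of our understanding (the function name and the choice of columns as groups are our assumptions, not this repo's exact code):

```python
import torch

def group_lasso_penalty(weight: torch.Tensor, lam: float = 1e-4) -> torch.Tensor:
    # Group lasso: lambda times the sum of L2 norms over groups;
    # here each column of the weight matrix is treated as one group.
    return lam * weight.norm(dim=0).sum()

# Example: add the penalty to a task loss during training.
w = torch.randn(4, 3, requires_grad=True)
task_loss = (w ** 2).mean()  # placeholder task loss
loss = task_loss + group_lasso_penalty(w, lam=1e-3)
loss.backward()
```

If the penalty is scaled this way, a lambda that is too large relative to the task loss can easily stall convergence, which is why a smaller lambda was suggested above.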
We do have some differences in parameter choices and input embedding. You initialize the weight parameters uniformly in [−0.04, 0.04], while we followed the PyTorch documentation and used [-0.1, 0.1]. We also found that your learning rate schedule differs from the baseline model's ("train the model for 55 (check) epochs with a learning rate of 1; after 14 epochs we start to reduce the learning rate by a factor of 1.15 after each epoch"), and the schedule we use is different again (again following the PyTorch tutorial): start from 20 and divide by 4 at each anneal step. Finally, we use an encoder layer after the input and a decoder layer before the output.
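To make the two schedules concrete, here is a minimal sketch of both (function names are ours; the paper's total epoch count is left uncertain, as quoted above):

```python
def paper_lr(epoch: int, base_lr: float = 1.0,
             decay_start: int = 14, factor: float = 1.15) -> float:
    # Schedule quoted above: constant learning rate of 1 for the first
    # 14 epochs, then divided by 1.15 after each subsequent epoch.
    if epoch < decay_start:
        return base_lr
    return base_lr / (factor ** (epoch - decay_start + 1))

def tutorial_lr(lr: float, improved: bool) -> float:
    # PyTorch-tutorial-style annealing: start from 20 and divide the
    # learning rate by 4 whenever validation loss stops improving.
    return lr if improved else lr / 4.0
```

Note how different the trajectories are: the quoted schedule decays gently every epoch, while the tutorial schedule stays flat and then drops by 4x at once, which could plausibly interact badly with the regularizer.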
Would the parameter initialization and update scheme make a big difference? Could you provide some insights? Thanks a lot.