ShangwuYao opened this issue 6 years ago
It seems a lot changed. You may try a smaller lambda to see whether it converges better.
Thank you so much for helping. We also don't know whether you used "freeze_mode" or not; is it actually helpful?
freeze_mode is never used.
I ran into the same problem: I implemented only the RHN in PyTorch, and on the PTB dataset the perplexity is very large, with the same configuration as this.
@wlj6816 Please use the older TensorFlow versions, as I specified.
@wlj6816 @ShangwuYao I am also trying to implement this method in PyTorch, but I have run into many bottlenecks. Has anyone managed to do it successfully? I would love to exchange ideas. Thank you very much.
Hi, we reproduced the algorithm (with dim_glasso and structure_glasso) in PyTorch using almost exactly the same parameters as yours, but ended up with huge perplexity (around 600 after 5 epochs, with no improvement in further training). Could you share some ideas on what might be going wrong?
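For context, a group-lasso term of the kind dim_glasso/structure_glasso refer to is just lambda times a sum of L2 norms over parameter groups. This is a minimal sketch of our understanding (the function name and the choice of columns as groups are our assumptions, not this repo's exact code):

```python
import torch

def group_lasso_penalty(weight: torch.Tensor, lam: float = 1e-4) -> torch.Tensor:
    # Group lasso: lambda times the sum of L2 norms over groups;
    # here each column of the weight matrix is treated as one group.
    return lam * weight.norm(dim=0).sum()

# Example: add the penalty to a task loss during training.
w = torch.randn(4, 3, requires_grad=True)
task_loss = (w ** 2).mean()  # placeholder task loss
loss = task_loss + group_lasso_penalty(w, lam=1e-3)
loss.backward()
```

If the penalty is scaled this way, a lambda that is too large relative to the task loss can easily stall convergence, which is why a smaller lambda was suggested above.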
We do have some differences in parameter choices and input embedding. You initialize the weight parameters uniformly in [−0.04, 0.04], while we followed the PyTorch documentation and used [-0.1, 0.1]. We also found that your learning rate schedule differs from the baseline model's ("train the model for 55 (check) epochs with a learning rate of 1; after 14 epochs we start to reduce the learning rate by a factor of 1.15 after each epoch"), and the schedule we use is different again (again following the PyTorch tutorial): start from 20 and divide by 4 at each anneal step. Finally, we use an encoder layer after the input and a decoder layer before the output.
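To make the two schedules concrete, here is a minimal sketch of both (function names are ours; the paper's total epoch count is left uncertain, as quoted above):

```python
def paper_lr(epoch: int, base_lr: float = 1.0,
             decay_start: int = 14, factor: float = 1.15) -> float:
    # Schedule quoted above: constant learning rate of 1 for the first
    # 14 epochs, then divided by 1.15 after each subsequent epoch.
    if epoch < decay_start:
        return base_lr
    return base_lr / (factor ** (epoch - decay_start + 1))

def tutorial_lr(lr: float, improved: bool) -> float:
    # PyTorch-tutorial-style annealing: start from 20 and divide the
    # learning rate by 4 whenever validation loss stops improving.
    return lr if improved else lr / 4.0
```

Note how different the trajectories are: the quoted schedule decays gently every epoch, while the tutorial schedule stays flat and then drops by 4x at once, which could plausibly interact badly with the regularizer.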
Would the parameter initialization and update scheme make a big difference? Could you provide some insights? Thanks a lot.