nikitos9000 opened this issue 6 years ago
Hi @nsmetanin,
Thank you for your interest in our work.
Given the description of your experiments, we suspect that you ran the script ptb_search.sh, and not ptb_final.sh. The script ptb_search.sh only searches for the architecture, and its results usually look like what you describe (validation perplexity fluctuating substantially between epochs).
As we describe in Section 2 of the paper (the last paragraph), one has to run ptb_search.sh until it finishes, then take the architecture with the highest reward and retrain it from scratch. For instance, when ptb_search.sh is finished, you should see lines such as the following:
[0 0 0 1 1 2 1 1 1 2 1 2 1 2 1 6 0 4 1 9 1 9 1] rw=0.551
From these, you should pick the one with the highest rw and feed it to ptb_final.sh. You can also see an example in ptb_final.sh, which will produce the 55.8 perplexity that we report in the paper.
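For concreteness, here is a minimal sketch of that selection step (the log file name and the parsing helper are just an illustration on our side, not part of the repo; the chosen string then goes into the child_fixed_arc flag that ptb_final.sh passes to the trainer):

```python
import re

# Hypothetical helper: scan the output of ptb_search.sh and keep the sampled
# cell with the highest reported reward ("rw=...").
best_rw, best_arc = float("-inf"), None
with open("search.log") as f:  # the log file name is an assumption
    for line in f:
        m = re.search(r"\[([0-9 ]+)\] rw=([0-9.]+)", line)
        if m and float(m.group(2)) > best_rw:
            best_rw, best_arc = float(m.group(2)), m.group(1).strip()

print(best_arc, best_rw)
# Paste best_arc into the child_fixed_arc flag used by ptb_final.sh, then rerun it.
```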
@hyhieu Great, thanks, you're right! I ran only ptb_search.sh. I've started the ptb_final.sh training with the optimal architecture, hope this will work!
@nsmetanin Thanks! Please let us know how it goes 😃
Thanks a lot for the code!
I'm running ptb_final.sh without any modification (i.e. using the provided RNN cell and hyperparameters). At epoch 1150, the training ppl is 11.83 and the validation ppl is 322.69. I could wait longer, but it looks like the model is overfitting quite a bit. My TF version is 1.6.0.
Then I checked the code, and it seems that o_mask for the variational dropout at the output layer is never actually used (link). Could this be the cause?
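For reference, this is how I understand what the output-layer variational dropout is meant to do: one Bernoulli mask is sampled per example and reused at every timestep, and the bug is that o_mask is built but never multiplied into the activations. An illustrative sketch only (not the repo's exact code; the tensor layout is an assumption):

```python
import tensorflow as tf

def variational_output_dropout(outputs, keep_prob):
    # outputs: [batch_size, num_steps, hidden_size]
    batch_size = tf.shape(outputs)[0]
    hidden_size = int(outputs.get_shape()[-1])
    # one Bernoulli(keep_prob) mask per example, shared across all timesteps
    o_mask = tf.floor(keep_prob + tf.random_uniform([batch_size, 1, hidden_size]))
    # the step the bug skips: actually apply the mask (broadcast over time)
    return outputs * o_mask / keep_prob
```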
Hi Hanxiao @quark0,
That's definitely a bug. Thank you for spotting it. We have pushed a commit that fixes it. We have tried rerunning the code, and the output looks similar to the output that gave us 58.8 test perplexity. We will let the experiment finish and confirm with you whether the result is indeed the same.
Update on results: @quark0 We finished rerunning the script with the fix and indeed got a test perplexity of 56.6.
Thanks, @hyhieu! I also got a similar test-set ppl using the latest code. Interestingly, the corresponding validation ppl is around 67. Is this also what you have observed in your experiments?
Yes, that's what we got too. We think the reason is that we computed the validation perplexity using a batch_size of 35. If we use batch_size = 1 for validation, we get around the same number.
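For anyone comparing numbers, a rough sketch of the aggregation as we understand it (not the exact code in the repo): with batch_size = 35 the validation text is split into 35 parallel streams, each starting from a fresh hidden state, which can push the average per-token loss up relative to evaluating a single contiguous stream with batch_size = 1.

```python
import numpy as np

def perplexity(total_loss, num_tokens):
    # ppl = exp(total cross-entropy in nats / number of predicted tokens)
    return np.exp(total_loss / num_tokens)

# e.g. an average per-token loss of about 4.036 nats corresponds to ppl ~ 56.6,
# roughly the test number reported above
print(perplexity(4.036, 1))
```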
@hyhieu Cool, thanks for the clarifications!
@quark0 did you manage to get this performance after this commit? https://github.com/melodyguan/enas/commit/2734eb2657847f090e1bc5c51c2b9cbf0be51887 They actually fixed the evaluation, and I can't seem to get below 63 ppl, which would make more sense given the score on the validation set. Using the previous evaluation, i.e. total_loss += np.minimum(curr_loss, 10.0 * bptt_steps * batch_size), it achieves 55.6, but I've never seen such an evaluation before.
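In case it helps to compare, this is how I read the two evaluation variants (a sketch only; curr_loss, bptt_steps and batch_size are the names from the line above, with curr_loss being the summed cross-entropy of one evaluation chunk):

```python
import numpy as np

def chunk_loss(curr_loss, bptt_steps, batch_size, clip=True):
    if clip:
        # previous evaluation, as quoted above: cap each chunk's summed loss at
        # 10 nats per token before adding it to total_loss; clipping can only
        # lower the total, hence the lower reported perplexity (55.6 vs ~63)
        return np.minimum(curr_loss, 10.0 * bptt_steps * batch_size)
    # my reading of the fixed evaluation: accumulate the raw loss
    return curr_loss
```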
Thanks, Best
@hyhieu I ran ptb_final.sh without any changes, but I only got this:
test_total_loss: 341452.08
test_log_ppl: 4.14
test_ppl: 62.95
I can see the fixed arc below, which is the same as the example shown in the paper:
child_fixed_arc....................0 0 0 1 1 2 1 2 0 2 0 5 1 1 0 6 1 8 1 8 1 8 1
Hi @melodyguan, thanks for the great paper!
But I still can't reproduce the results from your paper for finding the RNN cell on the PTB dataset.
After approx. 24 hours of training (~22 epochs), the best validation ppl is still 400 (it's also very unstable, ranging from 1000 to 400 between epochs) and the training ppl is around 250, which isn't even close to the 55.8 reported in your paper. The code and data were taken as is and not modified.
Prior to that, I also had similar problems reproducing these results with the https://github.com/carpedm20/ENAS-pytorch code.
Are there some problems with the hyperparameter selection, or maybe some bugs in the code?