melodyguan / enas

TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"
https://arxiv.org/abs/1802.03268
Apache License 2.0
1.58k stars 390 forks

Reproducibility of the results from the paper (RNN) #6

Open nikitos9000 opened 6 years ago

nikitos9000 commented 6 years ago

Hi @melodyguan, thanks for the great paper!

However, I still can't reproduce the results from your paper for the RNN cell search on the PTB dataset.

After roughly 24 hours of training (~22 epochs), the best validation ppl is still 400 (and it is very unstable, swinging between 400 and 1000 across epochs), while the training ppl is around 250, nowhere near the 55.8 reported in your paper. The code and data were used as is, without modification.

Prior to that, I also had similar problems with reproducing these results with https://github.com/carpedm20/ENAS-pytorch code.

Could there be a problem with the hyperparameter selection, or perhaps a bug in the code?

hyhieu commented 6 years ago

Hi @nsmetanin,

Thank you for your interest in our work.

Given the description of your experiments, we suspect that you ran the script ptb_search.sh rather than ptb_final.sh. The script ptb_search.sh only searches for the architecture, and its results usually look like what you describe (validation perplexity fluctuating widely between epochs).

As we describe in Section 2 of the paper (the last paragraph), one has to run ptb_search.sh until it finishes, then take the architecture with the highest reward and retrain it from scratch. For instance, when ptb_search.sh finishes, you should see lines such as the following:

[0 0 0 1 1 2 1 1 1 2 1 2 1 2 1 6 0 4 1 9 1 9 1] rw=0.551

From these, you should pick the one with the highest rw and feed it to ptb_final.sh. You can also see an example in ptb_final.sh, which will reproduce the 55.8 perplexity we report in the paper.
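For illustration, here is a minimal Python sketch (not part of the repo; the log filename and regex are assumptions) of picking the highest-rw architecture out of the search output:

```python
import re

# Hypothetical helper: scan the stdout of ptb_search.sh (saved to a file)
# and keep the architecture string with the highest reward (rw).
best_arc, best_rw = None, float("-inf")
with open("search_log.txt") as f:  # the log path is an assumption
    for line in f:
        m = re.search(r"\[([\d ]+)\] rw=([\d.]+)", line)
        if m and float(m.group(2)) > best_rw:
            best_arc, best_rw = m.group(1), float(m.group(2))

print(best_arc, best_rw)  # pass best_arc to ptb_final.sh as the fixed architecture
```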

nikitos9000 commented 6 years ago

@hyhieu Great, thanks, you're right! I only ran ptb_search.sh.

I've started the ptb_final.sh training with the highest-reward architecture; I hope this works!

hyhieu commented 6 years ago

@nsmetanin Thanks! Please let us know how it goes 😃

quark0 commented 6 years ago

Thanks a lot for the code!

I'm running ptb_final.sh without any modification (i.e. using the provided RNN cell and hyperparameters). At epoch 1150, the training ppl is 11.83 and the validation ppl is 322.69. I could wait for longer but it looks like the model is overfitting quite a bit. My TF version is 1.6.0.

I then checked the code, and it seems that o_mask for the variational dropout at the output layer is never actually used (link). Could this be the cause?
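For context, variational dropout samples one mask per sequence and reuses it at every time step, whereas standard dropout resamples the mask at each step. A minimal numpy sketch of the difference (illustrative only, not the repo's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
time_steps, batch, hidden = 35, 4, 8
keep_prob = 0.75

h = rng.standard_normal((time_steps, batch, hidden))  # stand-in for RNN outputs

# Standard dropout: a fresh mask at every time step.
step_mask = (rng.random((time_steps, batch, hidden)) < keep_prob) / keep_prob
out_standard = h * step_mask

# Variational dropout: one mask per sequence, reused across all time steps
# (which is what an output mask like o_mask is meant to do).
seq_mask = (rng.random((1, batch, hidden)) < keep_prob) / keep_prob
out_variational = h * seq_mask  # the same mask broadcasts over time_steps
```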

hyhieu commented 6 years ago

Hi Hanxiao @quark0,

That's definitely a bug, thank you for spotting it. We have pushed a commit that fixes it. We have rerun the code, and the output looks similar to the run that gave us the 58.8 test perplexity. We will let the experiment finish and confirm whether the result is indeed the same.

hyhieu commented 6 years ago

Update on results: @quark0 We finished rerunning the script with the fix and indeed got a test perplexity of 56.6.

quark0 commented 6 years ago

Thanks, @hyhieu! I also got a similar test-set ppl using the latest code. Interestingly, the corresponding validation ppl is around 67. Is this also what you observed in your experiments?

hyhieu commented 6 years ago

Yes, that's what we got too. We think the reason is that the validation perplexity was computed with a batch_size of 35. If we use batch_size = 1 for validation, we get around the same number.
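If it helps, one common reason batched evaluation differs from a batch_size = 1 pass (an assumption about this repo, but standard for batched LM evaluation) is that the tokens are reshaped into batch_size contiguous streams, which drops the trailing tokens and resets the hidden state at each stream boundary. A small numpy sketch of that layout:

```python
import numpy as np

# Illustrative only: the usual layout of LM evaluation data for batching.
data = np.arange(100003)  # stand-in for the validation token ids, not the real size
batch_size = 35

n = (len(data) // batch_size) * batch_size
streams = data[:n].reshape(batch_size, -1)  # 35 contiguous streams

# len(data) - n trailing tokens are dropped, and each stream starts from a
# fresh hidden state, so per-token probabilities (and hence perplexity) can
# differ from a single batch_size = 1 pass over the whole text.
print(len(data) - n, streams.shape)
```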

quark0 commented 6 years ago

@hyhieu Cool, thanks for the clarifications!

Wronskia commented 6 years ago

@quark0 Did you manage to get this performance after this commit: https://github.com/melodyguan/enas/commit/2734eb2657847f090e1bc5c51c2b9cbf0be51887? They actually fixed the evaluation in that commit, and I can't seem to get below 63 ppl, which would make more sense given the validation-set score. With the previous evaluation, i.e. total_loss += np.minimum(curr_loss, 10.0 * bptt_steps * batch_size), it achieves 55.6, but I've never seen that kind of evaluation before.
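To see why the old clipping lowers the reported number, here is a purely illustrative numpy sketch (the loss values are made up) comparing the two evaluations:

```python
import numpy as np

# Made-up per-batch sums of token losses (in nats); one batch blows up.
bptt_steps, batch_size = 35, 1
tokens_per_batch = bptt_steps * batch_size
batch_losses = np.array([147.0, 147.0, 147.0, 700.0])  # ~4.2 nats/token plus an outlier
total_tokens = len(batch_losses) * tokens_per_batch

# Unclipped evaluation (what the fixed code effectively does).
ppl_unclipped = np.exp(batch_losses.sum() / total_tokens)

# Previous evaluation: cap each batch's loss at 10 nats per token before summing.
clipped_losses = np.minimum(batch_losses, 10.0 * bptt_steps * batch_size)
ppl_clipped = np.exp(clipped_losses.sum() / total_tokens)

print(ppl_unclipped, ppl_clipped)  # clipping can only lower the reported perplexity
```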

Thanks, Best

yogurfrul commented 6 years ago

@hyhieu I ran ptb_final.sh without any changes, but I only got this:

test_total_loss: 341452.08
test_log_ppl: 4.14
test_ppl: 62.95

I can see the fixed_arc below, which is the same as the example shown in the paper:

child_fixed_arc....................0 0 0 1 1 2 1 2 0 2 0 5 1 1 0 6 1 8 1 8 1 8 1