mkusner / grammarVAE

Code for the "Grammar Variational Autoencoder" https://arxiv.org/abs/1703.01925

Low accuracy #8

Closed alexander-turner closed 6 years ago

alexander-turner commented 6 years ago

I'm getting very low training performance on train_zinc.py, even using the pretrained model and after having regenerated the data multiple times. The accuracy starts at about 0.15% and trends down to 0.05%, while the loss goes from 2.1 to 1.7.

Strangely, BO metrics seem almost unaffected.

mkusner commented 6 years ago

Hmm, have you seen issue #6? Maybe some of the problems discussed there match what you're encountering?

One thing to note: if you try to retrain a pretrained model, the large default learning rate will likely destroy the current parameters, so you'd want a much lower initial learning rate.

alexander-turner commented 6 years ago

I tried lowering Adam's learning rate to 1/100th of the default; the pretrained model starts at 2% accuracy and trends downward quickly, reaching 0.73% accuracy after 17 epochs. I could check the reconstruction accuracy (is there a script for this I'm not seeing?), but the loss is also pretty terrible right now, so I doubt it's secretly performing well.

mkusner commented 6 years ago

I've also seen the accuracy reported during training be really low, but when I evaluate the reconstruction accuracy I get the result in the paper. Perhaps @MustafaMustafa could share his code from #9?
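For anyone looking for a starting point: the reconstruction accuracy described in the paper appendix encodes each string, decodes the latent point several times, and scores the fraction of decodings that exactly reproduce the input. Here is a minimal sketch of that metric; the `encode` and `decode` callables are stand-ins for the grammar VAE's actual encode/decode methods, not code from this repo.

```python
def reconstruction_accuracy(smiles_list, encode, decode, n_decodings=10):
    """Average fraction of decodings that exactly match the input string.

    encode: maps a SMILES string to a latent representation.
    decode: maps a latent representation back to a SMILES string
            (may be stochastic, hence n_decodings samples per input).
    """
    total = 0.0
    for s in smiles_list:
        z = encode(s)
        matches = sum(decode(z) == s for _ in range(n_decodings))
        total += matches / n_decodings
    return total / len(smiles_list)


if __name__ == "__main__":
    # Toy stand-in: a perfect "autoencoder" that passes the string through.
    identity_encode = lambda s: s
    identity_decode = lambda z: z
    print(reconstruction_accuracy(["c1ccccc1", "CCO"],
                                  identity_encode, identity_decode))  # 1.0
```

With the real model you would plug in its encoder and (stochastic) decoder and run this over a held-out set of SMILES strings.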

MustafaMustafa commented 6 years ago

Hi @mkusner and @alexander-turner, I can reproduce the paper results with any of the following settings:

1) adam_lr = 5e-4, batch size = 1000, for 100 epochs (two GPUs)
2) adam_lr = 2e-4 decayed to 1e-4 after roughly 34 epochs, batch size = 50, for 100 epochs
3) latent vector size changed to 64, GRU units to 512, and encoder dense layers to 512 units; adam_lr = 5e-4, batch size = 500, for 100 epochs

The best results were from the last two settings, though the second is very slow due to the small batch size. All three give accuracy above 50% (accuracy as defined in the paper appendix, not Keras's bit-by-bit accuracy).
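The step decay in setting 2 can be sketched as a schedule function; the epoch cutoff of 34 comes from the comment above ("roughly 34"), so treat it as approximate. With Keras this function would be passed to `keras.callbacks.LearningRateScheduler`.

```python
def step_decay(epoch, decay_epoch=34, initial_lr=2e-4, decayed_lr=1e-4):
    """Return the learning rate for a given (0-indexed) epoch.

    Holds the initial rate until decay_epoch, then switches to the
    decayed rate, matching setting (2) above.
    """
    return initial_lr if epoch < decay_epoch else decayed_lr

# Typical Keras wiring (sketch, not verified against this repo's train script):
#   model.fit(..., callbacks=[keras.callbacks.LearningRateScheduler(step_decay)])
```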

alexander-turner commented 6 years ago

@MustafaMustafa Thanks! Do you happen to have the code available for reconstruction accuracy?

mkusner commented 6 years ago

I'm assuming this has been figured out. If not, I'll reopen it!