mkusner / grammarVAE

Code for the "Grammar Variational Autoencoder" https://arxiv.org/abs/1703.01925

Cannot replicate results #6

Closed jaredwillard12 closed 7 years ago

jaredwillard12 commented 7 years ago

Hi, I recently cloned the repository, and before editing the code I attempted to replicate the results. The training accuracy builds to ~5%, then drops to ~1.9% and stays there until I kill the training. I cannot figure out why I don't get the same accuracy as in the paper. Any assistance would be helpful. (On the zinc dataset.)

mkusner commented 7 years ago

Hey Jared,

Is this the zinc model or the equations model? Also, is the accuracy you quote the one given by Keras? The accuracy given by Keras isn't very informative because it measures per-character accuracy rather than full accuracy (i.e., whether the model reconstructs the full string or not). The way we compute full reconstruction accuracy for zinc is to start with the trained model and a hold-out set of 5000 molecules. For each molecule we encode it 10 times (as encoding is stochastic), then for each encoding we decode it 100 times (as decoding is also stochastic). We then compute how often the 5,000,000 decodings match the original molecules. Section D in the supplementary material of the paper describes this procedure in more detail.
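For anyone following along, the procedure above can be sketched roughly like this. Note that `encode` and `decode` here are stand-ins for the model's stochastic encode/decode calls; the names and signatures are assumptions for illustration, not the repository's actual API.

```python
def reconstruction_accuracy(molecules, encode, decode,
                            n_encode=10, n_decode=100):
    """Fraction of decodings that exactly match the original string.

    For each molecule: encode n_encode times (encoding is stochastic),
    and decode each latent point n_decode times (decoding is stochastic).
    With the paper's settings (5000 molecules, 10 encodings, 100 decodings)
    this counts matches over 5,000,000 decodings.
    """
    matches = 0
    total = 0
    for smiles in molecules:
        for _ in range(n_encode):
            z = encode(smiles)          # stochastic encoding
            for _ in range(n_decode):
                matches += (decode(z) == smiles)  # exact string match
                total += 1
    return matches / total
```

This is exact-match accuracy over full strings, which is why it can be far lower than the per-character number Keras prints during training.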

In any case, can you also let me know the versions of Keras and TensorFlow that you're using?

Thanks, Matt


jaredwillard12 commented 7 years ago

Hey Matt,

This is the zinc model I'm using. The accuracy I'm quoting is the categorical accuracy that Keras gives, but when I compute reconstruction accuracy (as you describe) I get 0%. Even if we only look at the categorical accuracy returned by Keras, wouldn't we expect that to be much higher?

Keras: 2.0.6, TensorFlow: 1.3.0

mkusner commented 7 years ago

Hmmm... How many epochs are you training for? Could you post the output of the train script?


jaredwillard12 commented 7 years ago

I've trained for full runs of 100 and 1000 epochs (the latter for overkill), along with several shorter ones. I have the output saved, but it's several hundred lines long, doesn't look pretty, and doesn't tell you much.

In general: for the 100-epoch run, the training and validation loss get down to ~0.18, and the accuracy at that point is about 4.73% (Keras categorical accuracy). The loss drops steadily throughout training. If I didn't have the accuracy printing out, I would think the network was training wonderfully (based on loss).

The reconstruction accuracy (computed as you described) that I have saved is for the 100-epoch run, and it's 11%, the best I've gotten. Typically the reconstruction accuracy is 0%. Unfortunately I no longer have the weights saved for these runs.

mkusner commented 7 years ago

Okay, let me get back to you soon about this. I originally wrote the code for Keras 1 and TensorFlow 0.12, but that shouldn't account for the huge difference in results you're seeing. I'm now rerunning the code updated to run on Keras 2 and TensorFlow 1.


jaredwillard12 commented 7 years ago

Perfect, thank you!

jaredwillard12 commented 7 years ago

So, it appears that my data file was corrupt. Strange issue, but I remade the files, and on a test of 100 samples (instead of 5000) from the test set, my reconstruction rate is now 67%. I can't say I know how that happened, but I'm glad it wasn't anything to do with the original code.

Kfir-Schreiber commented 7 years ago

@jaredwillard12 thanks for the update. Just to make sure, do you still get low accuracy during training, but the post training reconstruction rate is 67%?

@mkusner can you please share the loss and accuracy values Keras returns for the trained model? I would like to compare it to the ones I'm getting using a modified model.

jaredwillard12 commented 7 years ago

Yes, during training the accuracy is extremely low; I think it was hovering around 10% at the end, with a loss of around 0.06. The training metric is supposedly categorical accuracy, but when I computed categorical accuracy myself while calculating the reconstruction accuracy, I got 92.61% (over those 100 samples). So I'm not really sure what Keras is returning anymore...
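The gap between the two numbers in this thread makes more sense with a toy example. Per-character (categorical) accuracy and full-string reconstruction accuracy diverge because one wrong character fails the whole string while barely denting the character score. This is a hypothetical illustration with made-up strings, not the repository's evaluation code:

```python
# One wrong character in one of three strings.
originals     = ["CCO", "CCN", "CCC"]
reconstructed = ["CCO", "CCF", "CCC"]

# Per-character accuracy: compare position by position.
chars_right = sum(a == b
                  for o, r in zip(originals, reconstructed)
                  for a, b in zip(o, r))
total_chars = sum(len(o) for o in originals)
char_acc = chars_right / total_chars   # 8/9 ~ 0.89

# Full-string (reconstruction) accuracy: exact match only.
string_acc = sum(o == r for o, r in zip(originals, reconstructed)) / len(originals)  # 2/3
```

So a model can score ~92% per character while exactly reconstructing far fewer whole strings, which is consistent with the numbers reported above.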

mkusner commented 7 years ago

Thanks @jaredwillard12 for double-checking things and for getting those numbers for @Kfir-Schreiber!!

Yuanpengli commented 7 years ago

@jaredwillard12 I got a loss of 0.18 and a categorical accuracy of 0.0530. I just want to know how you got the loss down to 0.06 and the accuracy to 0.10. More training? Thank you.

jaredwillard12 commented 7 years ago

@Yuanpengli I trained for the full 100 epochs, and that's what I got. With the model you've trained, I'd test the reconstruction rate and see what that accuracy is. Also, how does your training loss compare to your validation loss? Did you notice your validation loss failing to improve for a good number of epochs near the end?