maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License
521 stars 146 forks source link

Add random seed of training for reproducibility #51

Open hsiaoyi0504 opened 7 years ago

hsiaoyi0504 commented 7 years ago

As title

hsiaoyi0504 commented 7 years ago

For issue #50

pechersky commented 7 years ago

Could you also change the generator-based approach? Specifically, there is a random.shuffle call here that can be seeded: https://github.com/maxhodak/keras-molecules/blob/master/molecules/vectorizer.py. Additionally, perhaps seed should be a flag that can be passed into train.py and train_gen.py.

hsiaoyi0504 commented 7 years ago

Sure, I will work on it.

hsiaoyi0504 commented 7 years ago

I don't think it's a good thing to do this (I mean add flag) , according to the comment here, by default Keras's model.compile() sets the shuffle argument as True. You should the set numpy seed before importing keras. Then, adding a flag would make code messy.

hsiaoyi0504 commented 7 years ago

Oh, I got it. However, I think it's two different things. Random seed for numpy and random seed for random.

hsiaoyi0504 commented 7 years ago

Already done (only one line code lol)

pechersky commented 7 years ago

I meant something like that SmilesDataGenerator.init could take a seed kwarg, which would be passed in at train_gen.py, using some sort of flag. In that case, it's ok to pass it in as acquired from a flag. Regarding train.py, you could move the keras import statements into the main function, with argparsing of whether there is a seed flag or not.

Let's also pull the requirements commit out of this PR, and I'll accept it in the other PR.

hsiaoyi0504 commented 7 years ago

Is it ok?

pechersky commented 7 years ago

I've committed a couple changes to train and train_gen to take the seed as a cli parameter. Could you test that they work as you expect?

hsiaoyi0504 commented 7 years ago

I test using tensorflow backend, and I found it doesn't work. After a quick search of this, it seems still open issue. Besides, I failed to execute using theano backend. I am still checking what happened.

pechersky commented 7 years ago

Do you at least get consistent shuffling? Is it just the weight initialization that is not seeded?

On Fri, Dec 23, 2016 at 9:21 AM, hsiao yi notifications@github.com wrote:

I test using tensorflow backend, and I found it doesn't work. After a quick search of this, it seems still open issue https://github.com/fchollet/keras/issues/2280. Besides, I failed to execute using theano backend. I am still checking what happened.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/maxhodak/keras-molecules/pull/51#issuecomment-268996757, or mute the thread https://github.com/notifications/unsubscribe-auth/AFGDhjBjitQKZOHcEO4NUhnAhsVSS7-pks5rK9jSgaJpZM4LSxtK .

hsiaoyi0504 commented 7 years ago

It looks like the train.py works fine using theano backend, so I think we can wait the keras update in the future. I will now start checking the train_gen.py part.

hsiaoyi0504 commented 7 years ago

I am still unable to execute the train_gen.py. What is the input of this file?

hsiaoyi0504 commented 7 years ago

Oh, I find that I used wrong input file. The train_gen.py doesn't need preprocessing.py.

pylang commented 7 years ago

#2743 is still an issue for me, despite all the suggestions. Restarting the notebook gives different results as well.

keras 2.0.2 numpy 1.11.2 tensorflow 1.0.0

hsiaoyi0504 commented 7 years ago

It turns out that it is due to fchollet/keras#2280. I gave up trying these things long time ago. It seems that it can't be fixed in an easy manner.

pylang commented 7 years ago

Thanks for the information. Non-reproducible results is a serious issue in keras imo.