jerryhluo opened this issue 6 years ago
Found part of the reason: Python 2 builds the "charset" variable in a consistent order (A->Z), while in Python 3 the iteration order of a set varies between runs due to hash randomization. See https://stackoverflow.com/questions/9792664/set-changes-element-order
In addition, the charset for the 500k SMILES varies between runs (due to the sampling function in preprocess.py), so it's important for users to keep using the same set of generated files.
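A minimal sketch of one way to make the charset reproducible (the function name `build_charset` is hypothetical, not from preprocess.py): collect the characters into a set as before, but sort the result instead of relying on set iteration order, which Python 3 randomizes via `PYTHONHASHSEED`.

```python
def build_charset(smiles_list):
    """Collect every character used in the SMILES strings,
    returned in a reproducible (sorted) order."""
    chars = set()
    for s in smiles_list:
        chars.update(s)
    # sorted() gives the same order on every run;
    # list(chars) would not under Python 3 hash randomization
    return sorted(chars)

charset = build_charset(["CCO", "c1ccccc1", "NC(=O)C"])
```

Persisting this sorted charset alongside the processed data (as preprocess.py stores it in the HDF5 file) would let later runs reuse the exact same ordering.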
May you please provide the charset used to generate the pre-trained model? Since the model dimension also depends on the charset... @maxhodak
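To illustrate why the charset matters so much: the one-hot encoding maps each character to a column index taken from the charset, so a charset in a different order (or of a different size) makes the same pre-trained weights decode to gibberish. This is a hedged sketch of the general idea, not the repository's actual encoding function:

```python
import numpy as np

def one_hot(smiles, charset, max_len):
    """Encode a SMILES string as a (max_len, len(charset)) one-hot
    matrix. The column assigned to each character depends entirely
    on the charset ordering."""
    idx = {c: i for i, c in enumerate(charset)}
    x = np.zeros((max_len, len(charset)), dtype=np.float32)
    for i, c in enumerate(smiles[:max_len]):
        x[i, idx[c]] = 1.0
    return x

x = one_hot("CO", ['C', 'O'], 3)
```

If the pre-trained model was trained with a different charset ordering, every column in this matrix corresponds to a different character than the model expects, which would explain garbled reconstructions even though the model itself is fine.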
Update (4/23/2018): solution found at https://github.com/chembl/autoencoder_ipython
Dear author,
I downloaded your code and the pre-trained model (model_500k.h5) and tried the following commands:

```
python preprocess.py data/smiles_500k.h5 data/processed_500k.h5
python sample.py data/processed_500k.h5 data/model_500k.h5 --target autoencoder
```
Then it outputs:
NC(=O)c1nc(cnc1N)c2ccc(Cl)c(c2)S(=O)(=O)Nc3cccc(Cl)c3
(-> encoder -> decoder ->)
7-7ASC-F@@7N7AAAAAAAAAAAAAlllllNAACAAC7lll7AlllAAACC%CLA-VVVVVVVVFF--lAAAAAAAAAAAAAAVVAAAAACCAACCAAACAAACCA77A-VVV--
I am not sure what happened with the pre-trained model; it seems it does not do a good job at all... Do you see a similar problem, or did I do something wrong?