maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License

KeyError: "Unable to open object (Object 'latent_vectors' doesn't exist)" #57

Closed: spadavec closed this issue 6 years ago

spadavec commented 7 years ago

I'm trying to mess around with some custom datasets and keep getting the following error when calling read_latent_data:

  File "ava.py", line 165, in <module>
    main()
  File "ava.py", line 127, in main
    data, charset = read_encoded_vecs(data_path)
  File "ava.py", line 63, in read_encoded_vecs
    data, charset = read_latent_data(data_path)
  File "/home/lerche/word_embedding_molecules/ava/keras_molecules/sample.py", line 34, in read_latent_data
    data = h5f['latent_vectors'][:]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "/home/lerche/miniconda2/lib/python2.7/site-packages/h5py/_hl/group.py", line 166, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-4rPeHA-build/h5py/h5o.c:3570)
KeyError: "Unable to open object (Object 'latent_vectors' doesn't exist)"
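The traceback ends at `data = h5f['latent_vectors'][:]` inside `read_latent_data`, so the file passed in must contain a dataset named `latent_vectors`, and h5py raises `KeyError` when it doesn't. Below is a minimal sketch of a pre-check that reports what the file actually contains. `missing_datasets`, `check_latent_file` and the `charset` entry in `LATENT_KEYS` are assumptions, not keras-molecules code. The `charset` entry is included because the function returns a charset, but that read isn't visible in the traceback.

```python
import sys

# Assumed layout: the traceback shows read_latent_data reading 'latent_vectors';
# 'charset' is assumed because the function also returns a charset.
LATENT_KEYS = ("latent_vectors", "charset")


def missing_datasets(h5_like, required=LATENT_KEYS):
    """Return the required dataset names that are absent from h5_like.

    h5_like can be an open h5py.File/Group or a plain dict; only `in` is used.
    """
    return [name for name in required if name not in h5_like]


def check_latent_file(h5_like, path="<file>"):
    """Raise a KeyError naming the missing datasets and listing what is present."""
    missing = missing_datasets(h5_like)
    if missing:
        raise KeyError(
            "%s lacks %s; it contains %s" % (path, missing, sorted(h5_like.keys()))
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    import h5py  # only needed when checking a real file from the command line

    with h5py.File(sys.argv[1], "r") as f:
        check_latent_file(f, sys.argv[1])
        print("%s has the datasets read_latent_data expects" % sys.argv[1])
```

To run the check, save the sketch under any name (for example check_latent.py, a hypothetical name) and run `python check_latent.py processed.h5`. This checks a file before it is handed to `read_latent_data`.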

My process for creating the input is:

1) Create a .txt file with 2 columns, one for SMILES and one for MoleculeID (~800k rows in total).
2) Convert the .txt file to h5 using the following:

import import_smiles
import sys
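# read the SMILES from column=1 of the text file given on the command line
i_file = import_smiles.read_smiles(sys.argv[1], column=1)
# write what was read above to output.h5
import_smiles.create_h5(i_file, 'output.h5')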

3) Preprocess the above h5 file using python preprocess.py output.h5 preprocessed.h5
4) Run python train.py processed.h5 model.h5 --epochs 20
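The code for `import_smiles` isn't shown in this report, so it's unclear whether `column=1` in step 2 counts from 0 or from 1. With one SMILES column and one MoleculeID column, that choice decides whether SMILES or IDs end up in output.h5. Below is a hypothetical stand-in (`read_column` is not part of `import_smiles` or keras-molecules) that makes the index explicit. It assumes whitespace-separated columns with the SMILES first.

```python
import sys


def read_column(lines, column=0):
    """Return one whitespace-separated field per non-empty line.

    Hypothetical stand-in for import_smiles.read_smiles: `column` is 0-based,
    so column=0 is the SMILES field in a "SMILES MoleculeID" file.
    """
    values = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values.append(fields[column])
    return values


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        smiles = read_column(fh, column=0)
    print("read %d entries; first: %s" % (len(smiles), smiles[0] if smiles else None))
```

Printing the first entry is a cheap way to confirm that SMILES, not IDs, were picked up before running preprocess.py.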

Then, if I try to run the following code, the error above is displayed:

from keras_molecules.sample import read_latent_data

data, charset = read_latent_data(data_path)

Interestingly, I can still run python sample.py on the data and model files; any idea where I'm going wrong?
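Per the traceback, `read_latent_data` expects a file of encoded (latent) vectors. By contrast, the keys reported for processed.h5 describe the preprocessed dataset itself. Below is a minimal sketch of writing a file in the layout `read_latent_data` appears to read, given vectors already produced by the trained encoder. `write_latent_data` is hypothetical and the `charset` dataset name is assumed. `group` can be any object with an h5py-style `create_dataset(name, data=...)` method, such as an `h5py.File` opened with mode `'w'`.

```python
def write_latent_data(group, latent_vectors, charset):
    """Store encoder output under the dataset names read_latent_data reads.

    Hypothetical helper: 'latent_vectors' comes from the traceback, while
    'charset' is assumed because read_latent_data also returns a charset.
    """
    group.create_dataset("latent_vectors", data=latent_vectors)
    # h5py cannot store NumPy unicode ('U') arrays, so under Python 3 the
    # charset may need to be encoded to bytes before it is written.
    group.create_dataset("charset", data=charset)


# Usage sketch (needs h5py; x_latent would come from the trained encoder):
#   with h5py.File("latent.h5", "w") as f:
#       write_latent_data(f, x_latent, charset)
```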

EDIT: I should note that the keys in the processed.h5 file appear to be:

[u'charset', u'data_test', u'data_train']
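These keys match a preprocessed dataset, not the `latent_vectors` layout the traceback shows `read_latent_data` reading. Below is a small sketch that labels an HDF5 file by its top-level keys. Both layouts come from this report: the key list in this EDIT and the traceback, with `charset` assumed for the latent file. `describe_layout` is a hypothetical helper.

```python
import sys

# Layouts observed in this report; 'charset' in the latent layout is an assumption.
LAYOUTS = {
    "preprocessed dataset": {"charset", "data_train", "data_test"},
    "latent vectors": {"latent_vectors", "charset"},
}


def describe_layout(keys):
    """Return the names of every known layout whose datasets are all present."""
    present = set(keys)
    return sorted(name for name, needed in LAYOUTS.items() if needed <= present)


if __name__ == "__main__" and len(sys.argv) > 1:
    import h5py  # only needed when inspecting a real file

    with h5py.File(sys.argv[1], "r") as f:
        print(describe_layout(f.keys()) or "unrecognised layout")
```

For the key list above, `describe_layout` returns `['preprocessed dataset']`, which is consistent with the KeyError.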

EDIT2: I'm getting this same behavior with the 50k data/model files that are 'included' as examples.