hachmannlab / chemml

ChemML is a machine learning and informatics program suite for the chemical and materials sciences.
https://hachmannlab.github.io/chemml
BSD 3-Clause "New" or "Revised" License
162 stars 31 forks

ChemML can't read XYZ format as a string #8

Closed. HenriqueCSJ closed this issue 4 years ago

HenriqueCSJ commented 4 years ago

My molecules are stored in XYZ format in a pandas DataFrame, and I'm trying to iterate over them and pass each one to ChemML. It looks like I cannot pass an XYZ string directly, although I was under the impression that this was possible.

Here is what I'm trying: My molecule (just a test):

Ethane = """8
    Energy:      -4.7343653
C          0.10289       -0.52365       -0.00000
C         -1.40917       -0.52366       -0.00000
H          0.48726        0.04224       -0.85384
H          0.48726       -1.54605       -0.06316
H          0.48726       -0.06716        0.91700
H         -1.79354       -0.98015       -0.91700
H         -1.79354        0.49874        0.06316
H         -1.79354       -1.08955        0.85384
"""

And here is the raised error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-71-ebbc3a0998bf> in <module>()
----> 1 mol = Molecule(Ethane, input_type="xyz")

/home/henrique/.local/lib/python3.6/site-packages/chemml/chem/molecule.py in __init__(self, input, input_type, **kwargs)
    275         self._init_attributes()
    276         self._extra_docs()
--> 277         self._load(input, input_type, **kwargs)
    278 
    279     def __repr__(self):

/home/henrique/.local/lib/python3.6/site-packages/chemml/chem/molecule.py in _load(self, input, input_type, **kwargs)
    422         """
    423         if input_type == 'xyz':
--> 424             self._load_pybel(input, input_type, **kwargs)
    425         elif input_type in ['smiles', 'smarts', 'inchi']:
    426             self._load_rdkit(input, input_type, **kwargs)

/home/henrique/.local/lib/python3.6/site-packages/chemml/chem/molecule.py in _load_pybel(self, input, input_type, **kwargs)
    495             else:
    496                 msg = "The input '%s' is not a valid XYZ input file."%input
--> 497                 raise ValueError(msg)
    498 
    499         if pybel_mol is None:

ValueError: The input '8
    Energy:      -4.7343653
C          0.10289       -0.52365       -0.00000
C         -1.40917       -0.52366       -0.00000
H          0.48726        0.04224       -0.85384
H          0.48726       -1.54605       -0.06316
H          0.48726       -0.06716        0.91700
H         -1.79354       -0.98015       -0.91700
H         -1.79354        0.49874        0.06316
H         -1.79354       -1.08955        0.85384
' is not a valid XYZ input file.

If I instead provide a real XYZ file (with the path and everything) rather than a string, it works.

aditya1707 commented 4 years ago

As mentioned in the documentation here, the function is written to accept the path to an XYZ file, not the XYZ contents as a string.

At the moment we are not planning to change this in the current version, so I guess you will have to use the file workaround.
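
For readers landing on this thread, the file workaround can be sketched roughly as follows: write each XYZ string to a temporary `.xyz` file and hand that path to `Molecule`. The helper name `xyz_tempfile` is hypothetical and not part of ChemML. The `Molecule(path, input_type="xyz")` call mirrors the one in the traceback above and is left commented out, on the assumption that the file is read while the object is constructed (as the traceback suggests).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def xyz_tempfile(xyz_text):
    """Hypothetical helper: write an XYZ string to a temporary .xyz file,
    yield its path, and delete the file when the block exits."""
    fd, path = tempfile.mkstemp(suffix=".xyz")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(xyz_text)
        yield path
    finally:
        os.remove(path)


ethane = """8
    Energy:      -4.7343653
C          0.10289       -0.52365       -0.00000
C         -1.40917       -0.52366       -0.00000
H          0.48726        0.04224       -0.85384
H          0.48726       -1.54605       -0.06316
H          0.48726       -0.06716        0.91700
H         -1.79354       -0.98015       -0.91700
H         -1.79354        0.49874        0.06316
H         -1.79354       -1.08955        0.85384
"""

with xyz_tempfile(ethane) as path:
    # With ChemML installed, the path (not the string) would be passed here:
    # mol = Molecule(path, input_type="xyz")
    with open(path) as handle:
        n_atoms = int(handle.readline())  # first XYZ line is the atom count
```

For a DataFrame column of XYZ strings, the same `with` block could be run once per row inside a loop, keeping whatever is built from each molecule before the temporary file is removed.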