openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
302 stars 88 forks source link

Should we add a Molecule.from_xyz() to allow easy perception of chemistry from dataset that lack chemical information? #1145

Open jchodera opened 2 years ago

jchodera commented 2 years ago

Is your feature request related to a problem? Please describe. I repeatedly hear from users that have erased the chemical information in their molecule (transforming into something like an XYZ or PDB file that only contains elements and 3D coordinates) that they would like a convenient way to create a Molecule to use OpenFF toolkit features, even if the heuristics for reconstructing chemical information are error-prone. It seems like we should support some path for doing this conveniently, but perhaps require an optional argument in the constructor or a special API point that requires the user to implicitly acknowledge this is a less desirable path to constructing a molecule.

Describe the solution you'd like We could support either a constructor or static method that enables file formats that lack chemical information to create a Molecule object, provided a special API point or optional argument is provided that acknowledges this approach is likely error-prone.

molecule = Molecule('xyzfile.xyz', use_unsafe_chemical_perception=True)
molecule = Molecule.from_xyz_unsafe(xyzfile)

Describe alternatives you've considered This problem comes up frequently, so having an explicit API for dealing with it, together with explicit documentation on why this is not the preferred route, is a better practice than the current state of things.

j-wags commented 2 years ago

Hm, this a big source of user confusion, so something like this for XYZ (and maybe some PDBs that have element info) could be nice.

On the other hand, this adds to our scope, and could lead us toward being responsible for differences between guessing heuristics in different packages.

If we do implement this, I recommend the second API point you suggested.

molecule = Molecule.from_xyz_unsafe(xyzfile)

I'd be happy to lay out more of a spec for this if someone wants to take it on. For now, since workarounds are available (and the workarounds indicate to the user that we're not responsible for the correctness of the guesses), I'll tag this as medium priority.

davidlmobley commented 2 years ago

Yes, at some level this is a cheminformatics problem and my/our preference for this has been that we depend on the cheminformatics engines for this since there are deep issues here we don't want to own.

I do like the idea of having "unsafe" in the function name if we ever do this. :) Or we could go more extreme and say "from_xyz_shootyourselfinthefoot".

lilyminium commented 2 years ago

molecule = Molecule.from_xyz_unsafe(xyzfile)

If you made the entrypoint from_unsafe_xyz_data or something that does not depend on it being a file, you could route from_qcschema through it too.

mattwthompson commented 2 years ago

FWIW in #1148 (should be live in v0.10.2) I added a stopgap that at least ensures users are provided a little more direction if attempting to read XYZ files with the current toolkit:

$ python
>>> from openff.toolkit.topology.molecule import Molecule
>>> Molecule.from_file("foo.xyz")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mwt/software/openforcefield/openff/toolkit/topology/molecule.py", line 4748, in from_file
    raise UnsupportedFileTypeError(
openff.toolkit.utils.exceptions.UnsupportedFileTypeError: Parsing `.xyz` files is not currently supported because 
they lack sufficient chemical information to be used with SMIRNOFF force fields. For more information, see 
https://open-forcefield-toolkit.readthedocs.io/en/latest/faq.html or to provide feedback please visit 
https://github.com/openforcefield/openff-toolkit/issues/1145.

The linked issue is this issue, so users may drop by to offer feedback / express desire that something like this be added here.

This would be updated if Molecule.from_xyz_if_you_accept_the_risks is implemented.