jensengroup / xyz2mol

Converts an xyz file to an RDKit mol object
MIT License

Parsing nciatlas geometries #31

Open ljmartin opened 2 years ago

ljmartin commented 2 years ago

Hi all, and thanks for providing xyz2mol.

Is it possible to parse multi-molecule xyz files? For example, the NCIAtlas provides geometries of pairs of molecules in xyz format. Invoking xyz2mol from the command line fails while reading the charge line. It seems to be because there are two extra charges, charge_a and charge_b, in addition to charge. Is that nonstandard? Example here:

7.113_noHB__water--methylisocyanide_200.xyz:

9
charge=0 charge_a=0 charge_b=0 selection_a=1-3 selection_b=4-9 scaling=2.0
  O    2.117646266  -0.063971009   0.000000000
  H    1.566706960   0.725029523   0.000000000
  H    3.022595991   0.254923981   0.000000000
  H   -3.957827806  -1.581005514  -0.887150416
  H   -3.957827806  -1.581005514   0.887150416
  H   -2.416049485  -1.548657916   0.000000000
  C   -3.430031556   1.371722631   0.000000000
  N   -3.465218185   0.209914627   0.000000000
  C   -3.449363210  -1.210254769   0.000000000

xyz2mol.py 7.113_noHB__water--methylisocyanide_200.xyz returns:

Traceback (most recent call last):
  File "/Users/ljmartin/miniconda3/envs/compchem/lib/python3.9/site-packages/xyz2mol.py", line 795, in <module>
    atoms, charge, xyz_coordinates = read_xyz_file(filename)
  File "/Users/ljmartin/miniconda3/envs/compchem/lib/python3.9/site-packages/xyz2mol.py", line 548, in read_xyz_file
    charge = int(line.split("=")[1])
ValueError: invalid literal for int() with base 10: '0 charge_a'

Thanks a lot!

jhjensen2 commented 2 years ago

Yeah, xyz2mol expects that line to contain only the charge (e.g. charge=0) and nothing else. If you can come up with a regex that correctly extracts the charge, I'd be happy to look at it.
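
One possible approach, sketched here as a starting point rather than a tested patch: match `charge=<integer>` only where `charge` begins a token, so that `charge_a=0` and `charge_b=0` are skipped. The helper name `parse_total_charge` and the fallback to 0 when no charge is present are assumptions made for illustration. They are not part of xyz2mol.

```python
import re

# "charge" must not be preceded by a word character (so "total_charge" is
# ignored) and must be followed directly by "=", which rules out
# "charge_a=" and "charge_b=".
CHARGE_RE = re.compile(r"(?<!\w)charge\s*=\s*([+-]?\d+)")


def parse_total_charge(comment_line, default=0):
    """Return the total charge from an xyz comment line.

    Hypothetical helper: falls back to `default` when the line has no
    "charge=<int>" token.
    """
    match = CHARGE_RE.search(comment_line)
    return int(match.group(1)) if match else default


line = "charge=0 charge_a=0 charge_b=0 selection_a=1-3 selection_b=4-9 scaling=2.0"
print(parse_total_charge(line))
```

Something like this could stand in for the `int(line.split("=")[1])` call shown in the traceback. It only recovers the total charge; splitting the dimer into its two fragments using `selection_a` and `selection_b` would need separate handling.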