grimme-lab / mctc-lib

Modular computation tool chain library
https://grimme-lab.github.io/mctc-lib
Apache License 2.0

Cannot read Maestro SDF format #49

Closed awvwgk closed 2 years ago

awvwgk commented 2 years ago

The problem

SDF and molfiles generated by the Maestro suite do not follow the connection table specification. The main difference is that they write fewer entries per record in the V2000 format: atom lines carry 6 integer columns after the element symbol instead of the required 12, and bond lines carry 3 columns after the bond order instead of the required 4.

This results in the following error when reading a Maestro generated SDF or molfile:

Error: Cannot read coordinates from connection table
 --> aspirin3d_maestro.mol:5:52-54
  |
5 |     1.2333    0.5540    0.7792 O   0  0  0  0  0  0
  |                                                    ^^^ unexpected value
  |

Note that the error message could be clearer here: it reports an unexpected value, when the actual problem is that the reader expected more values on the line.


The solution

The best fix would be to accept the Maestro variant as a valid connection table format. If the columns on a line are exhausted, we simply assume that zeros were provided for the missing entries instead of raising a syntax error.
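To illustrate the idea, here is a minimal Python sketch of a lenient V2000 atom-line reader. It is not the Fortran implementation in mctc-lib; the function name `parse_atom_line` and its return layout are hypothetical, and the column widths assume the standard fixed-width atom block (three 10-character coordinates, a 3-character symbol, a 2-character mass difference, then 3-character integer fields).

```python
def parse_atom_line(line, n_optional=12):
    """Parse one V2000 atom line, treating missing trailing fields as zero.

    Hypothetical helper for illustration only; not part of mctc-lib.
    """
    line = line.rstrip("\n")
    if len(line) < 34:
        raise ValueError("atom line too short: coordinates and symbol are required")

    # Columns 1-30 hold x, y, z in three 10-character fields.
    x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
    # Columns 32-34 hold the element symbol.
    symbol = line[31:34].strip()

    # Mass difference is 2 characters wide, all later fields are 3 wide.
    widths = [2] + [3] * (n_optional - 1)
    fields = []
    pos = 34
    for width in widths:
        chunk = line[pos:pos + width]
        pos += width
        if chunk.strip() == "":
            # Columns exhausted (or blank): assume zero was provided.
            fields.append(0)
        else:
            # Entries that are present are still parsed, so malformed
            # values raise ValueError instead of being ignored.
            fields.append(int(chunk))
    return (x, y, z), symbol, fields


maestro_line = "    1.2333    0.5540    0.7792 O   0  0  0  0  0  0"
coords, symbol, fields = parse_atom_line(maestro_line)
print(coords, symbol, len(fields))
```

With this approach a Maestro line with 6 integer columns and a spec-conforming line with 12 both yield twelve integer fields, while a non-numeric entry in any column that is present is still rejected.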

The relevant implementation is in src/mctc/io/read/ctfile.f90:

The entries for the coordinates are read here (no entry is strictly required if we assume zeros):

https://github.com/grimme-lab/mctc-lib/blob/f02b5905d6c80fc441ec7bcf150f9a6f46fa6d3f/src/mctc/io/read/ctfile.f90#L199-L204

The entries for the bonds are read here (the first three entries are required: the two atom indices and the bond order):

https://github.com/grimme-lab/mctc-lib/blob/f02b5905d6c80fc441ec7bcf150f9a6f46fa6d3f/src/mctc/io/read/ctfile.f90#L227-L232

Any entries that are present should still be parsed, so that malformed input files are not overlooked.
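The bond-line rule can be sketched the same way in Python. Again this is only an illustration under assumptions, not the code in ctfile.f90: the name `parse_bond_line` is hypothetical, and the layout assumes seven 3-character integer fields per V2000 bond line, of which the first three (two atom indices and the bond order) are mandatory.

```python
def parse_bond_line(line, n_fields=7, n_required=3):
    """Parse one V2000 bond line with optional trailing fields.

    Hypothetical helper for illustration only; not part of mctc-lib.
    """
    line = line.rstrip("\n")
    values = []
    for k in range(n_fields):
        chunk = line[3 * k:3 * k + 3]
        if chunk.strip() == "":
            if k < n_required:
                # Atom indices and bond order cannot be defaulted.
                raise ValueError(
                    f"bond line needs at least {n_required} entries, "
                    f"found {k}"
                )
            # Missing optional entry: assume zero.
            values.append(0)
        else:
            # Present entries are always parsed, so garbage is reported.
            values.append(int(chunk))
    return values


print(parse_bond_line("  1  5  1  0  0  0"))     # Maestro style, 6 entries
print(parse_bond_line("  1  5  1  0  0  0  0"))  # spec-conforming, 7 entries
```

Both calls return the same seven integers. Reporting how many entries were found versus how many are required also addresses the point about the unclear error message.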


Reproducer for regression testing

valid aspirin3d.mol
```
2244
  -OEChem-02042111203D

 21 21  0     0  0  0  0  0  0999 V2000
    1.2333    0.5540    0.7792 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6952   -2.7148   -0.7502 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.7958   -2.1843    0.8685 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.7813    0.8105   -1.4821 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0857    0.6088    0.4403 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7927   -0.5515    0.1244 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7288    1.8464    0.4133 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1426   -0.4741   -0.2184 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.0787    1.9238    0.0706 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7855    0.7636   -0.2453 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1409   -1.8536    0.1477 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1094    0.6715   -0.3113 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5305    0.5996    0.1635 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1851    2.7545    0.6593 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7247   -1.3605   -0.4564 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5797    2.8872    0.0506 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.8374    0.8238   -0.5090 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7290    1.4184    0.8593 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2045    0.6969   -0.6924 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7105   -0.3659    0.6426 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2555   -3.5916   -0.7337 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  5  1  0  0  0  0
  1 12  1  0  0  0  0
  2 11  1  0  0  0  0
  2 21  1  0  0  0  0
  3 11  2  0  0  0  0
  4 12  2  0  0  0  0
  5  6  1  0  0  0  0
  5  7  2  0  0  0  0
  6  8  2  0  0  0  0
  6 11  1  0  0  0  0
  7  9  1  0  0  0  0
  7 14  1  0  0  0  0
  8 10  1  0  0  0  0
  8 15  1  0  0  0  0
  9 10  2  0  0  0  0
  9 16  1  0  0  0  0
 10 17  1  0  0  0  0
 12 13  1  0  0  0  0
 13 18  1  0  0  0  0
 13 19  1  0  0  0  0
 13 20  1  0  0  0  0
M  END
```
“incorrect” aspirin3d_maestro.mol
```
2244
                    3D
 Schrodinger Suite 2022-1.
 21 21  0  0  1  0            999 V2000
    1.2333    0.5540    0.7792 O   0  0  0  0  0  0
   -0.6952   -2.7148   -0.7502 O   0  0  0  0  0  0
    0.7958   -2.1843    0.8685 O   0  0  0  0  0  0
    1.7813    0.8105   -1.4821 O   0  0  0  0  0  0
   -0.0857    0.6088    0.4403 C   0  0  0  0  0  0
   -0.7927   -0.5515    0.1244 C   0  0  0  0  0  0
   -0.7288    1.8464    0.4133 C   0  0  0  0  0  0
   -2.1426   -0.4741   -0.2184 C   0  0  0  0  0  0
   -2.0787    1.9238    0.0706 C   0  0  0  0  0  0
   -2.7855    0.7636   -0.2453 C   0  0  0  0  0  0
   -0.1409   -1.8536    0.1477 C   0  0  0  0  0  0
    2.1094    0.6715   -0.3113 C   0  0  0  0  0  0
    3.5305    0.5996    0.1635 C   0  0  0  0  0  0
   -0.1851    2.7545    0.6593 H   0  0  0  0  0  0
   -2.7247   -1.3605   -0.4564 H   0  0  0  0  0  0
   -2.5797    2.8872    0.0506 H   0  0  0  0  0  0
   -3.8374    0.8238   -0.5090 H   0  0  0  0  0  0
    3.7290    1.4184    0.8593 H   0  0  0  0  0  0
    4.2045    0.6969   -0.6924 H   0  0  0  0  0  0
    3.7105   -0.3659    0.6426 H   0  0  0  0  0  0
   -0.2555   -3.5916   -0.7337 H   0  0  0  0  0  0
  1  5  1  0  0  0
  1 12  1  0  0  0
  2 11  1  0  0  0
  2 21  1  0  0  0
  3 11  2  0  0  0
  4 12  2  0  0  0
  5  6  1  0  0  0
  5  7  2  0  0  0
  6  8  2  0  0  0
  6 11  1  0  0  0
  7  9  1  0  0  0
  7 14  1  0  0  0
  8 10  1  0  0  0
  8 15  1  0  0  0
  9 10  2  0  0  0
  9 16  1  0  0  0
 10 17  1  0  0  0
 12 13  1  0  0  0
 13 18  1  0  0  0
 13 19  1  0  0  0
 13 20  1  0  0  0
M  END
```