The problem
SDF and molfiles generated by the Maestro suite don't follow the connection table specification. The main difference is that they use fewer entries per record in the V2000 format: atom records carry 6 integer columns after the element symbol instead of the required 12, and bond records carry 3 columns after the bond order instead of the required 4.
This results in the following error when reading a Maestro-generated SDF or molfile:
Error: Cannot read coordinates from connection table
--> aspirin3d_maestro.mol:5:52-54
|
5 | 1.2333 0.5540 0.7792 O 0 0 0 0 0 0
| ^^^ unexpected value
|
Note that the error message could be clearer by stating that more values were expected.
The solution
The best fix would be to accept the Maestro variant as a valid connection table format. If a record runs out of columns, we assume zeros for the missing entries instead of raising a syntax error.
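To illustrate the intended behavior, here is a minimal Python sketch of a lenient atom-record reader. It is not the Fortran code in mctc-lib; the function name `parse_atom_line` and its `n_fields` parameter are hypothetical, and the sketch assumes the usual fixed-width V2000 atom layout (three 10-character coordinates, a 3-character symbol, one 2-character field, then 3-character fields).

```python
def parse_atom_line(line, n_fields=12):
    """Parse a fixed-width V2000 atom line, reading missing trailing columns as zero."""
    # Coordinates occupy three 10-character columns.
    x, y, z = (float(line[start:start + 10]) for start in (0, 10, 20))
    # The element symbol sits in columns 32-34 (1-based).
    symbol = line[31:34].strip()
    # After the symbol: one 2-character field, then 3-character fields.
    widths = [2] + [3] * (n_fields - 1)
    fields, pos = [], 34
    for width in widths:
        chunk = line[pos:pos + width].strip()
        # An exhausted (or blank) column counts as zero; anything else must be an integer.
        fields.append(int(chunk) if chunk else 0)
        pos += width
    return x, y, z, symbol, fields


# First atom record of the Maestro file: only six integer fields after the symbol.
maestro_line = f"{1.2333:10.4f}{0.5540:10.4f}{0.7792:10.4f} {'O':<3}{0:2d}" + "  0" * 5
x, y, z, symbol, fields = parse_atom_line(maestro_line)
print(symbol, len(fields), fields)
```

With this approach the short Maestro record and the full 12-column record produce the same result as long as the omitted columns would have been zero.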
The relevant implementation is in `src/mctc/io/read/ctfile.f90`.
The entries of the atom records are read at the following lines (no entry is strictly required here if we assume zeros):
https://github.com/grimme-lab/mctc-lib/blob/f02b5905d6c80fc441ec7bcf150f9a6f46fa6d3f/src/mctc/io/read/ctfile.f90#L199-L204
The entries of the bond records are read at the following lines (here we need the first three entries: the two atom indices and the bond order):
https://github.com/grimme-lab/mctc-lib/blob/f02b5905d6c80fc441ec7bcf150f9a6f46fa6d3f/src/mctc/io/read/ctfile.f90#L227-L232
The entries, if present, should still be parsed to avoid overlooking malformed input files.
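The rule for bond records (three required entries, optional trailing entries that are still validated when present) could look roughly like the Python sketch below. Again, this is only an illustration and not the mctc-lib implementation; `parse_bond_line` and `n_fields` are hypothetical names, and the sketch assumes 3-character fixed-width columns.

```python
def parse_bond_line(line, n_fields=7):
    """Parse a fixed-width V2000 bond line with optional trailing entries."""
    fields = []
    for index in range(n_fields):
        chunk = line[3 * index:3 * index + 3].strip()
        if chunk:
            # Present entries are always converted, so garbage raises ValueError.
            fields.append(int(chunk))
        elif index < 3:
            # Atom indices and bond order cannot be defaulted.
            raise ValueError(f"bond line is missing required entry {index + 1}")
        else:
            # Missing trailing entries are assumed to be zero.
            fields.append(0)
    return fields


print(parse_bond_line("  1  5  1  0  0  0"))  # Maestro style -> [1, 5, 1, 0, 0, 0, 0]
```

A line such as `"  1  5  1  x  0  0"` still fails, because the entry that is present gets parsed rather than skipped.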
Reproducer for regression testing
valid
```
2244
  -OEChem-02042111203D

 21 21  0     0  0  0  0  0  0999 V2000
    1.2333    0.5540    0.7792 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6952   -2.7148   -0.7502 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.7958   -2.1843    0.8685 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.7813    0.8105   -1.4821 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0857    0.6088    0.4403 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7927   -0.5515    0.1244 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7288    1.8464    0.4133 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1426   -0.4741   -0.2184 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.0787    1.9238    0.0706 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7855    0.7636   -0.2453 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1409   -1.8536    0.1477 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1094    0.6715   -0.3113 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5305    0.5996    0.1635 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1851    2.7545    0.6593 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7247   -1.3605   -0.4564 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5797    2.8872    0.0506 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.8374    0.8238   -0.5090 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7290    1.4184    0.8593 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2045    0.6969   -0.6924 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7105   -0.3659    0.6426 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2555   -3.5916   -0.7337 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  5  1  0  0  0  0
  1 12  1  0  0  0  0
  2 11  1  0  0  0  0
  2 21  1  0  0  0  0
  3 11  2  0  0  0  0
  4 12  2  0  0  0  0
  5  6  1  0  0  0  0
  5  7  2  0  0  0  0
  6  8  2  0  0  0  0
  6 11  1  0  0  0  0
  7  9  1  0  0  0  0
  7 14  1  0  0  0  0
  8 10  1  0  0  0  0
  8 15  1  0  0  0  0
  9 10  2  0  0  0  0
  9 16  1  0  0  0  0
 10 17  1  0  0  0  0
 12 13  1  0  0  0  0
 13 18  1  0  0  0  0
 13 19  1  0  0  0  0
 13 20  1  0  0  0  0
M  END
```
aspirin3d.mol
“incorrect”
```
2244
                    3D
 Schrodinger Suite 2022-1.
 21 21  0  0  1  0            999 V2000
    1.2333    0.5540    0.7792 O   0  0  0  0  0  0
   -0.6952   -2.7148   -0.7502 O   0  0  0  0  0  0
    0.7958   -2.1843    0.8685 O   0  0  0  0  0  0
    1.7813    0.8105   -1.4821 O   0  0  0  0  0  0
   -0.0857    0.6088    0.4403 C   0  0  0  0  0  0
   -0.7927   -0.5515    0.1244 C   0  0  0  0  0  0
   -0.7288    1.8464    0.4133 C   0  0  0  0  0  0
   -2.1426   -0.4741   -0.2184 C   0  0  0  0  0  0
   -2.0787    1.9238    0.0706 C   0  0  0  0  0  0
   -2.7855    0.7636   -0.2453 C   0  0  0  0  0  0
   -0.1409   -1.8536    0.1477 C   0  0  0  0  0  0
    2.1094    0.6715   -0.3113 C   0  0  0  0  0  0
    3.5305    0.5996    0.1635 C   0  0  0  0  0  0
   -0.1851    2.7545    0.6593 H   0  0  0  0  0  0
   -2.7247   -1.3605   -0.4564 H   0  0  0  0  0  0
   -2.5797    2.8872    0.0506 H   0  0  0  0  0  0
   -3.8374    0.8238   -0.5090 H   0  0  0  0  0  0
    3.7290    1.4184    0.8593 H   0  0  0  0  0  0
    4.2045    0.6969   -0.6924 H   0  0  0  0  0  0
    3.7105   -0.3659    0.6426 H   0  0  0  0  0  0
   -0.2555   -3.5916   -0.7337 H   0  0  0  0  0  0
  1  5  1  0  0  0
  1 12  1  0  0  0
  2 11  1  0  0  0
  2 21  1  0  0  0
  3 11  2  0  0  0
  4 12  2  0  0  0
  5  6  1  0  0  0
  5  7  2  0  0  0
  6  8  2  0  0  0
  6 11  1  0  0  0
  7  9  1  0  0  0
  7 14  1  0  0  0
  8 10  1  0  0  0
  8 15  1  0  0  0
  9 10  2  0  0  0
  9 16  1  0  0  0
 10 17  1  0  0  0
 12 13  1  0  0  0
 13 18  1  0  0  0
 13 19  1  0  0  0
 13 20  1  0  0  0
M  END
```
aspirin3d_maestro.mol