nglviewer / ngl

WebGL protein viewer
http://nglviewer.org/ngl/
MIT License

V3000 support #955

papillot closed 1 year ago

papillot commented 1 year ago

This PR is based on previous work by @pablowatson in PR #944.

An SDF file may contain both V2000 and V3000 molfiles. The parser adapts to the format as the lines are processed. For V2000, the atom and bond blocks are still located by line counts; for V3000, they are located by boundary markers (e.g. BEGIN ATOM). Likewise, V2000 fields are read from fixed column positions, while V3000 lines are tokenized first because the format does not fix field widths. The V3000 parser also supports line continuation as specified in the V3000 format (a line whose last token is - continues on the next line).
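To illustrate the V3000 side of this, here is a minimal sketch of boundary-based block detection, tokenizing, and continuation handling. It is not the code from this PR: the `Atom` and `Bond` types and the `joinV3000Lines` and `parseV3000` helpers are hypothetical names, and the sketch assumes that every V3000 line starts with the `M  V30 ` prefix and that a trailing `-` is dropped before the next line's content is appended.

```typescript
// Hypothetical types and helpers for illustration; not NGL's actual parser API.
interface Atom { index: number; element: string; x: number; y: number; z: number }
interface Bond { index: number; order: number; from: number; to: number }

const V30_PREFIX = "M  V30 ";

// Merge continuation lines: an entry ending with "-" continues on the next
// "M  V30 " line. Lines without the prefix (e.g. "M  END") are skipped.
function joinV3000Lines(lines: string[]): string[] {
  const entries: string[] = [];
  let pending = "";
  for (const line of lines) {
    if (!line.startsWith(V30_PREFIX)) continue;
    const content = pending + line.slice(V30_PREFIX.length).trimEnd();
    if (content.endsWith("-")) {
      pending = content.slice(0, -1); // drop the dash, wait for the next line
    } else {
      entries.push(content.trim());
      pending = "";
    }
  }
  return entries;
}

// Locate blocks via BEGIN/END markers and read whitespace-separated tokens.
function parseV3000(lines: string[]): { atoms: Atom[]; bonds: Bond[] } {
  const atoms: Atom[] = [];
  const bonds: Bond[] = [];
  let block: "ATOM" | "BOND" | null = null;
  for (const entry of joinV3000Lines(lines)) {
    const tokens = entry.split(/\s+/);
    if (tokens[0] === "BEGIN") {
      const name = tokens[1];
      block = name === "ATOM" || name === "BOND" ? name : null;
    } else if (tokens[0] === "END") {
      block = null;
    } else if (block === "ATOM") {
      // Atom entry: index type x y z aamap [key=value ...]
      atoms.push({
        index: Number(tokens[0]),
        element: tokens[1] ?? "",
        x: Number(tokens[2]),
        y: Number(tokens[3]),
        z: Number(tokens[4]),
      });
    } else if (block === "BOND") {
      // Bond entry: index type atom1 atom2 [key=value ...]
      bonds.push({
        index: Number(tokens[0]),
        order: Number(tokens[1]),
        from: Number(tokens[2]),
        to: Number(tokens[3]),
      });
    }
  }
  return { atoms, bonds };
}
```

Optional `key=value` properties after the fixed tokens (such as `CHG=-1`) are ignored in this sketch.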

ppillot commented 1 year ago

LGTM. I didn't realize you could mix V2000 and V3000, so I learnt something new. Thanks again for doing this.

I discovered this in the BIOVIA specs while reviewing your initial PR, and was surprised too. In fact, they define the SDF format in the CTFile formats document, after the definitions of the V3000 and V2000 formats. I suspect mixing them is not common, but I guess it could be useful when you concatenate files that use different formats.
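As a rough sketch of what that means for a reader of concatenated files: each record can be checked for its own version tag. The helper names `splitSdfRecords` and `molfileVersion` are hypothetical, not NGL functions. The sketch assumes records are terminated by a `$$$$` line and that the version tag ends the counts line (the fourth line of each molfile).

```typescript
// Hypothetical helpers for illustration; not NGL's actual API.
type MolVersion = "V2000" | "V3000" | "unknown";

// Split SDF text into records; each record ends with a "$$$$" line.
function splitSdfRecords(text: string): string[][] {
  const records: string[][] = [];
  let current: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith("$$$$")) {
      records.push(current);
      current = [];
    } else {
      current.push(line);
    }
  }
  // Keep a trailing record that lacks the final delimiter.
  if (current.some((l) => l.trim() !== "")) records.push(current);
  return records;
}

// Read the version tag from the end of the counts line (4th line).
function molfileVersion(record: string[]): MolVersion {
  const counts = (record[3] ?? "").trimEnd();
  if (counts.endsWith("V3000")) return "V3000";
  if (counts.endsWith("V2000")) return "V2000";
  return "unknown";
}
```

Mapping `molfileVersion` over the output of `splitSdfRecords` gives one version per record, so a reader can pick the V2000 or V3000 code path for each record separately.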