ihmwg / python-modelcif

Python package for handling ModelCIF mmCIF and BinaryCIF files
MIT License

Quote STAR syntax constructs #25

Closed gtauriello closed 2 years ago

gtauriello commented 2 years ago

@benmwebb We noticed an issue in the writer when it handles strings such as "stop_at_score" (as in the attached file here).

As far as we understand, the STAR format specifies that this string should be quoted because it starts with "stop_". See the section "Privileged constructs and the STAR format" in http://www.globalphasing.com/startools/ for details.

Hence the generated files (like the one linked above) only work properly in ModelArchive (using the GEMMI library to handle the file) if we replace the line

12 1 float stop_at_score 85.000 .

with

12 1 float "stop_at_score" 85.000 .

Would it be possible to change the writer so that it quotes such strings? Otherwise we run into issues when converting ColabFold models into ModelCIF, since one of the software parameters there is called "stop_at_score".
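To make the problem concrete, here is a minimal sketch of a check that flags unquoted tokens in a data row when they begin with a STAR reserved prefix. The function name `unquoted_reserved_tokens` and the prefix list are assumptions made for illustration. This is not how GEMMI or python-modelcif tokenize files, and the whitespace split is a simplification that mishandles quoted values containing spaces.

```python
# Assumed set of STAR/CIF reserved prefixes, compared case-insensitively.
RESERVED_PREFIXES = ("data_", "save_", "loop_", "stop_", "global_")


def unquoted_reserved_tokens(row):
    """Return the tokens in `row` that start with a reserved prefix.

    A token wrapped in quotes starts with a quote character, so it never
    matches a prefix and is not reported. Splitting on whitespace is a
    deliberate simplification: quoted values that contain spaces would be
    split incorrectly.
    """
    return [
        token
        for token in row.split()
        if token.lower().startswith(RESERVED_PREFIXES)
    ]


if __name__ == "__main__":
    # The two variants of the row discussed above.
    print(unquoted_reserved_tokens('12 1 float stop_at_score 85.000 .'))
    print(unquoted_reserved_tokens('12 1 float "stop_at_score" 85.000 .'))
```

Run against the two rows above, the sketch reports `stop_at_score` for the unquoted variant and nothing for the quoted one.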

benmwebb commented 2 years ago

Do you have a reader that refuses to read these files?

The URL you cite says that this is ambiguous. To date, python-ihm has interpreted the standard as meaning that stop_ itself should be quoted, but that strings that merely start with stop_ do not need to be: https://github.com/ihmwg/python-ihm/blob/0.33/ihm/format.py#L191-L198

This is easy to fix though, as we already do this for strings that start with data_.
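For readers following along, a conservative quoting rule of this kind could look roughly like the sketch below. It is not the code in python-ihm's `ihm/format.py`. The names `needs_quoting` and `quote_value` are hypothetical, the prefix list is an assumption, and multi-line values are deliberately left out of scope.

```python
# Assumed set of STAR/CIF reserved prefixes, compared case-insensitively.
RESERVED_PREFIXES = ("data_", "save_", "loop_", "stop_", "global_")

# Characters that are treated here as unsafe at the start of a bare value.
UNSAFE_FIRST_CHARS = "_#$'\"[];"


def needs_quoting(value):
    """Conservatively decide whether a single-line value must be quoted."""
    if value == "" or value in (".", "?"):
        return True  # empty string and the CIF null markers
    if value.lower().startswith(RESERVED_PREFIXES):
        return True  # e.g. "stop_at_score", "data_foo"
    if value[0] in UNSAFE_FIRST_CHARS:
        return True
    return any(ch.isspace() for ch in value)


def quote_value(value):
    """Return `value` as it could be written in a CIF row (hypothetical helper)."""
    if "\n" in value or ('"' in value and "'" in value):
        # Such values need a semicolon-delimited text field; not shown here.
        raise NotImplementedError("multi-line or doubly quoted values")
    if not needs_quoting(value):
        return value
    if '"' in value:
        return "'" + value + "'"
    return '"' + value + '"'


if __name__ == "__main__":
    row = ["12", "1", "float", "stop_at_score", "85.000"]
    print(" ".join(quote_value(v) for v in row))
```

Quoting every value that starts with a reserved prefix is slightly broader than the stricter reading of the standard, but quoted values are valid under either reading.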

gtauriello commented 2 years ago

@benmwebb we observed it with the GEMMI library (which is also used heavily by people at PDBe).

I understand the ambiguity in the format, but it doesn't seem to hurt to just quote the strings starting with "stop_" etc. to be on the safe side. That also seems to be the recommendation in the linked reference ("To avoid ambiguity, it therefore seems best to quote such values...").

benmwebb commented 2 years ago

Agreed, there is no reason not to quote such strings.