Open josemduarte opened 5 years ago
A possible solution proposed by @pwrose is to store the full chemical component ID with the group data for unobserved residues here: https://github.com/rcsb/mmtf/blob/master/spec.md#group-data
However, that requires either a new flag (observed y/n) or making the formalChargeList, elementList and atomNameList optional fields (now they are required).
Given that the group data in MMTF only lists the observed atoms, I would say that an unobserved residue could be represented with a group which has 0-length arrays formalChargeList, atomNameList and elementList. I don't see a problem with those arrays being empty. At least the C++ decoder/encoder shouldn't have any issues with it.
Given that the fields are required, the arrays should always be written in the MMTF file, but there is no problem with writing 0-length arrays in msgpack.
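To make the proposal concrete, here is a rough sketch of what such a group entry could look like once decoded. The field names follow the group data section of the spec; the helper names `make_unobserved_group` and `is_unobserved` and the MSE example values are made up for illustration and are not part of any MMTF library.

```python
def make_unobserved_group(group_name, single_letter_code, chem_comp_type):
    """Build a group entry for a residue with no observed atoms.

    All required per-atom arrays are present but have length 0,
    so no field has to become optional.
    """
    return {
        "groupName": group_name,              # full chemical component ID
        "singleLetterCode": single_letter_code,
        "chemCompType": chem_comp_type,
        "formalChargeList": [],               # required, but empty
        "atomNameList": [],                   # required, but empty
        "elementList": [],                    # required, but empty
        "bondAtomList": [],
        "bondOrderList": [],
    }


def is_unobserved(group):
    """Hypothetical convention: a group with zero atoms is unobserved."""
    return len(group["atomNameList"]) == 0


# Illustrative values only
group = make_unobserved_group("MSE", "M", "L-PEPTIDE LINKING")
print(is_unobserved(group))
```

With this convention no extra observed y/n flag would be needed, since a reader can tell the two cases apart by the array length.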
Since MMTF stores the SEQRES groups as 1-letter code strings, the chemical component ID for any residue that is non-standard and happens to be unobserved will be lost. E.g. 2X3T chain E (a glycopeptide) contains several unobserved non-standard amino acids, and its sequence is represented like "KXXXXXXEX". For groups that are observed, the chemical component identifier is recoverable from the ATOM information, but not for those that are unobserved.
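A small sketch of why the information is lost. It assumes a simplified mapping in which anything outside the 20 standard amino acids becomes "X"; the function name `to_one_letter` and the example sequences are illustrative, not taken from 2X3T.

```python
# Three-letter to one-letter codes for the 20 standard amino acids
STANDARD = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(comp_ids):
    """Collapse chemical component IDs to a 1-letter sequence string.

    Simplifying assumption: every non-standard component maps to 'X'.
    """
    return "".join(STANDARD.get(comp_id, "X") for comp_id in comp_ids)


# Two different sequences of component IDs (illustrative)
seq_a = ["LYS", "DAL", "GLU"]
seq_b = ["LYS", "ORN", "GLU"]

# Both collapse to the same string, so the original IDs cannot be recovered
print(to_one_letter(seq_a), to_one_letter(seq_b))
```

The mapping is many-to-one, so for an unobserved residue there is nothing else in the file from which the component ID could be reconstructed.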