Open brindakv opened 6 years ago
Perhaps I don't understand the issue here, but as far as I can see both struct_conf and struct_conn point to one or more pairs of asym_id/seq_id. I haven't seen any multi-state models (atomic or coarse-grained) where for a given asym_id the composition is so different that a given seq_id refers to a different part of the structure in two different states. Why would it? If the sequence is different it would have to be a different entity, and thus a distinct asym_id.
The question is whether mmCIF atom_site allows multiple models that have different sets of atoms. I have never seen such a file. But I agree that as long as every model using a specific asym_id refers to the same entity, then maybe there is no problem. Still, I think no mmCIF reader in use today is likely to handle that correctly. If code cannot assume that all models contain identical atoms, it needs to check whether they do, since otherwise connectivity will have to be determined separately for each model.
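The check described above could look roughly like the following. This is only a minimal sketch, not how any existing mmCIF reader does it: the row layout (model number, asym_id, seq_id, atom name) and both function names are assumptions made for illustration, and alternate locations are ignored.

```python
from collections import defaultdict


def atoms_by_model(rows):
    """Group atom identities by model number.

    Each row is a hypothetical, simplified atom_site record:
    (pdbx_PDB_model_num, asym_id, seq_id, atom_name).
    """
    models = defaultdict(set)
    for model_num, asym_id, seq_id, atom_name in rows:
        models[model_num].add((asym_id, seq_id, atom_name))
    return dict(models)


def models_are_homogeneous(rows):
    """Return True if every model contains exactly the same set of atoms."""
    atom_sets = list(atoms_by_model(rows).values())
    return all(s == atom_sets[0] for s in atom_sets[1:])


# Two models with the same atoms, then two models with different chains.
same = [(1, "A", 1, "CA"), (1, "A", 2, "CA"),
        (2, "A", 1, "CA"), (2, "A", 2, "CA")]
different = [(1, "A", 1, "CA"), (2, "B", 1, "CA")]

print(models_are_homogeneous(same))
print(models_are_homogeneous(different))
```

If the check fails, a reader would have to fall back to computing connectivity per model instead of reusing the result from the first one.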
In the old PDB format the specification explicitly says that multiple models must have identical atoms:
"each model should have the exact same atoms (hydrogen and heavy atoms), sequence and chemistry."
http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#MODEL
For the mmCIF format there is not even a field for handling multiple models in atom_site; this is only added by PDBx as the _atom_site.pdbx_PDB_model_num field, and the documentation merely says “PDB model number”, so there is no telling what this means.
http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Items/_atom_site.pdbx_PDB_model_num.html
The problem is that struct_conf, struct_conn and some of the other struct_group categories do not have a data item pointing to _atom_site.pdbx_PDB_model_num. Therefore, they are only populated for the first model in an ensemble. The assumption of a homogeneous ensemble is therefore implicit.
They don't need to reference a model number, because a given seq_id/asym_id pair should be valid for all models. I'd assume if you have one model with only chain A in it, and another model with only chain B, your struct_conf would contain entries for both asym_id=A and asym_id=B.
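To make the chain A / chain B case concrete, here is a small sketch of how a reader might work out which models a given struct_conf entry actually applies to, using only asym_id/seq_id lookups. The simplified row layouts, the single asym_id per entry, and the function names are all assumptions for illustration, not part of any dictionary or library.

```python
from collections import defaultdict


def residues_by_model(atom_rows):
    """Collect the (asym_id, seq_id) pairs present in each model.

    Each atom row is a hypothetical, simplified atom_site record:
    (pdbx_PDB_model_num, asym_id, seq_id, atom_name).
    """
    present = defaultdict(set)
    for model_num, asym_id, seq_id, _atom_name in atom_rows:
        present[model_num].add((asym_id, seq_id))
    return present


def models_for_conf(atom_rows, conf_entries):
    """Map each struct_conf id to the models containing both of its end residues.

    Each conf entry is a hypothetical, simplified struct_conf record:
    (conf_id, asym_id, beg_seq_id, end_seq_id).
    """
    present = residues_by_model(atom_rows)
    result = {}
    for conf_id, asym_id, beg_seq, end_seq in conf_entries:
        result[conf_id] = sorted(
            model_num
            for model_num, residues in present.items()
            if (asym_id, beg_seq) in residues and (asym_id, end_seq) in residues
        )
    return result


# Model 1 holds only chain A, model 2 holds only chain B.
atoms = [(1, "A", s, "CA") for s in (1, 2, 3)] + [(2, "B", s, "CA") for s in (1, 2, 3)]
helices = [("HELX1", "A", 1, 3), ("HELX2", "B", 1, 3)]

print(models_for_conf(atoms, helices))
```

Under these assumptions no model number is needed on the struct_conf side: each entry resolves to whichever models contain the referenced residues.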
Although the atom_site category has been extended in the IHM-dictionary to accommodate compositionally different multi-state structures, the struct_group categories still assume uniform composition across models (e.g. struct_conf, struct_conn). Data categories that are derived from the coordinates in the atom_site category and assume uniform composition will therefore break in case of atomic multi-state structures. This could be addressed either in the PDBx/mmCIF dictionary or in the IHM-dictionary extension.