wwpdb-dictionaries / mmcif_pdbx

wwPDB PDBx/mmCIF Dictionary
Creative Commons Zero v1.0 Universal

how to interpret _pdbx_item_linked_group_list? #32

Open wojdyr opened 3 years ago

wojdyr commented 3 years ago

As an example, _pdbx_item_linked_group_list defines a relation between atom_site (child category) and pdbx_poly_seq_scheme (parent category).

atom_site                                9  '_atom_site.auth_asym_id'                             '_pdbx_poly_seq_scheme.pdb_strand_id'       pdbx_poly_seq_scheme
atom_site                                9  '_atom_site.auth_comp_id'                             '_pdbx_poly_seq_scheme.pdb_mon_id'          pdbx_poly_seq_scheme
atom_site                                9  '_atom_site.auth_seq_id'                              '_pdbx_poly_seq_scheme.pdb_seq_num'         pdbx_poly_seq_scheme
atom_site                                9  '_atom_site.label_asym_id'                            '_pdbx_poly_seq_scheme.asym_id'             pdbx_poly_seq_scheme
atom_site                                9  '_atom_site.label_comp_id'                            '_pdbx_poly_seq_scheme.mon_id'              pdbx_poly_seq_scheme
atom_site                                9  '_atom_site.label_entity_id'                          '_pdbx_poly_seq_scheme.entity_id'           pdbx_poly_seq_scheme
atom_site                                9  '_atom_site.label_seq_id'                             '_pdbx_poly_seq_scheme.seq_id'              pdbx_poly_seq_scheme
atom_site                                9  '_atom_site.pdbx_PDB_ins_code'                        '_pdbx_poly_seq_scheme.pdb_ins_code'        pdbx_poly_seq_scheme

How should such a relation be validated? In this case, atoms in polymers are expected to have parents in _pdbx_poly_seq_scheme and atoms in non-polymers are not, because pdbx_poly_seq_scheme covers polymers only. Are there general rules that tell when a parent in a linked group must exist?

epeisach commented 3 years ago

This creates a group of linked items.

It says there must be a match between atom_site and pdbx_poly_seq_scheme in which the pairs would match.

This allows the dictionary to say a row in atom_site must match the same in pdbx_poly_seq_scheme. Otherwise, a simple parent/child relationship only lets you say that for a value of atom_site.auth_asym_id there is at least one matching value in pdbx_poly_seq_scheme.pdb_strand_id, which is a pretty low bar. Requiring that all attributes match at the same time is useful.
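To illustrate the difference described above, here is a minimal Python sketch contrasting a per-item parent/child check with a linked-group check. It assumes rows are plain dicts keyed by full item name, and it uses only three of the eight linked items for brevity; the function names and sample values are hypothetical, not taken from any PDB tool.

```python
# Hypothetical subset of the (child item, parent item) pairs in group 9.
LINKS = [
    ("_atom_site.label_asym_id", "_pdbx_poly_seq_scheme.asym_id"),
    ("_atom_site.label_seq_id", "_pdbx_poly_seq_scheme.seq_id"),
    ("_atom_site.label_comp_id", "_pdbx_poly_seq_scheme.mon_id"),
]


def per_item_ok(child_row, parent_rows, links):
    """Simple parent/child check: each child value appears somewhere in its parent column."""
    return all(
        any(p[parent] == child_row[child] for p in parent_rows)
        for child, parent in links
    )


def linked_group_ok(child_row, parent_rows, links):
    """Linked-group check: a single parent row must match every pair at once."""
    key = tuple(child_row[child] for child, _ in links)
    parent_keys = {tuple(p[parent] for _, parent in links) for p in parent_rows}
    return key in parent_keys


# Hypothetical sample data: residue 1 is MET, residue 2 is GLY.
parent_rows = [
    {"_pdbx_poly_seq_scheme.asym_id": "A", "_pdbx_poly_seq_scheme.seq_id": "1",
     "_pdbx_poly_seq_scheme.mon_id": "MET"},
    {"_pdbx_poly_seq_scheme.asym_id": "A", "_pdbx_poly_seq_scheme.seq_id": "2",
     "_pdbx_poly_seq_scheme.mon_id": "GLY"},
]
# An atom claiming residue 1 is GLY: every value exists somewhere, but not in one row.
atom = {"_atom_site.label_asym_id": "A", "_atom_site.label_seq_id": "1",
        "_atom_site.label_comp_id": "GLY"}

print(per_item_ok(atom, parent_rows, LINKS))      # True
print(linked_group_ok(atom, parent_rows, LINKS))  # False
```

The per-item check passes because "A", "1" and "GLY" each occur in some parent row; only the composite check catches that no single row pairs residue 1 with GLY. For a whole atom_site loop, the set of parent keys would normally be built once rather than per atom.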

The 9 is a group ID: all rows that share it form one linked group.
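A minimal sketch of how the rows quoted in the question could be collected into linked groups. It assumes each row has the column order shown above (child category, group ID, child item, parent item, parent category) and that a group is identified by child category, group ID and parent category together; `linked_groups` is a hypothetical helper, not part of any dictionary tooling.

```python
from collections import defaultdict

# The eight rows quoted in the question, as tuples in their displayed column order.
ROWS = [
    ("atom_site", "9", "_atom_site.auth_asym_id", "_pdbx_poly_seq_scheme.pdb_strand_id", "pdbx_poly_seq_scheme"),
    ("atom_site", "9", "_atom_site.auth_comp_id", "_pdbx_poly_seq_scheme.pdb_mon_id", "pdbx_poly_seq_scheme"),
    ("atom_site", "9", "_atom_site.auth_seq_id", "_pdbx_poly_seq_scheme.pdb_seq_num", "pdbx_poly_seq_scheme"),
    ("atom_site", "9", "_atom_site.label_asym_id", "_pdbx_poly_seq_scheme.asym_id", "pdbx_poly_seq_scheme"),
    ("atom_site", "9", "_atom_site.label_comp_id", "_pdbx_poly_seq_scheme.mon_id", "pdbx_poly_seq_scheme"),
    ("atom_site", "9", "_atom_site.label_entity_id", "_pdbx_poly_seq_scheme.entity_id", "pdbx_poly_seq_scheme"),
    ("atom_site", "9", "_atom_site.label_seq_id", "_pdbx_poly_seq_scheme.seq_id", "pdbx_poly_seq_scheme"),
    ("atom_site", "9", "_atom_site.pdbx_PDB_ins_code", "_pdbx_poly_seq_scheme.pdb_ins_code", "pdbx_poly_seq_scheme"),
]


def linked_groups(rows):
    """Collect (child item, parent item) pairs per (child category, group ID, parent category)."""
    groups = defaultdict(list)
    for child_cat, group_id, child_item, parent_item, parent_cat in rows:
        groups[(child_cat, group_id, parent_cat)].append((child_item, parent_item))
    return dict(groups)


groups = linked_groups(ROWS)
for key, pairs in groups.items():
    print(key, len(pairs))  # prints: ('atom_site', '9', 'pdbx_poly_seq_scheme') 8
```

All eight pairs land in one group, so they would be checked together as a single composite key rather than as eight independent parent/child links.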

wojdyr commented 3 years ago

> It says there must be a match between atom_site and pdbx_poly_seq_scheme in which the pairs would match.

> This allows the dictionary to say a row in atom_site must match the same in pdbx_poly_seq_scheme.

As we know, this is not true in general: rows in atom_site that correspond to non-polymers don't match anything in pdbx_poly_seq_scheme. Are there general rules that tell when a parent in a linked group must exist?
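One way a validator might handle the exception raised here is to require a parent only for atoms whose entity is a polymer. The sketch below assumes that rule, using `_entity.type == "polymer"` as the test; this criterion is an assumption for illustration, not something the dictionary's linked-group definitions state. The function names and sample rows are hypothetical.

```python
# Hypothetical subset of group 9, including the entity link used for filtering.
LINKS = [
    ("_atom_site.label_entity_id", "_pdbx_poly_seq_scheme.entity_id"),
    ("_atom_site.label_asym_id", "_pdbx_poly_seq_scheme.asym_id"),
    ("_atom_site.label_seq_id", "_pdbx_poly_seq_scheme.seq_id"),
]


def polymer_entity_ids(entity_rows):
    """Entity IDs whose _entity.type is 'polymer' (assumed criterion)."""
    return {e["_entity.id"] for e in entity_rows if e["_entity.type"] == "polymer"}


def missing_parents(atom_rows, parent_rows, links, entity_rows):
    """Return indices of polymer atoms with no matching parent row in the linked group."""
    polymer_ids = polymer_entity_ids(entity_rows)
    parent_keys = {tuple(p[parent] for _, parent in links) for p in parent_rows}
    missing = []
    for i, atom in enumerate(atom_rows):
        if atom["_atom_site.label_entity_id"] not in polymer_ids:
            continue  # assumed rule: non-polymer atoms need no pdbx_poly_seq_scheme parent
        key = tuple(atom[child] for child, _ in links)
        if key not in parent_keys:
            missing.append(i)
    return missing


entities = [
    {"_entity.id": "1", "_entity.type": "polymer"},
    {"_entity.id": "2", "_entity.type": "non-polymer"},
]
scheme = [
    {"_pdbx_poly_seq_scheme.entity_id": "1", "_pdbx_poly_seq_scheme.asym_id": "A",
     "_pdbx_poly_seq_scheme.seq_id": "1"},
]
atoms = [
    {"_atom_site.label_entity_id": "1", "_atom_site.label_asym_id": "A", "_atom_site.label_seq_id": "1"},
    {"_atom_site.label_entity_id": "2", "_atom_site.label_asym_id": "B", "_atom_site.label_seq_id": "."},
    {"_atom_site.label_entity_id": "1", "_atom_site.label_asym_id": "A", "_atom_site.label_seq_id": "5"},
]
print(missing_parents(atoms, scheme, LINKS, entities))  # [2]
```

The ligand atom (index 1) is skipped, while the polymer atom pointing at a residue absent from the scheme (index 2) is reported.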

epeisach commented 3 years ago

In theory, the rules should apply to all rows. However, I see that there is some logic in cpp-cif-file/src/CifParentChild that allows certain ones to be missing.