salilab / IHMValidation

Validation software for integrative models deposited to PDB
MIT License

Duplicated `auth_seq_id` (`auth_seq_num`) #79

Open aozalevsky opened 6 months ago

aozalevsky commented 6 months ago

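Several entries have duplicated auth_seq_id values.

Here is an example from PDBDEV_00000051, where seq_id 908 and seq_id 1032 of chain K1 both carry auth_seq_num 1032 (and pdb_seq_num 1032):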

loop_
_pdbx_poly_seq_scheme.asym_id
_pdbx_poly_seq_scheme.entity_id
_pdbx_poly_seq_scheme.seq_id
_pdbx_poly_seq_scheme.mon_id
_pdbx_poly_seq_scheme.pdb_seq_num
_pdbx_poly_seq_scheme.auth_seq_num
_pdbx_poly_seq_scheme.pdb_mon_id
_pdbx_poly_seq_scheme.auth_mon_id
_pdbx_poly_seq_scheme.pdb_strand_id
<...>
K1 1 908 VAL 1032 1032 VAL VAL K1
<...>
K1 1 1032 GLU 1032 1032 GLU GLU K1

1) This doesn't seem to violate the dictionary format, but doesn't it violate a policy? @brindakv

2) It causes a lot of problems for analysis with tools that rely on auth_seq_id (e.g., MolProbity).
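For reference, a minimal sketch of how the duplicates could be flagged, assuming the `_pdbx_poly_seq_scheme` rows are already available as whitespace-separated strings in the column order of the loop above. The function name `find_duplicated_auth_seq_num` is hypothetical and not part of IHMValidation, and the naive `split()` ignores mmCIF quoting rules, so a real check should go through a proper mmCIF parser.

```python
from collections import defaultdict

# Column order copied from the loop_ header in the example above.
COLUMNS = (
    "asym_id", "entity_id", "seq_id", "mon_id", "pdb_seq_num",
    "auth_seq_num", "pdb_mon_id", "auth_mon_id", "pdb_strand_id",
)


def find_duplicated_auth_seq_num(rows):
    """Map (pdb_strand_id, auth_seq_num) to the seq_ids that share it.

    Hypothetical helper: only keys used by more than one row are returned.
    """
    seen = defaultdict(list)
    for line in rows:
        fields = line.split()  # naive split; assumes no quoted values
        if len(fields) != len(COLUMNS):
            continue  # skip elided or malformed rows such as "<...>"
        rec = dict(zip(COLUMNS, fields))
        if rec["auth_seq_num"] in ("?", "."):
            continue  # unknown or inapplicable values cannot collide
        seen[(rec["pdb_strand_id"], rec["auth_seq_num"])].append(rec["seq_id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


rows = [
    "K1 1 908 VAL 1032 1032 VAL VAL K1",
    "<...>",
    "K1 1 1032 GLU 1032 1032 GLU GLU K1",
]
duplicates = find_duplicated_auth_seq_num(rows)
print(duplicates)
```

On the two rows quoted above, this reports that auth_seq_num 1032 in strand K1 is shared by seq_id 908 and seq_id 1032.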

See the examples attached.

PDBDEV_00000053.log PDBDEV_00000051.log PDBDEV_00000052.log

brindakv commented 6 months ago

@aozalevsky auth_seq_id can be standardized if it is present in atom_site. If it is not, it is difficult to fix.

Using it in pdbx_poly_seq_scheme when it is not present in atom_site is misleading. The correction here would be to remove it completely (or replace it with a `?`). Would MolProbity be able to handle that?
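As an illustration only, the `?` replacement could look like the sketch below on a single `_pdbx_poly_seq_scheme` row. It assumes the nine-column layout shown earlier in this issue, with auth_seq_num in the sixth column, and whitespace-separated values without quoting. `mask_auth_seq_num` is a hypothetical name, not an existing function in any tool mentioned here.

```python
# Zero-based position of auth_seq_num in the loop shown in this issue (assumed layout).
AUTH_SEQ_NUM_COL = 5
EXPECTED_FIELDS = 9


def mask_auth_seq_num(row):
    """Return the row with auth_seq_num replaced by '?' (hypothetical helper).

    Rows that do not have the expected number of fields are returned unchanged.
    """
    fields = row.split()  # naive split; assumes no quoted values
    if len(fields) != EXPECTED_FIELDS:
        return row
    fields[AUTH_SEQ_NUM_COL] = "?"
    return " ".join(fields)


masked = mask_auth_seq_num("K1 1 908 VAL 1032 1032 VAL VAL K1")
print(masked)
```

This leaves pdb_seq_num untouched and only changes the auth_seq_num column.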