Open joaomcteixeira opened 4 years ago
We cannot predict every single combination of mal-formatted PDB files out there in the wild... The PDB format stipulates that "ATOM records for proteins are listed from amino to carboxyl terminus" and that "The insertion code is used when two residues have the same numbering. The combination of residue numbering and insertion code defines the unique residue".
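To make the quoted rule concrete, here is a minimal sketch (not the pdb-tools implementation) of how the residue number and insertion code combine into a unique residue key. It relies only on the fixed-column layout of ATOM records (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The helper names `make_atom_line` and `residue_keys` are hypothetical, and the records in the usage example are synthetic, not taken from any PDB entry.

```python
def make_atom_line(serial, resname, chain, resseq, icode, name=" CA "):
    """Build a synthetic, column-correct ATOM record (illustration only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}"
        f"{resseq:>4}{icode:1}   "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )


def residue_keys(pdb_lines):
    """Return the ordered (chain, residue number, insertion code) keys.

    Consecutive atoms of the same residue collapse into a single key.
    """
    keys = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Python slices are 0-based: column 22 -> index 21, and so on.
        key = (line[21], int(line[22:26]), line[26].strip())
        if not keys or keys[-1] != key:
            keys.append(key)
    return keys


# Synthetic chain: residue 1, two insertions (1A, 1B), then residue 2.
lines = [
    make_atom_line(1, "GLY", "A", 1, ""),
    make_atom_line(2, "SER", "A", 1, "A"),
    make_atom_line(3, "PRO", "A", 1, "B"),
    make_atom_line(4, "MET", "A", 2, ""),
]
for key in residue_keys(lines):
    print(key)
```

Residues 1, 1A and 1B share the same number but remain distinct because the insertion code is part of the key; removing insertion codes therefore requires renumbering the residues that follow.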
I think I understand the issue here, will work on it in the next few days.
By no means did I mean that this was a bug in pdb_delinsertion. I was also very surprised when I found this situation, because that residue nomenclature is not expected. Now, reading your comment and thinking it through in more detail (yesterday I just reported it without any deeper consideration), attention @JoaoRodrigues, maybe we should NOT handle this case in pdb_delinsertion. Because residue numbers are discontinuous, adding this case to pdb_delinsertion may cause problems for correct cases we have not considered. Therefore, as you rightly stated, if this example, 1MH1, violates the PDB rules, it should not be handled by pdb-tools.
What are your thoughts?
Today I found another curious case for pdb_delinsertion. I don't call it a bug but, instead, a case not considered yet. PDB ID: 1MH1. In this PDB there are two residues at the beginning that belong to the purification tag and are assigned as insertions. The chain is continuous; there are no backbone breaks. Applying the latest version (v2.0.5) of pdb_delinsertion yields: