haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0

`pdb_fixinsert` and its impact on numbering #110

Closed aanastasiou closed 2 years ago

aanastasiou commented 2 years ago

Hello

I came across pdb_fixinsert and just wanted to confirm that my understanding of its operation is correct before I start using it more systematically.

The tool's help mentions that it "works by deleting an insertion code and shifting the residue numbering of downstream residues". Although "deleting" appears here, the tool does not actually delete any residues. Rather, it seems to renumber the residue carrying the insertion code so that the code disappears. So a sequence numbered 1, 2, 3, 4, 5, 5A, 6, 7, 8, 9, 10, ... becomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ..., with the old 5A now being 6, the old 6 being 7, and so on.

The number of residues does not change, and neither does the total number of lines between the original file and the one processed through pdb_fixinsert.
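To make the renumbering described above concrete, here is a minimal sketch of the idea in Python. It is not the pdb_fixinsert source code: the function name `renumber_without_icodes` is hypothetical, and it assumes a single chain whose residues are listed in file order as `(resnum, icode)` pairs, with the inserted residues following the residue they extend.

```python
def renumber_without_icodes(residues):
    """Drop insertion codes and shift downstream residue numbers.

    residues: list of (resnum, icode) tuples for one chain, in file order.
    Hypothetical helper for illustration, not part of pdb-tools.
    """
    offset = 0  # how far downstream numbers have been shifted so far
    renumbered = []
    for resnum, icode in residues:
        if icode.strip():
            # Each residue carrying an insertion code takes up one extra slot.
            offset += 1
        renumbered.append((resnum + offset, ""))
    return renumbered


original = [(1, ""), (2, ""), (3, ""), (4, ""), (5, ""), (5, "A"), (6, ""), (7, "")]
fixed = renumber_without_icodes(original)
print([num for num, _ in fixed])  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The output list has the same length as the input, which matches the observation that no residue is removed: only the labels change.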

Is there a specific use case that pdb_fixinsert addresses, and is the interpretation of insertion codes context-specific? My understanding, from reading more about insertion codes, is that they encode more than one piece of information: the position of the residue (and which residue it is), but also what else might appear at that index in the same molecule from a different species. I was therefore looking for a way to decouple these two pieces of information. Coming across pdb_fixinsert's description, I thought it was going to remove the residues with insertion codes in a better way than just deleting them, but it only seems to renumber them.

Any help with this would be greatly appreciated.

amjjbonvin commented 2 years ago

Indeed, it only renumbers the residues and removes the A, B, … insertion codes.

aanastasiou commented 2 years ago

Thank you @amjjbonvin. Is that a generic treatment though? This means that the residues with the insertion codes are supposed to be part of the sequence. If they were, then why tag them differently in the first place?

amjjbonvin commented 2 years ago

That is how the nomenclature works for antibodies. The extra residues are treated as insertions in the hypervariable loops so that the numbering of the rest of the structure remains the same.

But such insertions cannot be handled by HADDOCK without renumbering.
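The trade-off in this answer can be illustrated with a small sketch. The residue labels below are made up for illustration and do not come from any real antibody structure, and `sequential_map` is a hypothetical helper that assumes contiguous labels with no gaps and a first label without an insertion code. It shows that, with insertion codes, the residue after a loop keeps the same label in both chains, whereas after plain sequential renumbering it does not. Keeping the mapping around is one way to retain the original labels alongside the renumbered file.

```python
def sequential_map(labels):
    """Map each original label (e.g. '100A') to a plain sequential number.

    Hypothetical helper: assumes contiguous labels in file order and a
    first label that carries no insertion code.
    """
    start = int(labels[0])
    return {label: start + i for i, label in enumerate(labels)}


# Two invented chains that differ only in loop length.
short_loop = ["99", "100", "101", "102"]
long_loop = ["99", "100", "100A", "100B", "101", "102"]

short_map = sequential_map(short_loop)
long_map = sequential_map(long_loop)

# Before renumbering, the residue after the loop is labelled "101" in both.
print(short_map["101"])  # 101
print(long_map["101"])   # 103
```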