`antibody_numbering_converter` does not renumber residues properly

merkys / covid-lt

Apache License 2.0

`antibody_numbering_converter` does not renumber residues properly #10

Open merkys opened 2 years ago

merkys commented 2 years ago

The PDB outputs in `pdb/Chothia/` are expected to have their antibody chains renumbered according to the Chothia scheme. However, `antibody_numbering_converter` does not seem to do that properly. For example, its output for chain E of PDB entry 7WD9 does not contain two residues indexed as 52 (52 and 52A), although this is what AbNum outputs.

Edit: An even worse situation is observed with 7K8X chain D, which in the Chothia scheme has multiple residues numbered 100 (with letters A-M); `antibody_numbering_converter` ignores that completely. Thus the question is whether this tool is needed at all. An alternative is ANARCI, which is a Python package and could be used locally. However, the question remains whether the PDB format allows multiple residues with the same number.
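On the last question: the fixed-width PDB format reserves column 27 of ATOM/HETATM records for an insertion code, next to the residue number in columns 23-26, so 100, 100A and 100B can coexist as distinct residues. A minimal sketch of reading that field follows; `format_atom` and `parse_residue_id` are hypothetical helpers written for this illustration, and the records are made up, not taken from 7K8X.

```python
def format_atom(serial, res_name, chain, res_seq, i_code=" "):
    """Build a minimal fixed-width ATOM record for a CA atom (made-up coordinates)."""
    return (
        f"ATOM  {serial:>5d}  CA  {res_name:>3s} {chain}{res_seq:>4d}{i_code}"
        f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )


def parse_residue_id(line):
    """Return (chain ID, residue number, insertion code) of an ATOM/HETATM record."""
    chain = line[21]             # column 22: chain identifier
    res_seq = int(line[22:26])   # columns 23-26: residue sequence number
    i_code = line[26].strip()    # column 27: insertion code, blank if absent
    return chain, res_seq, i_code


# Three residues that share the number 100 but differ in insertion code.
records = [
    format_atom(1, "GLY", "D", 100),
    format_atom(2, "SER", "D", 100, "A"),
    format_atom(3, "TYR", "D", 100, "B"),
]
residue_ids = [parse_residue_id(record) for record in records]
```

All three identifiers in `residue_ids` are distinct, so the format itself is not the obstacle; whether every downstream tool honours insertion codes is a separate matter.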

merkys commented 2 years ago

This can be fixed by replacing `antibody_numbering_converter` with the `convert_pdb_to_antibody_numbering_scheme.py` script from the same Rosetta package.

Drawbacks:

- When called, the script has to be told the light and heavy chain IDs (as a pair), since it does not identify them on its own. Furthermore, it is unclear how to call the script when the numbers of light and heavy chains differ, or when chains of one type are missing.
- It is Python 2.7-only and incompatible with Biopython > 1.68 due to https://github.com/biopython/biopython/issues/1551.

I think replacing it with an ANARCI-based script would be better. An additional benefit would be moving away from Rosetta. Furthermore, ANARCI detects light and heavy chains on its own. Pinging @GediminasA for thoughts.
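Whatever tool produces the numbering, the remaining step is writing the labels back into the PDB file. The following is a rough sketch of that step only, assuming the labels (such as `52A` or `100B`) are already available as a list with one entry per residue in file order. It does not call ANARCI; `split_label`, `renumber_chain` and `_atom` are hypothetical names, and only ATOM records are handled.

```python
import re


def split_label(label):
    """Split a label such as '100B' into (100, 'B'); '52' gives (52, ' ')."""
    match = re.fullmatch(r"(-?\d+)([A-Za-z]?)", label)
    if match is None:
        raise ValueError(f"bad residue label: {label!r}")
    return int(match.group(1)), match.group(2) or " "


def renumber_chain(pdb_lines, chain_id, labels):
    """Rewrite residue number and insertion code of ATOM records in one chain."""
    renumbered = []
    index = -1
    previous = None
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 27 and line[21] == chain_id:
            current = line[22:27]  # original residue number plus insertion code
            if current != previous:
                index += 1         # a new residue starts here
                previous = current
            if index >= len(labels):
                raise ValueError("fewer labels than residues in the chain")
            number, i_code = split_label(labels[index])
            line = f"{line[:22]}{number:>4d}{i_code}{line[27:]}"
        renumbered.append(line)
    return renumbered


def _atom(serial, name, res_seq):
    """Made-up ATOM record of chain D, for demonstration only."""
    return f"ATOM  {serial:>5d} {name:<4s} GLY D{res_seq:>4d}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"


example = [_atom(1, " N  ", 106), _atom(2, " CA ", 106), _atom(3, " N  ", 107)]
result = renumber_chain(example, "D", ["100B", "100C"])
```

Here both atoms of the original residue 106 end up labelled 100B and the atom of residue 107 becomes 100C. Records of other chains, and all non-ATOM lines, pass through unchanged.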

merkys commented 2 years ago

Additional observation: in PDB entry 7K8X, chain D is missing residue 108. This is how the tools renumber residues in the pristine file (without fixing it beforehand):

| Original numbering | ANARCI | Rosetta |
|---|---|---|
| 106 | 100B | 100B |
| 107 | 100C | 100C |
| 109 | 100E | 100D |
| 110 | 100F | 100E |

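This goes to show that ANARCI is cleverer than Rosetta's tool: ANARCI leaves 100D unassigned for the missing residue, whereas Rosetta's script numbers the remaining residues consecutively.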
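The difference between the two columns can be reproduced with a few lines of Python. This is only an illustration of the two labelling behaviours seen in the table, not a description of how either ANARCI or the Rosetta script works internally; `insertion_labels` and its parameters are hypothetical.

```python
import string

LETTERS = string.ascii_uppercase


def insertion_labels(original_numbers, base, first_letter, keep_gaps):
    """Label residues as base + insertion letter, starting at first_letter.

    With keep_gaps=True a jump in the original numbering skips the
    corresponding letters; with keep_gaps=False letters are consecutive.
    """
    offset = LETTERS.index(first_letter)
    first = original_numbers[0]
    labels = []
    for position, number in enumerate(original_numbers):
        step = (number - first) if keep_gaps else position
        labels.append(f"{base}{LETTERS[offset + step]}")
    return labels


original = [106, 107, 109, 110]  # residue 108 is absent from the file
gap_aware = insertion_labels(original, 100, "B", keep_gaps=True)
consecutive = insertion_labels(original, 100, "B", keep_gaps=False)
```

`gap_aware` matches the ANARCI column of the table and `consecutive` matches the Rosetta column. The sketch raises `IndexError` once the letters run past Z, which is enough for the example above.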

merkys commented 2 years ago

More Rosetta bugs observed with PDB entry 7DEO:

- It fails on HETATM lines with CA residues (the problem originates in Biopython).
- It fails to number chains correctly when position 1 of the Chothia scheme does not fall on the first residue of the chain but later (observed on chain A of 7DEO).