merkys / covid-lt

Apache License 2.0

`antibody_numbering_converter` does not renumber residues properly #10

Open merkys opened 2 years ago

merkys commented 2 years ago

It is expected that the PDB outputs in pdb/Chothia/ have their antibody chains renumbered according to the Chothia scheme. However, antibody_numbering_converter does not seem to do that properly. For example, chain E of PDB entry 7WD9 does not get two residues numbered 52 (52 and 52A), even though this is what AbNum would output.

Edit: An even worse situation is observed with chain D of 7K8X, which in the Chothia scheme has multiple residues numbered 100 (with insertion letters A-M); antibody_numbering_converter ignores that completely. This raises the question of whether the tool is needed at all. An alternative is ANARCI, a Python package that could be run locally. However, the question remains whether the PDB format allows multiple residues with the same number.
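On the PDB-format question: the fixed-width ATOM/HETATM record has a residue sequence number in columns 23-26 and a separate insertion code in column 27, which is how labels such as 52 and 52A can coexist in one chain (Biopython likewise keys residues by hetero flag, number and insertion code). A minimal sketch that writes and reads back those columns; `format_atom` and `residue_id` are hypothetical helpers, the record stops after the coordinates, and the residue names and coordinates are placeholders, not data from 7WD9.

```python
from typing import Tuple

def format_atom(serial: int, name: str, res_name: str, chain: str,
                res_seq: int, icode: str, x: float, y: float, z: float) -> str:
    """Build a minimal fixed-width PDB ATOM record (columns 1-54 only)."""
    return (
        f"ATOM  {serial:5d} {' ' + name:<4} {res_name:>3} {chain}"
        f"{res_seq:4d}{icode:1}   {x:8.3f}{y:8.3f}{z:8.3f}"
    )

def residue_id(line: str) -> Tuple[str, int, str]:
    """Chain ID (col 22), residue number (cols 23-26), insertion code (col 27)."""
    return line[21], int(line[22:26]), line[26].strip()

# Two residues sharing number 52 in chain E, distinguished only by the insertion code.
a = format_atom(1, "CA", "SER", "E", 52, "", 1.0, 2.0, 3.0)
b = format_atom(9, "CA", "GLY", "E", 52, "A", 4.0, 5.0, 6.0)
print(residue_id(a), residue_id(b))  # ('E', 52, '') ('E', 52, 'A')
```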

merkys commented 2 years ago

This can be fixed by replacing antibody_numbering_converter with the convert_pdb_to_antibody_numbering_scheme.py script from the same Rosetta package.

Drawbacks:

I think replacing it with an ANARCI-based script would be better. An additional benefit would be moving away from Rosetta. Furthermore, ANARCI detects light and heavy chains on its own. Pinging @GediminasA for thoughts.
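Because ANARCI reports a chain type per numbered domain ("H" for heavy, "K" and "L" for kappa and lambda light chains), a wrapper could pick the chains to renumber itself instead of being told which is which. A minimal sketch, assuming the per-chain types have already been obtained from ANARCI; `split_antibody_chains` and the example chain IDs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# ANARCI chain-type codes for antibody domains: H = heavy, K = kappa light, L = lambda light.
HEAVY_TYPES = {"H"}
LIGHT_TYPES = {"K", "L"}

def split_antibody_chains(
    chain_types: Dict[str, Optional[str]],
) -> Tuple[List[str], List[str]]:
    """Return (heavy_chain_ids, light_chain_ids); chains typed None are skipped."""
    heavy = sorted(c for c, t in chain_types.items() if t in HEAVY_TYPES)
    light = sorted(c for c, t in chain_types.items() if t in LIGHT_TYPES)
    return heavy, light

# Hypothetical complex: one heavy chain, one kappa light chain, one antigen chain.
types = {"H": "H", "L": "K", "A": None}
print(split_antibody_chains(types))  # (['H'], ['L'])
```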

merkys commented 2 years ago

Additional observation: in PDB entry 7K8X, chain D is missing residue 108. This is how the tools renumber residues in the pristine file (without fixing it beforehand):

| Original numbering | ANARCI | Rosetta |
|---|---|---|
| 106 | 100B | 100B |
| 107 | 100C | 100C |
| 109 | 100E | 100D |
| 110 | 100F | 100E |

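This shows that ANARCI is cleverer than Rosetta's tool: it leaves 100D unassigned for the missing residue 108, whereas Rosetta assigns 100D to residue 109 and shifts every following residue by one.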
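The comparison above can be checked mechanically: a renumbering that respects the chain break should skip as many insertion letters as the original numbering skips residue numbers. A minimal sketch using the values from the table; `parse_label`, `label_skips` and `number_skips` are hypothetical helpers, and pairs that cross to a different base number are deliberately left unhandled.

```python
import re
from typing import List, Optional, Tuple

# A Chothia label: residue number plus an optional insertion letter, e.g. "100" or "100E".
LABEL = re.compile(r"(\d+)([A-Z]?)")

def parse_label(label: str) -> Tuple[int, str]:
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a Chothia label: {label!r}")
    return int(m.group(1)), m.group(2)

def insertion_index(letter: str) -> int:
    # "" (no insertion) -> 0, "A" -> 1, "B" -> 2, ...
    return ord(letter) - ord("A") + 1 if letter else 0

def label_skips(labels: List[str]) -> List[Optional[int]]:
    """Insertion positions skipped between consecutive labels sharing a base number.

    Pairs that cross to a different base number yield None (not handled in this sketch).
    """
    skips: List[Optional[int]] = []
    for a, b in zip(labels, labels[1:]):
        (na, ia), (nb, ib) = parse_label(a), parse_label(b)
        skips.append(insertion_index(ib) - insertion_index(ia) - 1 if na == nb else None)
    return skips

def number_skips(numbers: List[int]) -> List[int]:
    """Residue numbers skipped between consecutive residues of the original file."""
    return [b - a - 1 for a, b in zip(numbers, numbers[1:])]

original = [106, 107, 109, 110]
anarci_labels = ["100B", "100C", "100E", "100F"]
rosetta_labels = ["100B", "100C", "100D", "100E"]

print(number_skips(original))       # [0, 1, 0]
print(label_skips(anarci_labels))   # [0, 1, 0]
print(label_skips(rosetta_labels))  # [0, 0, 0]
```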

merkys commented 2 years ago

More Rosetta bugs observed with PDB entry 7DEO:

  1. Fails on HETATM records for CA (calcium ion) residues (the problem originates in Biopython).
  2. Fails to number chains correctly when position 1 of the Chothia scheme falls later than the first residue of the chain (observed on chain A of 7DEO).
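Item 2 suggests that a renumbering wrapper should not assume the first residue of a chain is Chothia position 1. A minimal sketch that assigns labels starting from a given sequence index and leaves residues outside the numbered domain untouched (None), assuming ANARCI-style numbering as ((number, insertion_code), amino_acid) tuples with "-" for alignment positions absent from the chain, and assuming the domain's start index is already known; `relabel` and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed ANARCI-style numbering: ((number, insertion_code), amino_acid); "-" = position absent.
Numbering = List[Tuple[Tuple[int, str], str]]
Label = Tuple[int, str]

def relabel(sequence: str, numbering: Numbering, start: int) -> List[Optional[Label]]:
    """Assign (number, insertion) labels from sequence[start] onwards.

    Residues before `start` (e.g. an N-terminal extension) or after the numbered domain
    get None, so a caller can keep their original numbering.
    """
    labels: List[Optional[Label]] = [None] * len(sequence)
    i = start
    for (number, insertion), aa in numbering:
        if aa == "-":
            continue  # alignment position with no residue in this chain
        if i >= len(sequence) or sequence[i] != aa:
            raise ValueError(f"numbering does not match sequence at index {i}")
        labels[i] = (number, insertion.strip())
        i += 1
    return labels

# Toy chain with a two-residue extension ("MG") before Chothia position 1.
seq = "MGEVQ"
numbering = [((1, " "), "E"), ((2, " "), "V"), ((3, " "), "Q"), ((4, " "), "-")]
print(relabel(seq, numbering, start=2))
# [None, None, (1, ''), (2, ''), (3, '')]
```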