haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0
378 stars 113 forks source link

Fixes delinsertion on consecutive residues with the same name #63

Closed JoaoRodrigues closed 4 years ago

JoaoRodrigues commented 4 years ago

Addresses issue #62 raised by @joaomcteixeira

PR adds tests and fix.

codecov[bot] commented 4 years ago

Codecov Report

Merging #63 into master will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #63   +/-   ##
=======================================
  Coverage   81.69%   81.69%           
=======================================
  Files          44       44           
  Lines        3540     3530   -10     
  Branches      744      741    -3     
=======================================
- Hits         2892     2884    -8     
+ Misses        457      456    -1     
+ Partials      191      190    -1     
Impacted Files Coverage Δ
pdbtools/pdb_delinsertion.py 87.36% <100.00%> (+0.70%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 23fd473...90f06c2. Read the comment docs.

joaomcteixeira commented 4 years ago

The current code performs well in all the tests I have manually made.

I just want to state an additional case more clearly, however, this should not necessarily hinder the merge of this PR. My personal consideration is that pdb_delinsertion should account for the following case, but it is up to you @JoaoRodrigues and @amjjbonvin to actually decide because this is a core implementation decision.

Case:

If the TER line is truncated, that is, ending in the character after the residue number, using pdb_delinsertion will output the following (a missing '\n' character in the new TER line):

ATOM      3  C   ALA Y   9      36.213  46.067   1.258  1.00  0.00           C  
ATOM      3  O   ALA Y   9      36.287  47.046   2.028  1.00  0.00           O  
TER       3      ALA Y  10 ATOM      3  N   ASN A   1      22.066  40.557   0.420  1.00  0.00           N  
ATOM      3  H   ASN A   1      21.629  41.305  -0.098  1.00  0.00           H  
ATOM      3  H2  ASN A   1      23.236  40.798   0.369  1.00  0.00           H  

This does not happen if the TER line has 80 chars long. Therefore, using pdb_tidy before pdb_delinsertion solves this issues because the former corrects errors in TER lines.

I just wanted to make sure that I presented this case clearly. To solve this issue, we need an if at the beginning to add an additional ' ' to res_uid when lines start with TER.

if line.startswith(records):                                            
    if line.startswith('TER'):                                          
        res_uid = f'{line[17:26]} ' 
    else:                                                               
        res_uid = line[17:27] 

Having said this, @JoaoRodrigues if you think this case lies outside the scope of this PR, I approve this merge :+1:. I noticed you didn't refactor the string concatenation to f-strings, I would vote for that but I am fine if it is not done as well.

JoaoRodrigues commented 4 years ago

Thanks for the review! If lines are short, then people should use pdb_tidy. As for the f-strings, we'd have to do that on the entire codebase but also, some servers are still stuck on python 2..