haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0
372 stars 113 forks source link

Ignore residues with non-matching altlocs when only one is present. #96

Closed JoaoRodrigues closed 3 years ago

JoaoRodrigues commented 3 years ago

Addresses issue #90 .

JoaoRodrigues commented 3 years ago

Congratulations @joaomcteixeira - I upgraded you to Maintainer so I can add you to reviews :)

codecov[bot] commented 3 years ago

Codecov Report

Merging #96 (2ef9542) into master (3dbfcbb) will increase coverage by 0.34%. The diff coverage is 98.46%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #96      +/-   ##
==========================================
+ Coverage   82.00%   82.34%   +0.34%     
==========================================
  Files          46       47       +1     
  Lines        3663     3711      +48     
  Branches      763      767       +4     
==========================================
+ Hits         3004     3056      +52     
+ Misses        469      465       -4     
  Partials      190      190              
Impacted Files Coverage Δ
pdbtools/pdb_selaltloc.py 90.15% <98.46%> (+7.15%) :arrow_up:
pdbtools/pdb_fixinsert.py 87.50% <0.00%> (ø)
pdbtools/pdb_delinsertion.py 100.00% <0.00%> (+12.63%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3dbfcbb...61433ca. Read the comment docs.

joaomcteixeira commented 3 years ago

Efficiently solves #90.

However, some comments:

Examples:

pdb_selaltloc -A altloc.pdb

:+1: works as expected

pdb_selaltloc -B altloc.pdb

Keeps only B meaning that residues labeled only with A are removed. Surely this is intended by the one-tool-one-job philosophy of pdb-tools. Yet, I think we should explain this in the docstring.

pdb_selaltloc -C altloc.pdb

Because there are no C labels, labels A and B are removed, resulting in residues being removed entirely, silently. We need to explain this in the docstring or block the execution of selaltloc if a non-existing altloc is selected. However, the latter would come at a much increased computational cost for the size of the tool. I prefer the first choice.

pdb_selaltloc altloc.pdb

Removes the altloc labels but does not select the one with more occupancy, see SER/PRO 22. In the tests/data/dummy.pdb the atom CA belongs to the same residue, so the highest occupancy is selected. But in this case, SER and PRO have the same residue number but are different residue identities, the highest occupancy is not selected, and both residues are kept. Is this truly intended or is it an unexpected behaviour? It makes a good combo with pdb_delinsertion as described in #90. Just to clarify.

What are your thoughts?

other note

Currently, we have dummy.pdb that serves several tests. This creates a log of cohesion because if we modify the dummy.pdb to enhance the testing of some tool, several tests for other tools will break. I think we should start creating/separating tests PDBs one per tool. That is why I added altloc.pdb.

Before merging this PR, let's add as many tests as possible for this case so the exact expected behavior gets fully documented.

JoaoRodrigues commented 3 years ago

@joaomcteixeira check the new commits. It should fix all the issues we discussed here (and offline):

I added a new test and split the code in the script in two functions. Have a look please!

joaomcteixeira commented 3 years ago

@JoaoRodrigues

Very nice implementation!

I took the freedom to commit directly to your branch. Added minor updates on the main script just for micro performance improvement. Nice on the operator.itemgetter[1] it is actually faster than lambda x: x[1].

Added the test case we discussed and extra tests covering option -C.

Because I was the last to commit, is on you to review and merge :wink:

JoaoRodrigues commented 3 years ago

Argh. Fixed in #99