unzip pdbfilter_debug.zip
cd pdbfilter_debug
pdbfilter.py input.fas cluster.tsv pdb_filter.dat output.fas
Suggested debugging
Since both sequences are in the same cluster, they should yield a single representative sequence. I think this can be fixed by modifying the script so that the selection criteria are checked as one prioritized chain, like this:
    # Pick exactly one representative per cluster, in priority order:
    # resolution first, then R-free, then completeness.
    if best_entry_res is not None:
        selected_sequences.add(best_entry_res)
        if DEBUG:
            print(' - Selected {n} (best resolution = {r}).'.format(
                n=best_entry_res,
                r=best_res))
    elif best_entry_rfr is not None:
        selected_sequences.add(best_entry_rfr)
        if DEBUG:
            print(' - Selected {n} (best R-free = {r}).'.format(
                n=best_entry_rfr,
                r=best_rfr))
    elif best_entry_comp is not None:
        selected_sequences.add(best_entry_comp)
        if DEBUG:
            print(' - Selected {n} (best completeness = {r}).'.format(
                n=best_entry_comp,
                r=best_comp))
    else:
        # Reached only when all three candidates are None.
        print('! Warning: Did not find any representative entry for cluster {c}.'.format(
            c=cluster))
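To show why the `elif` chain matters, here is a minimal standalone sketch. It assumes the current code checks each criterion with a separate `if`, which is my reading of the behavior rather than something confirmed here. The function names `select_independent` and `select_prioritized` are hypothetical and not part of `pdbfilter.py`, and the arguments are made-up winners for each criterion.

```python
def select_independent(best_res, best_rfr, best_comp):
    """Separate `if` checks: every criterion that has a winner adds an entry."""
    selected = set()
    if best_res is not None:
        selected.add(best_res)
    if best_rfr is not None:
        selected.add(best_rfr)
    if best_comp is not None:
        selected.add(best_comp)
    return selected


def select_prioritized(best_res, best_rfr, best_comp):
    """`if`/`elif` chain: only the highest-priority winner is added."""
    selected = set()
    if best_res is not None:
        selected.add(best_res)
    elif best_rfr is not None:
        selected.add(best_rfr)
    elif best_comp is not None:
        selected.add(best_comp)
    return selected


# Hypothetical cluster in which different entries win different criteria.
print(sorted(select_independent("4B9D_B", "4APC_B", "4APC_B")))
print(sorted(select_prioritized("4B9D_B", "4APC_B", "4APC_B")))
```

With separate checks, a cluster can contribute two entries whenever the criteria disagree. With the chain, it contributes at most one.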
Expected Behavior
When a PDB entry has multiple chains of the same protein, I expect the script to keep only one of them. For example, with the following input files:
cluster.tsv
pdb_filter.dat
I expected a single representative sequence in the output, but the script produced two: 4B9D_B and 4APC_B.
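A quick way to check this expectation could look like the sketch below. It assumes `cluster.tsv` has two columns (representative, member), which may not match the actual file. The helper name `clusters_with_multiple_hits` is hypothetical, and the rows are invented for illustration rather than copied from the attachment.

```python
from collections import defaultdict


def clusters_with_multiple_hits(cluster_rows, selected_ids):
    """Map each cluster representative to its selected members,
    keeping only clusters with more than one selected member."""
    # Assumed layout: each row is (representative, member).
    member_to_rep = {member: rep for rep, member in cluster_rows}
    hits = defaultdict(set)
    for seq_id in selected_ids:
        rep = member_to_rep.get(seq_id)
        if rep is not None:
            hits[rep].add(seq_id)
    return {rep: sorted(ids) for rep, ids in hits.items() if len(ids) > 1}


# Invented rows; the real cluster.tsv is in the attached archive.
rows = [("4B9D_B", "4B9D_B"), ("4B9D_B", "4APC_B"), ("1ABC_A", "1ABC_A")]
print(clusters_with_multiple_hits(rows, ["4B9D_B", "4APC_B", "1ABC_A"]))
```

An empty result would mean every cluster contributed at most one sequence to the output.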
Current Behavior
The script occasionally outputs more than one chain for the same cluster.
Steps to Reproduce (for bugs)
pdbfilter_debug.zip