sokrypton / ColabDesign

Making Protein Design accessible to all via Google Colab!

Patch fix_partial_contigs when residue numbering in PDB has a gap #142

Open data2code opened 1 year ago

data2code commented 1 year ago

In rf/utils.py, around line 78, should the three lines marked with ### below be added to handle the case where the residue numbering in the original PDB file has a gap? Thanks!

          if L > 0:
            new_contig.append(f"{L}-{L}")
            unseen = []
          ### in case residue numbering jumps
          elif len(seen) > 0 and seen[-1][1] != i - 1:
            new_contig.append(f"{seen[0][0]}{seen[0][1]}-{seen[-1][1]}")
            seen = []
          ###
          seen.append([c, i])
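
To illustrate the intended behavior in isolation, here is a minimal sketch of splitting a residue list into contig strings at every numbering gap or chain change. The helper name `contiguous_ranges` and its input format are hypothetical and are not part of ColabDesign; the sketch only mirrors the idea of flushing `seen` when the residue number jumps.

```python
def contiguous_ranges(residues):
    """Group (chain, resnum) pairs into contig strings such as 'A1-7'.

    Hypothetical helper for illustration only. A new range starts whenever
    the chain changes or the residue number is not the previous one plus 1.
    """
    contigs = []
    run = []  # residues in the current contiguous run
    for chain, num in residues:
        if run and (run[-1][0] != chain or run[-1][1] != num - 1):
            # Numbering jumped or the chain changed: close the current run.
            contigs.append(f"{run[0][0]}{run[0][1]}-{run[-1][1]}")
            run = []
        run.append((chain, num))
    if run:
        # Close the final run.
        contigs.append(f"{run[0][0]}{run[0][1]}-{run[-1][1]}")
    return contigs


# Chain A with residues 1-7 and 10-44, i.e. a gap between 7 and 10.
residues = [("A", i) for i in range(1, 8)] + [("A", i) for i in range(10, 45)]
print(contiguous_ranges(residues))  # ['A1-7', 'A10-44']
```

With a gap in the numbering, the two segments stay separate instead of collapsing into a single A1-44 range.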

sokrypton commented 1 year ago

Thanks! Do you have an example input where this change is needed?

data2code commented 1 year ago

1crn.pdb.txt

With the toy example below, the first contig returned by fix_partial_contigs becomes A1-44, which in turn leads to incorrect output from fix_pdb: chain E should have been renamed to B, but it is renamed to A by mistake.

If you change fix_partial_contigs to fix_contigs, the behavior is correct.

from inference.utils import parse_pdb
from colabdesign.rf.utils import fix_contigs, fix_partial_contigs, fix_pdb

parsed_pdb = parse_pdb('1crn.pdb')
with open('1crn.pdb') as f:
    pdb_str = f.read()

# Chain A is passed as two segments, A1-7 and A10-44
contigs = fix_partial_contigs(['A1-7/A10-44', 'E'], parsed_pdb)
print(contigs)
# Print the last six lines of the rewritten PDB
print("\n".join(fix_pdb(pdb_str, contigs).split("\n")[-6:]))

output:

['A1-44', 'E45-46']
ATOM    323  CB  ASN A  44      12.266   4.769  13.501  1.00  7.27      A    C
ATOM    324  CG  ASN A  44      12.538   4.304  14.922  1.00  7.98      A    C
ATOM    325  ND2 ASN A  44      13.407   3.298  15.015  1.00 10.32      A    N
ATOM    326  OD1 ASN A  44      11.982   4.849  15.886  1.00 11.00      A    O
ATOM    327  OXT ASN A  44      12.703   4.973  10.746  1.00  7.86      A    O1-
TER