haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0

pdb-deinsertion renumbers chains without insertion codes #67

Closed joelbard closed 3 years ago

joelbard commented 3 years ago

The pdb_delinsertion tool works nicely, but it seems to affect chains beyond the one with the insertion code. I have an antibody-antigen complex where I'm trying to remove the insertion codes for the antibody. The antibody chains are A and B. The antigen sequence numbering starts at C 1391. After removing the insertion codes, the antigen starts at C 1396. The antigen should be unaffected by this operation. Thanks for a very handy tool.

JoaoRodrigues commented 3 years ago

Thanks for the suggestion, @joelbard. Could you provide us with a simple example PDB file that we can test on?

Also, could you check if running pdb_tidy first fixes this issue, e.g. pdb_tidy your.pdb | pdb_delinsertion ?

joelbard commented 3 years ago

The change below seems to fix the problem; it resets the offset whenever the chain changes:

    offset = 0
    prev_resi = None
    seen_ids = set()
    clean_icode = False
    curChain = ''  # added: chain ID of the previous record
    records = ('ATOM', 'HETATM', 'ANISOU', 'TER')
    for line in fhandle:

        if line.startswith(records):
            res_uid = line[17:27]  # resname, chain, resid, icode
            id_res = line[21] + line[22:26].strip()  # A99, B12
            chain = line[21]
            if chain != curChain:  # added: restart the offset on a new chain
                curChain = chain
                offset = 0
            has_icode = line[26].strip()  # ignore ' ' here

joelbard commented 3 years ago

didn't format that well...it's all code...
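
For readers who want to try the idea in isolation, here is a minimal, self-contained sketch of the per-chain offset reset described above. It is not the pdb_delinsertion implementation: the names make_atom and delete_insertions are hypothetical, the ATOM records are synthetic, and the sketch only handles residues that carry an explicit insertion code (it skips the seen_ids bookkeeping from the snippet).

```python
RECORDS = ('ATOM', 'HETATM', 'ANISOU', 'TER')


def make_atom(serial, chain, resid, icode=' '):
    """Build a minimal, hypothetical CA ATOM record with fixed PDB columns."""
    return ('ATOM  ' + str(serial).rjust(5) + '  CA  ALA ' + chain
            + str(resid).rjust(4) + icode
            + '   ' + '   0.000' * 3 + '  1.00  0.00           C')


def delete_insertions(lines):
    """Remove insertion codes by renumbering, restarting the offset per chain."""
    offset = 0
    prev_res = None
    cur_chain = None
    out = []
    for line in lines:
        # Skip records too short to carry chain/resid/icode columns.
        if line.startswith(RECORDS) and len(line) >= 27:
            chain = line[21]
            if chain != cur_chain:
                # New chain: earlier insertions must not shift its numbering.
                cur_chain = chain
                offset = 0
                prev_res = None
            res_uid = line[17:27]  # resname, chain, resid, icode
            if res_uid != prev_res:  # first record of a new residue
                prev_res = res_uid
                if line[26].strip():  # residue has an insertion code
                    offset += 1
            new_resid = int(line[22:26]) + offset
            # Write the shifted number and blank out the insertion code.
            line = line[:22] + str(new_resid).rjust(4) + ' ' + line[27:]
        out.append(line)
    return out


pdb = [
    make_atom(1, 'A', 52),
    make_atom(2, 'A', 52, 'A'),
    make_atom(3, 'A', 53),
    make_atom(4, 'C', 1391),
]
for line in delete_insertions(pdb):
    print(line[21], line[22:26].strip())
```

With this input, chain A is renumbered to 52, 53, 54 and chain C keeps 1391. Without the reset, the offset accumulated in chain A would carry over and shift chain C as well, which is the behaviour reported in this issue.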

JoaoRodrigues commented 3 years ago

Hi @joelbard, did you try running pdb_tidy your.pdb | pdb_delinsertion to see if it fixes the issue?

joelbard commented 3 years ago

I just ran the file through pdb_tidy. It does fix the problem of pdb_delinsertion incrementing the residue numbers of subsequent chains. However, it has the side effect of adding a TER card at every gap in the residue numbering, which I don't think matches the definition of TER in the PDB format specification. My understanding is that TER is meant to be used only at the true carboxy terminus of a chain, not at places where residues present in the SEQRES are omitted from the model due to missing density.

In my case there is also a deletion in the construct used for crystallography, so the residue numbering jumps to stay consistent with the canonical numbering of the molecule. This leads to a covalent connection between residues with discontinuous numbering. I would certainly not want a TER card between two bonded residues.
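
To make the distinction concrete, the sketch below compares two placement rules for TER: one that also breaks on numbering gaps, and one that only breaks where the chain ID changes. It is an illustration under stated assumptions, not pdb_tidy's actual logic. The names atom and ter_positions and the break_on_gaps parameter are hypothetical, and only ATOM records are considered.

```python
def atom(serial, chain, resid):
    """Build a minimal, hypothetical CA ATOM record with fixed PDB columns."""
    return ('ATOM  ' + str(serial).rjust(5) + '  CA  ALA ' + chain
            + str(resid).rjust(4) + ' ')


def ter_positions(lines, break_on_gaps=False):
    """Return indices of ATOM lines after which a TER would be placed.

    A TER always follows the last ATOM of a chain. With break_on_gaps=True,
    a jump of more than one in the residue number also triggers a TER.
    """
    positions = []
    prev = None  # (index, chain, resid) of the previous ATOM record
    for i, line in enumerate(lines):
        if not line.startswith('ATOM'):
            continue
        chain, resid = line[21], int(line[22:26])
        if prev is not None:
            prev_idx, prev_chain, prev_resid = prev
            chain_changed = chain != prev_chain
            gap = (not chain_changed) and resid - prev_resid > 1
            if chain_changed or (break_on_gaps and gap):
                positions.append(prev_idx)
        prev = (i, chain, resid)
    if prev is not None:
        positions.append(prev[0])  # end of the last chain
    return positions


model = [atom(1, 'A', 10), atom(2, 'A', 11), atom(3, 'A', 20), atom(4, 'B', 1)]
print(ter_positions(model, break_on_gaps=True))
print(ter_positions(model))
```

For this input, the gap-based rule marks indices 1, 2, and 3, while the chain-only rule marks 2 and 3. The extra TER after index 1 is the kind that would separate two residues that may in fact be bonded, as in the deletion construct described above.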

JoaoRodrigues commented 3 years ago

Glad that sorted it out. I'd rather keep it like this (tidy + delinsertion) than add more functionality to delinsertion.

Thanks for raising the issue with the TER statements; all valid points. We used TERs to separate discontinuous regions because that's what some (old and new) programs use to signal a chain break. I'll look into changing this so that it only affects true chain endings.

JoaoRodrigues commented 3 years ago

@joelbard we just pushed a change to pdb_tidy that adds an option not to add TER records on chain breaks. You can try it with pdb_tidy -strict 1abc.pdb. Make sure to use the latest version of pdb-tools: pip install --upgrade pdb-tools.

Thanks for raising this issue! I'll close it but feel free to re-open if you think we should make more changes!