marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0

Unexpected TER in martinize2 Production- protein with 4 subunits split into 7 ITPs with 3 unexpected TERs #619

Open JingkaiZeng opened 4 days ago

JingkaiZeng commented 4 days ago

I'm currently using martinize2 to model a protein system with four subunits (chains). Following the official MARTINI tutorial, I used the following command:

martinize2 -f 8io5.pdb -o 8io5_only.top -x 8io5_cg.pdb -dssp -p backbone -ff martini3001 -elastic -ef 700.0 -el 0.5 -eu 0.9 -ea 0 -ep 0 -scfix -cys auto

However, the resulting files split my chains unexpectedly. Chains B, C, and D are each split into two parts, even though the input PDB file contains only four TER records (one per chain), so the output has three unexpected TERs. The command output suggests the splits occur around residues TYR597 and LEU602 within these chains.

Here's a snippet of the log:

    INFO - general - Applying modification N-ter to residue A-TRP398
    INFO - general - Applying modification N-ter to residue A-ASN598
    INFO - general - Applying modification C-ter to residue A-ARG863
    INFO - general - Applying modification N-ter to residue B-TRP398
    INFO - general - Applying modification C-ter to residue B-TYR597
    INFO - missing-atom - Missing atom TYR597:OXT
    INFO - general - Applying modification N-ter to residue B-ASN598
    INFO - general - Applying modification C-ter to residue B-ARG863
    INFO - general - Applying modification N-ter to residue C-TRP398
    INFO - general - Applying modification C-ter to residue C-LEU602
    INFO - missing-atom - Missing atom LEU602:OXT
    INFO - general - Applying modification N-ter to residue C-GLY603
    INFO - general - Applying modification C-ter to residue C-ARG863
    INFO - general - Applying modification N-ter to residue D-TRP398
    INFO - general - Applying modification C-ter to residue D-TYR597
    INFO - missing-atom - Missing atom TYR597:OXT
    INFO - general - Applying modification N-ter to residue D-ASN598
    INFO - general - Applying modification C-ter to residue D-ARG863

It appears that the chains are being split after TYR597 (chains B and D) and after LEU602 (chain C), with a missing OXT atom reported for these residues. Note: the input PDB file contains four chains (A, B, C, D) with residues numbered 398-863. The structure has some missing residues, but none should be missing within the 398-863 range. Could this issue be due to the missing atoms, or to a formatting error in the PDB file related to the missing residues?

Questions:

What could be causing martinize2 to split the chains at these points?
Could this be a bug in martinize2 related to handling missing residues?
How can I prevent martinize2 from splitting the chains and ensure it recognizes the correct number of subunits?

The input file is available at this link so that you can reproduce the issue: https://drive.google.com/file/d/1FJ-aZyerrT8Y5EqkCko6Hq5ymfj88MNI/view?usp=drive_link

csbrasnett commented 4 days ago

I haven't looked in huge detail, but my guess is that you're right it's the missing residues. Martinize2 won't reconstruct missing residues for you, and if it's finding splits in chains, it will annotate them as N and C termini automatically. The only way to fix this would be to reconstruct the chains using your preferred method beforehand.

pckroon commented 4 days ago

To add to this, connectivity is guesstimated based on atom names (within residues), the coordinates of atoms, and CONECT records. It could be that e.g. residues 597 and 598 are complete, but just a little bit too far apart. If this is the case, you could add a CONECT record.
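A quick way to check for such a gap is to measure the distance between the backbone C of each residue and the backbone N of the next one. The following is a minimal sketch, not martinize2's own connectivity code: the function names and the 1.5 Å cutoff are assumptions chosen for illustration (a typical peptide C-N bond is about 1.33 Å), and the parser ignores insertion codes and alternate locations.

```python
import math


def parse_backbone(pdb_lines):
    """Collect coordinates of backbone C and N atoms from ATOM records.

    Keys are (chain, residue number, atom name). Insertion codes and
    alternate locations are ignored in this sketch.
    """
    atoms = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in ("C", "N"):
            continue
        chain = line[21]
        resseq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms[(chain, resseq, name)] = xyz
    return atoms


def long_peptide_bonds(pdb_lines, cutoff=1.5):
    """Return (chain, res_i, res_i+1, distance) for C-N pairs beyond cutoff.

    The cutoff (in angstrom) is a hypothetical value for illustration,
    not the criterion martinize2 or VMD actually applies.
    """
    atoms = parse_backbone(pdb_lines)
    hits = []
    for (chain, resseq, name), c_xyz in sorted(atoms.items()):
        if name != "C":
            continue
        n_xyz = atoms.get((chain, resseq + 1, "N"))
        if n_xyz is None:
            continue  # next residue absent or numbered differently
        dist = math.dist(c_xyz, n_xyz)
        if dist > cutoff:
            hits.append((chain, resseq, resseq + 1, round(dist, 2)))
    return hits


if __name__ == "__main__":
    with open("8io5.pdb") as handle:
        for hit in long_peptide_bonds(handle):
            print(hit)
```

If pairs such as B 597/598 or C 602/603 show up with a distance well above a normal peptide bond, that would be consistent with the splits reported in the log.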

JingkaiZeng commented 4 days ago

> I haven't looked in huge detail, but my guess is that you're right it's the missing residues. Martinize2 won't reconstruct missing residues for you, and if it's finding splits in chains, it will annotate them as N and C termini automatically. The only way to fix this would be to reconstruct the chains using your preferred method beforehand.

Thanks for your reply. I may not have made it clear that I have already rebuilt the chains; the rebuilt structure is the file I shared via Google Drive. That is, there are no missing residues in the 398-863 range. This is exactly why I am so confused.

JingkaiZeng commented 4 days ago

> To add to this, connectivity is guesstimated based on atom names (within residues), the coordinates of atoms, and CONECT records. It could be that e.g. residues 597 and 598 are complete, but just a little bit too far apart. If this is the case, you could add a CONECT record.

Thanks for your reply. I think I understand now. I checked in PyMOL, and visually the residues do look somewhat far apart, though not dramatically so; still, the program might treat them as not connected.

I'd like to follow up on your suggestion to add a CONECT record. How does that work exactly? I'm not sure how to add such a record when editing the PDB file.

pckroon commented 4 days ago

See https://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html for the format; you can add the record with your favourite text editor. To generate an example, you can use -write-graph, which will write CONECT records for all the bonds that are found in your input. The distance criterion we use for estimating bonds between residues is the same as VMD's, so you can also use VMD to see which bonds are detected.
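As an illustration of what such a record looks like, here is a minimal sketch that builds CONECT lines for the peptide bond between two consecutive residues, following the fixed-column layout in the PDB v3.3 format (record name in columns 1-6, then atom serial numbers in five-character fields). The helper names are hypothetical, and writing the bond in both directions is an assumption based on how PDB files commonly list bonds; whether martinize2 needs both lines has not been checked here.

```python
def conect_record(serial, bonded):
    """Format one CONECT line: the atom serial followed by 1-4 bonded serials."""
    if not 1 <= len(bonded) <= 4:
        raise ValueError("a CONECT record holds between 1 and 4 bonded atoms")
    return "CONECT" + "".join(f"{s:5d}" for s in (serial, *bonded))


def find_serial(pdb_lines, chain, resseq, atom_name):
    """Return the serial number of the first matching ATOM record."""
    for line in pdb_lines:
        if (line.startswith("ATOM")
                and line[21] == chain
                and int(line[22:26]) == resseq
                and line[12:16].strip() == atom_name):
            return int(line[6:11])
    raise LookupError(f"{atom_name} of {chain}-{resseq} not found")


def peptide_conect(pdb_lines, chain, resseq):
    """CONECT lines linking C of residue `resseq` to N of residue `resseq + 1`."""
    pdb_lines = list(pdb_lines)  # allow passing an open file handle
    c_serial = find_serial(pdb_lines, chain, resseq, "C")
    n_serial = find_serial(pdb_lines, chain, resseq + 1, "N")
    # Assumption: list the bond from both atoms, as PDB files commonly do.
    return [conect_record(c_serial, [n_serial]),
            conect_record(n_serial, [c_serial])]


if __name__ == "__main__":
    with open("8io5.pdb") as handle:
        lines = handle.readlines()
    # Split points taken from the log in this issue.
    for chain, resseq in (("B", 597), ("C", 602), ("D", 597)):
        print("\n".join(peptide_conect(lines, chain, resseq)))
```

The printed lines can then be pasted into the PDB file after the coordinate records and before the END record. For example, `conect_record(1234, [1250])` gives `CONECT 1234 1250`.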