marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
89 stars 40 forks source link

Martinizing multiple proteins fails #421

Closed BartBruininks closed 2 years ago

BartBruininks commented 2 years ago

When I am running martinize on a homo dimer (e.g. two insulin molecules), the residue index skips the exact model size before numbering continues. This additional shift of one times the model size seems to also be preserved in the ITP. I have a strong feeling that there is a small math error somewhere. Renumbering the residues or atoms in the input file does not solve the issue. Adding a single CONECT between the two molecules does rescue this behavior, but can result in an additional unwanted bond. Plus it is a rather dirty solution.

martinize2 -f 5BTS_clean_gmxRenumber.pdb -o insulin_dimer.top -x insulin_dimer_cg.pdb -dssp /usr/local/bin/mkdssp -p backbone -ff martini3001 -govs-include -govs-moltype insulin_dimer -scfix -cys auto

5BTS_clean_gmxRenumber.pdb.txt insulin_dimer_cg.pdb.txt

fgrunewald commented 2 years ago

Hi Bart,

Does this only concern the resid numbers in the itp file? Do you get the dislufide bonds as required?

BartBruininks commented 2 years ago

It seems to be present everywhere. The CG PDB has the jump, the ITP has the jump. I checked for 6 disulfide bridges, they were there, however, I do not know if they connect the right residues. I think in theory this file could work but the numbering just is very strange and clearly something is going on.

Besides, the virtual sites scripts do not like this at all as now the numbering is not consistent with the contact map files.

fgrunewald commented 2 years ago

Just to be completely sure: By numbering you mean resids? Could this be a duplicate of #154 and/or #366

BartBruininks commented 2 years ago

I think this is indeed the same issue, however, it is strange that an artificial conect saves this. However, I agree with the previously suggested, that it would be nice to allow for forcing RESIDS to remain the same. Also the current implementation makes the mistake of starting the numbering of molecule two at two times the length of molecule one, plus one (2n+1) -- which seems very odd to me. Default behavior of n+1 would make much more sense for continuous labeling.

fgrunewald commented 2 years ago

@BartBruininks can you try this branch on your protein https://github.com/marrink-lab/vermouth-martinize/pull/399

BartBruininks commented 2 years ago

I can, but not now. I will do it hopefully later today or otherwise tomorrow.

fgrunewald commented 2 years ago

Thanks! Let me know if that works.

BartBruininks commented 2 years ago

I used fix/366 and it did not solve the issue of the residue numbering skip.

fgrunewald commented 2 years ago

I actually figured out what happens here. It is in fact a bit more tricky. The input consists of 2 dimers (4 proteins). Each dimer is connected with a disulphide bridge. So martinize puts them together as one molecule. That leaves 2 molecules before mapping. Then mapping happens on the two molecules separately. However, the mapping processor resets the resids after mapping to the ones found in the PDB. So we end up with 1 homo-dimer being correctly numbered from 1 to 51. The second homo-dimer however will incorrectly or perhaps correctly get resids 51-102.

Later the two are merged due to the Go settings. The merge molecule, however, takes the largest resid in the first molecule (i.e. 51) and adds that to the resids of the to be merged residues. In this case that would be 51+51, 52+51 etc. So we end up with one molecule length offset. In fact if you run martinize2 without Go setting you end up with two molecule itps that correctly end up being numbered from 1-51 and 52-102.

The solution here would be to either give each chain resids running from 1-51 or give chains C and D resides running from 1-102.

I don't think in the current setting this is a bug. It rather is a problem with the pdb input having resids spanning two molecules. That doesn't mean this is not an issue. I think it is a big one in fact.

fgrunewald commented 2 years ago

this actually is the same issues as in #366.

fgrunewald commented 2 years ago

this has been solved. Setting '-resid input' flag will preserve the resids from the initial pdb