marrink-lab / polyply_1.0

Generate input parameters and coordinates for atomistic and coarse-grained simulations of polymers, ssDNA, and carbohydrates
Apache License 2.0

MetaMolecule for Proteins with EN and non-unique resids fails #300

Open fgrunewald opened 1 year ago

fgrunewald commented 1 year ago

Describe the bug: Generating a residue graph for proteins fails when two residues with non-unique resids are connected by an elastic network (EN) bond. This leads to some scrambling of the coordinates when they are read in. It only affects highly symmetric proteins, where two residues have the same resid and resname and are connected by an EN bond.

Workaround: Have martinize assign new resids for the protein (i.e. do not use the -resid input flag).
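
For illustration, the failure can be reproduced in a few lines of plain Python. This is only a sketch, assuming that residues are found by grouping bonded atoms that share the same resid and resname; `residue_groups` and the four-bead toy system are hypothetical and are not polyply or vermouth code.

```python
from collections import defaultdict


def residue_groups(atoms, bonds, keys=("resid", "resname")):
    """Group atoms into residues (hypothetical helper, not library code).

    Atoms end up in the same residue when they are connected through bonds
    and all share the same values for the attributes listed in `keys`.
    """
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def label(idx):
        return tuple(atoms[idx].get(key) for key in keys)

    seen = set()
    groups = []
    for start in sorted(atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                # Only walk to neighbours that carry the same residue label.
                if neighbour not in seen and label(neighbour) == label(start):
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Two symmetric copies of the same residue (same resid and resname).
atoms = {
    0: {"resid": 10, "resname": "LYS", "chain": "A"},
    1: {"resid": 10, "resname": "LYS", "chain": "A"},
    2: {"resid": 10, "resname": "LYS", "chain": "B"},
    3: {"resid": 10, "resname": "LYS", "chain": "B"},
}
covalent = [(0, 1), (2, 3)]
en_bond = [(0, 2)]  # elastic network bond between the two copies

print(residue_groups(atoms, covalent))            # [[0, 1], [2, 3]]
print(residue_groups(atoms, covalent + en_bond))  # [[0, 1, 2, 3]]
print(residue_groups(atoms, covalent + en_bond,
                     keys=("chain", "resid", "resname")))  # [[0, 1], [2, 3]]
```

In this toy model the EN bond is what merges the two copies; adding a distinguishing attribute such as a chain ID to the grouping keys keeps them apart, as in the last call.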

fgrunewald commented 1 year ago

@pckroon any idea what to do about this one?

pckroon commented 1 year ago

Fails how? Another workaround/solution is to assign chain IDs at some point in the pipeline. In general, we really need to be smarter about how we define what a "residue" is.

fgrunewald commented 1 year ago

It merges the two residues into a single one, which is what it should do, but not what we want. Unfortunately, there is no chain ID information for topologies.

And yeah, I agree we somehow need to be smarter.

fgrunewald commented 1 year ago

It would be cool to use something like graph-based CG (https://www.osti.gov/servlets/purl/1558004), where you essentially cluster the graph nodes based on contact and iteratively obtain coarser representations. For the building procedure this would be really cool, I think.

pckroon commented 1 year ago

> It would be cool to use something like graph-based CG (https://www.osti.gov/servlets/purl/1558004), where you essentially cluster the graph nodes based on contact and iteratively obtain coarser representations. For the building procedure this would be really cool, I think.

I looked at similar methods for Cartographer way back when. The main issue with the methods I could find back then was that they all assume coarse-graining atoms is a many-to-one mapping, where each atom contributes to exactly one CG bead. And this assumption is simply not true for Martini.

> It merges the two residues into a single one, which is what it should do, but not what we want. Unfortunately, there is no chain ID information for topologies.

Maybe we should assign chain IDs based on molecule indices (if no chain info is available yet) after we create bonds. Make bonds assigns a 'mol_idx' attribute; maybe we can leverage that in a separate processor.
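
A rough sketch of what such a step could look like, in plain Python. The function name `assign_chain_from_mol_idx`, the dict-of-dicts node layout, and the use of the stringified index as the chain label are all assumptions made for illustration; this is not an existing processor, and it only separates residues whose 'mol_idx' values actually differ.

```python
def assign_chain_from_mol_idx(nodes):
    """Fill in 'chain' from 'mol_idx' when no chain info exists yet.

    `nodes` maps a node key to its attribute dict (hypothetical layout).
    The mapping is modified in place and also returned.
    """
    # If any node already has a chain ID, leave everything untouched.
    if any(attrs.get("chain") for attrs in nodes.values()):
        return nodes
    for attrs in nodes.values():
        if "mol_idx" in attrs:
            # Assumption: the stringified molecule index is a usable chain label.
            attrs["chain"] = str(attrs["mol_idx"])
    return nodes


nodes = {
    0: {"resid": 10, "resname": "LYS", "mol_idx": 0},
    1: {"resid": 10, "resname": "LYS", "mol_idx": 1},
}
assign_chain_from_mol_idx(nodes)
print([attrs["chain"] for attrs in nodes.values()])  # ['0', '1']
```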