salilab / pmi

Python Modeling Interface
https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1pmi.html

Add copy capability to BuildSystem #184

Closed cgreenberg closed 2 years ago

cgreenberg commented 8 years ago

We've gone back and forth on this, but copying is VERY common, so we should allow it. This is the current BuildSystem input file:

|component_name|domain_name|fasta_fn|fasta_id|pdb_fn|chain|residue_range|pdb_offset|bead_size|em_residues_per_gaussian|rigid_body|super_rigid_body|chain_of_super_rigid_bodies|
|Prot1 |Prot1 |seqs.fasta|Protein_1|prot.pdb   |A|55,65  |-54 |5|10|1|1,2| |
|Prot2 |Prot2A|seqs.fasta|Protein_2|prot.pdb   |B|180,187|-179|5|0 |2|1  | |
|Prot2 |Prot2B|seqs.fasta|Protein_2|prot.pdb   |B|188,-1 |-179|5|0 |2|1  | |
|Prot3 |Prot3 |seqs.fasta|Protein_3|BEADS      |C|       |    |5|0 | |1,2|3|
|Prot4 |Prot4 |seqs.fasta|Protein_3|IDEAL_HELIX|D|       |    | |10 |3|1  | |
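
For readers who want to experiment with the format above, here is a minimal sketch of reading the pipe-delimited table into one dict per row. `parse_topology` is a hypothetical stand-alone helper written for illustration; it is not the PMI topology reader and makes no claim about how PMI parses this file.

```python
# Sketch only: parse_topology is a hypothetical helper, not part of IMP.pmi.
TOPOLOGY = """\
|component_name|domain_name|fasta_fn|fasta_id|pdb_fn|chain|residue_range|pdb_offset|bead_size|em_residues_per_gaussian|rigid_body|super_rigid_body|chain_of_super_rigid_bodies|
|Prot1 |Prot1 |seqs.fasta|Protein_1|prot.pdb   |A|55,65  |-54 |5|10|1|1,2| |
|Prot3 |Prot3 |seqs.fasta|Protein_3|BEADS      |C|       |    |5|0 | |1,2|3|
"""


def parse_topology(text):
    """Return one dict per data row, keyed by the header's column names."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and anything outside the table
        if not line.endswith("|"):
            line += "|"  # tolerate a missing trailing pipe
        # Splitting on "|" yields an empty string before the first pipe and
        # after the last one; drop both and trim the padding in each cell.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # the first table line names the columns
            continue
        if len(cells) != len(header):
            raise ValueError(
                f"expected {len(header)} cells, got {len(cells)}: {line}")
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_topology(TOPOLOGY)
print(rows[0]["component_name"], rows[0]["residue_range"])
```

Empty cells come back as empty strings, so a caller can test `if row["pdb_fn"]` before treating the value as a file name or a keyword such as BEADS.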

This is problematic, because currently to "copy" you have to create multiple molecules with different names! That works for PMI1, but in PMI2 we really want the unique molecule name to be correct so we can select it.

One idea is to add a "num copies" column, but that doesn't help if you want to swap out PDB files in the copies. Maybe we could just have a "copy line" where, instead of passing a sequence, you pass the word "COPY", shown here for Prot2:

|component_name|domain_name|fasta_fn|fasta_id|pdb_fn|chain|residue_range|pdb_offset|bead_size|em_residues_per_gaussian|rigid_body|super_rigid_body|chain_of_super_rigid_bodies|
|Prot2 |Prot2A|seqs.fasta|Protein_2|prot.pdb   |B|180,187|-179|5|0 |2|1  | |
|Prot2 |Prot2B|seqs.fasta|Protein_2|prot.pdb   |B|188,-1 |-179|5|0 |2|1  | |
|Prot2 |Prot2A|COPY      |         |protX.pdb  |B|180,187|-179|5|0 |2|1  | |
|Prot2 |Prot2B|COPY      |         |protX.pdb  |B|188,-1 |-179|5|0 |2|1  | |

Another option, possibly we should have both, is to clone the whole molecule (including all domains):

|component_name|domain_name|fasta_fn|fasta_id|pdb_fn|chain|residue_range|pdb_offset|bead_size|em_residues_per_gaussian|rigid_body|super_rigid_body|chain_of_super_rigid_bodies|
|Prot2 |Prot2A|seqs.fasta|Protein_2|prot.pdb   |B|180,187|-179|5|0 |2|1  | |
|Prot2 |Prot2B|seqs.fasta|Protein_2|prot.pdb   |B|188,-1 |-179|5|0 |2|1  | |
|Prot2 |CLONE |          |         |           | |       |    | |  |2|1  | |

So the nomenclature is you can "copy" a domain but "clone" the whole molecule. This would let you do 99% of topology/DOF in a single text file.

Pellarin commented 8 years ago

What about this: if two domains have the same name, they are assumed to be a "COPY", and if two proteins have the same name but the domains are not defined for one of the two, then you have a "CLONE"?

|component_name|domain_name|fasta_fn|fasta_id|pdb_fn|chain|residue_range|pdb_offset|bead_size|em_residues_per_gaussian|rigid_body|super_rigid_body|chain_of_super_rigid_bodies|
|Prot2 |Prot2A|seqs.fasta|Protein_2|prot.pdb   |B|180,187|-179|5|0 |2|1  | |
|Prot2 |Prot2B|seqs.fasta|Protein_2|prot.pdb   |B|188,-1 |-179|5|0 |2|1  | |
|Prot2 |Prot2A|          |         |protX.pdb  |B|180,187|-179|5|0 |2|1  | |
|Prot2 |Prot2B|          |         |protX.pdb  |B|188,-1 |-179|5|0 |2|1  | |

|component_name|domain_name|fasta_fn|fasta_id|pdb_fn|chain|residue_range|pdb_offset|bead_size|em_residues_per_gaussian|rigid_body|super_rigid_body|chain_of_super_rigid_bodies|
|Prot2 |Prot2A|seqs.fasta|Protein_2|prot.pdb   |B|180,187|-179|5|0 |2|1  | |
|Prot2 |Prot2B|seqs.fasta|Protein_2|prot.pdb   |B|188,-1 |-179|5|0 |2|1  | |
|Prot2 |      |          |         |           | |       |    | |  |2|1  | |

cgreenberg commented 8 years ago

Yes, that makes sense! Will just have to add a bunch of checks.
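
A rough sketch of what the implicit rule and one of those checks might look like, assuming each row has already been reduced to a `(component_name, domain_name)` pair. The function name `classify_rows` and the labels ORIGINAL/COPY/CLONE are made up for illustration and are not PMI code.

```python
# Sketch only: classify_rows and its labels are hypothetical, not PMI API.
def classify_rows(rows):
    """Label each (component_name, domain_name) pair.

    Rule from the discussion: a repeated domain name within a component is
    a COPY; a row with no domain name for a known component is a CLONE.
    """
    seen = {}  # component_name -> set of domain names defined so far
    labels = []
    for component, domain in rows:
        domains = seen.setdefault(component, set())
        if not domain:
            # A clone needs something to clone, so check the original exists.
            if not domains:
                raise ValueError(
                    f"{component}: clone row appears before any domain")
            labels.append("CLONE")
        elif domain in domains:
            labels.append("COPY")
        else:
            domains.add(domain)
            labels.append("ORIGINAL")
    return labels


copy_rows = [("Prot2", "Prot2A"), ("Prot2", "Prot2B"),
             ("Prot2", "Prot2A"), ("Prot2", "Prot2B")]
clone_rows = [("Prot2", "Prot2A"), ("Prot2", "Prot2B"), ("Prot2", "")]
print(classify_rows(copy_rows))
print(classify_rows(clone_rows))
```

Note that with names alone, a second and a third copy of the same domain are indistinguishable: both are just repeats of a name already seen.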

cgreenberg commented 8 years ago

Next step is to generate this file directly from a paper or a Wikipedia entry :)

Pellarin commented 8 years ago

Ahahaha!!! :-) That is cool! Or from a PDB entry, why not?

cgreenberg commented 8 years ago

We should also remove the domain naming from this format, since it's not currently used. And possibly add a "special flags" section for future requested features (e.g. don't fill in gaps, etc.).

cgreenberg commented 8 years ago

To make parsing easier (since it's hard to distinguish between copies and other domains) we'll have to number the copies, something like this:

|component_name|color|fasta_fn|fasta_id|pdb_fn|chain|residue_range|pdb_offset|bead_size|em_residues_per_gaussian|rigid_body|super_rigid_body|chain_of_super_rigid_bodies|flags|
|Prot2   |blue |seqs.fasta|Protein_2|prot.pdb   |B|180,187|-179|5|0 |2|1  | | |
|Prot2   |blue |seqs.fasta|Protein_2|prot.pdb   |B|188,-1 |-179|5|0 |2|1  | | |
|Prot2.1 |green|          |         |protX.pdb  |B|180,187|-179|5|0 |2|1  | | |
|Prot2.1 |green|          |         |protX.pdb  |B|188,-1 |-179|5|0 |2|1  | | |
|Prot2.2 |red  |          |         |           | |       |    | |0 |2|1  | | |

The molecule name will always be "Prot2"; you just mark separate copies/clones with ".X", where X can be any string. But you can't do copying AND cloning, so you'd have to choose 2.1 or 2.2 above.
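
As an illustration of the naming scheme, here is a small sketch that splits a component name into the molecule name and the copy label, and rejects a molecule that mixes copying and cloning. Both helper names are hypothetical. Treating a labelled row with a `pdb_fn` as a copy and one without as a clone is an assumption read off the table above, not a documented rule.

```python
# Sketch only: both helpers are hypothetical and not part of IMP.pmi.
def split_copy_name(component_name):
    """Split 'Prot2.1' into ('Prot2', '1'); no '.' means no copy label."""
    molecule, _, label = component_name.partition(".")
    return molecule, (label or None)


def check_copy_modes(rows):
    """Map each molecule to 'copy' or 'clone' from (component_name, pdb_fn).

    Assumption: a labelled row with a pdb_fn is a copy, one without is a
    clone. A molecule that uses both modes raises ValueError.
    """
    modes = {}
    for name, pdb_fn in rows:
        molecule, label = split_copy_name(name)
        if label is None:
            continue  # unlabelled rows are the original definition
        mode = "copy" if pdb_fn else "clone"
        if modes.setdefault(molecule, mode) != mode:
            raise ValueError(f"{molecule}: cannot mix copies and clones")
    return modes


print(split_copy_name("Prot2.1"))
print(check_copy_modes([("Prot2", "prot.pdb"), ("Prot2.1", "protX.pdb")]))
```

Because the split happens at the first ".", the label itself may contain further dots, which fits "X can be any string".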

cgreenberg commented 8 years ago

We have basic copies now, but it would be nice to easily replicate the whole topology file multiple times (true clones, e.g. spokes of an NPC). Maybe the easiest thing is to add an option: BuildSystem.add_state(reader_obj,num_clones=12).
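
To make the idea concrete without touching BuildSystem, here is a sketch of replicating already-parsed topology rows a fixed number of times. `replicate_topology` and the `clone_index` key are invented for this example; the `num_clones` option is only the proposal above, and nothing here shows how it is or would be implemented.

```python
# Sketch only: replicate_topology and the "clone_index" key are hypothetical.
def replicate_topology(rows, num_clones):
    """Return num_clones tagged copies of every row dict in rows."""
    if num_clones < 1:
        raise ValueError("num_clones must be at least 1")
    replicated = []
    for index in range(num_clones):
        for row in rows:
            new_row = dict(row)  # shallow copy so the input stays untouched
            new_row["clone_index"] = index
            replicated.append(new_row)
    return replicated


spoke = [{"component_name": "Prot2", "pdb_fn": "prot.pdb"},
         {"component_name": "Prot3", "pdb_fn": "BEADS"}]
spokes = replicate_topology(spoke, num_clones=12)
print(len(spokes))
```

Keeping the component name unchanged and storing the index separately matches the earlier goal that the unique molecule name stays correct for selection.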

benmwebb commented 2 years ago

Looks like this is essentially done.