salilab / imp-sampcon

Scripts to assess sampling convergence and exhaustiveness
https://www.ncbi.nlm.nih.gov/pubmed/29211988
GNU General Public License v2.0

Use PMI stat file handling functions #9

Open benmwebb opened 4 years ago

benmwebb commented 4 years ago

Rather than reading stat files with our own code, we should use the IMP.pmi.output.ProcessOutput class. This handles both v1 and v2 statfiles, and also RMF files (stat file information can be written into the RMF file itself rather than a separate text file).
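A minimal sketch of what that could look like, assuming `ProcessOutput` is constructed from a file path and exposes `get_keys()` and `get_fields()` as in recent PMI releases. The helper name `read_stat_fields` and its `process_output_cls` parameter are hypothetical, added only so the function can be exercised without IMP installed. It also assumes every requested field holds numeric values.

```python
def read_stat_fields(path, fields, process_output_cls=None):
    """Return {field: [values as float]} for each requested field.

    `path` may be a stat file or an RMF file; working out the format is
    left to ProcessOutput rather than handled here.
    """
    if process_output_cls is None:
        # Imported lazily so this module loads without IMP installed.
        import IMP.pmi.output
        process_output_cls = IMP.pmi.output.ProcessOutput

    po = process_output_cls(path)

    # Fail early with a readable message if a field name is misspelled.
    available = po.get_keys()
    missing = [name for name in fields if name not in available]
    if missing:
        raise KeyError("fields not found in %s: %s" % (path, missing))

    # Values are converted to float here (assumption: numeric fields only).
    raw = po.get_fields(fields)
    return {name: [float(value) for value in raw[name]] for name in fields}
```

The field names to pass in depend on the restraints used in a given modeling run, so listing `get_keys()` on one output file first is a reasonable way to find them.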

shruthivis commented 4 years ago

That part of the protocol (GoodScoringModelSelector.py) is superseded by @iecheverria and @ichem001's new methods for selecting models for analysis, so I'm not sure it is worth investing a lot of time in revamping this script. Only the actin tutorial (and perhaps a couple of older application papers?) uses this. Perhaps the actin tutorial should be updated to include the new analysis protocol?

iecheverria commented 4 years ago

Yes, this is true. The idea is to move the analysis away from arbitrary cutoffs and start looking at all sampled models in a probabilistic way. I still find selecting good-scoring models useful for preliminary analysis while simulations are still running, for example to check how well the good-scoring models satisfy the data and whether the representation needs to be adjusted. Moving forward, I'm planning to incorporate everything, including what is in PMI_analysis and sampcon, into the PMI analysis module. I can add the new analysis protocol to the actin tutorial. Do we have a full set of trajectories? Where are they stored?
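To make the cutoff-based selection concrete, here is a minimal sketch of filtering frames by score thresholds. It is not the GoodScoringModelSelector.py implementation; the function name, the field names, and the threshold values are all hypothetical.

```python
def select_good_scoring(frames, cutoffs):
    """Return indices of frames whose scores are all at or below their cutoffs.

    frames:  list of dicts mapping field name -> score (lower is better)
    cutoffs: dict mapping field name -> maximum acceptable score
    """
    selected = []
    for index, frame in enumerate(frames):
        if all(frame[field] <= limit for field, limit in cutoffs.items()):
            selected.append(index)
    return selected


# Hypothetical scores for three sampled frames.
frames = [
    {"Total_Score": 120.0, "Crosslink_Score": 10.0},
    {"Total_Score": 95.0, "Crosslink_Score": 30.0},
    {"Total_Score": 90.0, "Crosslink_Score": 8.0},
]
good = select_good_scoring(frames, {"Total_Score": 100.0, "Crosslink_Score": 15.0})
print(good)
```

Moving either threshold changes which frames pass, which is the arbitrariness that a probabilistic treatment of all sampled models is meant to avoid.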

--

Ignacia Echeverria Postdoctoral Scholar Department of Bioengineering and Therapeutic Sciences University of California, San Francisco http://salilab.org/~ignacia

shruthivis commented 4 years ago

https://github.com/salilab/actin_tutorial/tree/master/modeling has run1.zip and run2.zip which presumably correspond to the full set of trajectories from modeling.

saltzberg commented 4 years ago

Shruthi is correct.

Ignacia, as it happens, I'm reworking the actin_tutorial this week to include PMI_analysis in preparation for a workshop I'm giving in a couple of weeks. Has the workflow changed recently (in the past few months)?

As for integrating into PMI, I found some major bottlenecks in imp-sampcon, one of which requires changes to PMI_analysis, so maybe hold off a bit. The major workflow change is going from outputting and reading sets of individual RMF files for sample_A and sample_B to a single RMF file each for sample_A and sample_B. Hoping to have it finished and tested by the beginning of next week.

--

Daniel Saltzberg Post-doctoral Scholar University of California at San Francisco Lab of Andrej Sali (www.salilab.org)

ichem001 commented 4 years ago

@saltzberg Instead of writing one giant RMF file per sample - maybe we could write one small RMF and a big DCD file for each sample. This would also make deposition to Zenodo almost automatic, since we need DCD files at the end of the day, so it might kill two birds with one stone. We could either link both DCD files to the ensembles or concatenate the DCD files with catDCD from the VMD/NAMD group. What do you think?

benmwebb commented 4 years ago

> Instead of writing one giant RMF file per sample - maybe we could write one small RMF and a big DCD file for each sample

This is essentially what happens internally anyway - everything is converted to a monstrous numpy array of coordinates, which is about as efficient as it can be. I don't much like DCD as a long-term solution since you lose all of the topology information and can only store coordinates. I'd rather overhaul RMF to make it more efficient at storing multiple conformations (on my lengthy list of things to fix).

saltzberg commented 4 years ago

@ichem001 The single large RMF files that I am talking about replace the ./analysis/sample_A/tons_of_one_frame.rmf3s, not the final output.

Reading individual RMF files with rmf_slice is exceedingly slow...almost half of the total time for clustering. The PMI_analysis run_extract_models.py step can be changed to output two RMF files (sample_A and sample_B) for each cluster. These can be read into imp-sampcon an order of magnitude faster than individual RMFs for each model.