salilab / pmi

Python Modeling Interface
https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1pmi.html

analysis macros issues and how to solve them #91

Open Pellarin opened 10 years ago

Pellarin commented 10 years ago

The analysis macro is really a mess, I admit. Most of the problems stem from handling huge files, the need to keep memory usage low, and parallel calculation.

One issue is that the information in the rmf files is extracted twice, once before and once after the clustering, which makes the analysis really slow.

What about the following scheme:

1) Extract the frames into rmf files, stored as single-frame rmfs in a directory that can be reused in the future for other clustering runs that use the same matrix.

2) When reading back the coordinates, just open the saved rmfs, not the original huge rmf files.

That will make the rmf reading considerably faster.
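
A minimal sketch of the caching idea in the two steps above, using only the Python standard library. It does not call the RMF or PMI APIs: the JSON files stand in for single-frame rmfs, and the names `cache_frames`, `load_cached_frames`, and the `frame_000003.json` naming pattern are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def cache_frames(frames, cache_dir):
    """Write each selected frame to its own file and return the file paths.

    `frames` maps a frame id (int) to its coordinates (nested lists).
    JSON is only a stand-in here for a single-frame rmf file.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for frame_id, coords in frames.items():
        path = cache_dir / f"frame_{frame_id:06d}.json"
        if not path.exists():  # reuse frames cached by an earlier run
            path.write_text(json.dumps(coords))
        paths.append(path)
    return paths


def load_cached_frames(cache_dir):
    """Read coordinates back from the cache, never from the original files."""
    cache_dir = Path(cache_dir)
    return {
        int(path.stem.split("_")[1]): json.loads(path.read_text())
        for path in sorted(cache_dir.glob("frame_*.json"))
    }
```

A later clustering run that uses the same frames would call only `load_cached_frames()` on the directory, so the large trajectory files are opened once, at extraction time.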

cgreenberg commented 10 years ago

Agreed! First get the best 500 frames and store them in one file; the analysis can then be done later.
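
Picking the best frames without holding every record in memory could look roughly like this. It is a sketch under assumptions, not what the macro does: it assumes a lower score is better, and the record layout `(score, source_file, frame_index)` and the function name `best_frames` are hypothetical.

```python
import heapq


def best_frames(scored_frames, n_best=500):
    """Return the n_best lowest-scoring records, sorted by score.

    `scored_frames` is any iterable of (score, source_file, frame_index)
    tuples. heapq.nsmallest consumes it lazily and keeps only n_best
    candidates in memory at a time.
    """
    return heapq.nsmallest(n_best, scored_frames, key=lambda item: item[0])
```

Because the input can be a generator that walks the stat files one line at a time, memory use depends on `n_best` rather than on the total number of frames.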

Pellarin commented 10 years ago

Good! Also there is a memory leak somewhere when the clusters are saved.

cgreenberg commented 9 years ago

I added a function to collect the best models (PMI.io.input.save_best_models()), but it doesn't read the RMFs in parallel. We can probably wait until the next version to use it in the macro.
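
For reference, one possible shape for the parallel read, sketched with the standard library only. This is not how save_best_models() works and it does not use the RMF API: `read_frame` is a hypothetical placeholder that reads a file as text, and a thread pool is an assumption that suits I/O-bound reads. CPU-bound parsing would more likely need a process pool or MPI.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_frame(path):
    """Placeholder reader: returns the file contents as text."""
    return Path(path).read_text()


def read_frames_parallel(paths, max_workers=4):
    """Read many frame files concurrently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `paths`,
        # regardless of which read finishes first.
        return list(pool.map(read_frame, paths))
```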