ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0

Extraction of particles from kmeans clusters in star format #120

Open ag-deepesh opened 2 years ago

ag-deepesh commented 2 years ago

Hi @zhonge

  1. Could you please suggest how to extract the particles belonging to a given kmeans cluster and convert them to star format for traditional refinement? Note: with the lasso tool, selection is done manually and is therefore not precise. Since the kmeans clustering assigns every particle to a cluster, there should be a way to get the indices for a given cluster directly.

  2. Could we view the latent space in the Jupyter notebook according to the kmeans cluster labels?

Thanks and Regards.

zhonge commented 2 years ago

Thanks for your question! There are a couple of options for selecting particles from the desired cluster:

  1. You can select the particles using the cryoDRGN_filtering.ipynb notebook from cryodrgn analyze. One section of this notebook selects particles based on GMM or kmeans cluster labels, and another cell generates a visualization of the selected cluster. Here is an example in the tutorial: https://www.notion.so/cryoDRGN-EMPIAR-10076-tutorial-c8728dcc88e744c8827447c3ff094d19#770194967d274239bfca4c52868198aa

  2. On the command line, there is a script for selecting clusters in the utils subdirectory of the repo: https://github.com/zhonge/cryodrgn/blob/master/utils/select_clusters.py

    For example, to select the particles in clusters 3, 5, and 7:

    (cryodrgn) $ python select_clusters.py /path/to/analyze/directory/kmeans20/labels.pkl --sel 3 5 7 -o selected_clusters.pkl

    If your training job was already run on a filtered subset (i.e. you passed a selection with --ind to cryodrgn train_vae), you also need to include --parent-ind and --N-orig so that the output indices refer to the original particle stack.

  3. Finally, there is a new tool cryodrgn analyze_landscape available in the latest beta version of cryodrgn (1.0.0-beta) for assigning and selecting classes from the cryodrgn results. I am still working on the documentation for landscape analysis, but a work-in-progress version is here: https://www.notion.so/cryodrgn-conformational-landscape-analysis-a5af129288d54d1aa95388bdac48235a.
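
To make option 2 more concrete, here is a minimal sketch of the kind of index selection involved. It is not the code of select_clusters.py. It assumes that labels.pkl holds one integer cluster label per particle used in training, and that a parent selection (if any) maps each training particle to its index in the original stack. The function name select_cluster_indices and the toy data are hypothetical.

```python
def select_cluster_indices(labels, selected, parent_ind=None):
    """Return the indices of particles whose cluster label is in `selected`.

    labels     -- one integer cluster label per training particle (assumed layout)
    selected   -- cluster labels to keep, e.g. [3, 5, 7]
    parent_ind -- optional; parent_ind[i] is the index in the original particle
                  stack of the i-th training particle (assumed meaning)
    """
    wanted = set(selected)
    # Positions within the (possibly filtered) training set
    idx = [i for i, lab in enumerate(labels) if int(lab) in wanted]
    if parent_ind is not None:
        # Map back to positions in the original, unfiltered stack
        idx = [int(parent_ind[i]) for i in idx]
    return idx


# Toy data: six particles with kmeans labels
labels = [3, 0, 5, 7, 1, 3]
print(select_cluster_indices(labels, [3, 5, 7]))  # [0, 2, 3, 5]

# Same labels, but training used a subset whose original indices were 10..15
parent = [10, 11, 12, 13, 14, 15]
print(select_cluster_indices(labels, [3, 5, 7], parent_ind=parent))  # [10, 12, 13, 15]
```

In practice the labels would presumably be loaded from labels.pkl with pickle.load and the resulting indices written back out with pickle.dump. The exact file contents are an assumption here, so check them against your own analysis directory.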