Make the output of downstream modules Minerva-story ready

ciszew commented 11 months ago

Hello, I was wondering if it would be possible to obtain short documentation on how to generate and use the different MCMICRO output files in Minerva Author (and possibly Minerva Analysis). For example, it is unclear to me how to generate files that could be used for data visualization (as described here: https://github.com/labsyspharm/minerva-story/wiki/How-to-make-a-Minerva-Story%3F#44-import-data-visualizations), or how to generate additional mask files (as described here: https://github.com/labsyspharm/minerva-story/wiki/How-to-make-a-Minerva-Story%3F#45-import-cell-segmentation-masks). Can all of this be done in MCMICRO directly?

ArtemSokolov commented 11 months ago

Hi @ciszew,

MCMICRO uses a variant of Minerva called Auto-Minerva, which generates a basic view of the stitched and registered image. You can turn this feature on by adding viz: true to the workflow section of your params.yml:

workflow:
  viz: true

This will generate a viz/ subdirectory, and you can view the resulting Auto-Minerva visualization by opening index.html in that subdirectory.

As mentioned above, this will only show the stitched and registered image. To introduce additional manual annotations, you will need to run Minerva Story software directly (https://github.com/labsyspharm/minerva-story). You should be able to just import the segmentation masks generated by MCMICRO inside Minerva Story. There are also some modules that can generate cell state clustering from single-cell features. They are relatively quick to run, so you can try all of them by adding:

workflow:
  stop-at: downstream
  downstream: [naivestates, scimap, fastpg, scanpy, flowsom]

ciszew commented 11 months ago

Hello, thank you so much for getting back to me and for making this pipeline available to the community; it is a fantastic tool. I have been using the auto-generated Minerva visualization within MCMICRO. It is a great and fast option for a quick look at the data, but I find the full-fledged Minerva suite much more powerful and flexible for creating visually guided analyses and presentations, and I would like to explore its full capabilities. Specifically, I was wondering whether it is possible, and how, to create the "Mask (cellID, State)" .csv file using MCMICRO (screenshot attached). Without a Mask.csv file, only all_cells is available as a mask.

I have also tried a couple of MCMICRO runs with the downstream modules [naivestates, scimap, flowsom], but I'm not sure I fully understand how these modules work (specifically scimap) when launched within MCMICRO: what parameters and options are available for fine-tuning them, and, more specifically, whether the Nextflow params file can be configured so that the output integrates better with Minerva Story/Author (and possibly Minerva Analysis, something I'm just starting to explore). Hence my request for documentation in my initial post.

jmuhlich commented 11 months ago

You can use any clustering or cell type calling module in the "downstream" section to define the "Mask.csv" file. The column names and precise file name do vary across the different modules, but I think all of them provide an output file with per-cell clustering. You should just need to rename the cell id and cluster label columns to exactly CellID and State.

ArtemSokolov commented 11 months ago

I'll tweak the modules this weekend to produce the correct column names.

ArtemSokolov commented 11 months ago

@ciszew If you take the output of one of the modules (e.g., fastpg) where it assigns cells to clusters and change the Cluster column name to State, are you able to load it in Minerva Author?

# Replace the column name Cluster with State
/workspace/exemplar-001/downstream/fastpg $ sed 's/Cluster/State/g' exemplar-001--mesmer_cell-cells.csv > exemplar-001--mesmer_cell-states.csv

# Confirm that the column has been renamed
/workspace/exemplar-001/downstream/fastpg $ head -n 5 exemplar-001--mesmer_cell-states.csv 
CellID,State,Method
1,0,FastPG
2,0,FastPG
3,1,FastPG
4,16,FastPG
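
Note that the sed command above replaces every occurrence of Cluster in the file, not only the one in the header. As an alternative, here is a minimal sketch of a header-only rename in Python, using only the standard library. The function name rename_column is hypothetical, and the defaults assume the Cluster and State column names discussed in this thread.

```python
import csv

def rename_column(src_path, dst_path, old="Cluster", new="State"):
    """Copy a CSV file, renaming one header column and leaving data rows untouched."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        header = next(reader)
        if old not in header:
            raise ValueError(f"column {old!r} not found in {src_path}")
        # Only the header row is modified; every other row is copied as-is.
        writer.writerow([new if name == old else name for name in header])
        writer.writerows(reader)
```

For the file shown above, this would be called as rename_column("exemplar-001--mesmer_cell-cells.csv", "exemplar-001--mesmer_cell-states.csv").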

ciszew commented 11 months ago

I haven't run fastpg, but I used flowsom, scimap and naivestates, all without any options (just downstream: [flowsom, scimap, naivestates] in the workflow section of the params file), so I'm guessing they ran with default settings (and I'm not sure what those settings are).

FlowSOM: one of the files generated by flowsom contains the derived clusters alongside the CellID column. Changing "Cluster" to "State" works for Minerva, which is great. The two additional files generated by flowsom are the raw expression values (pulled from the quantification CSV file for the markers used for clustering) and what I'm guessing are median normalized expression values for each marker in each derived cluster.
Naivestates: generates just two CSV files (cell-probs and cell-models) but doesn't derive clusters. It also generates normalized expression plots, but I'm not sure why there are three different lines on each plot.
Scimap: generates a lot of data (mostly very informative plots) and a single master CSV file with CellID, raw expression, morphology information and three columns with clusters derived by three different clustering algorithms (kmeans, leiden, phenograph). Formatting this file generates correct input for Minerva masks.
I think that using different masks (based on quantification data and clustering methods) within Minerva makes a great tool for visually QC'ing the whole pipeline. But I would also like to understand better how the clusters are generated, to make sure we are using optimized settings.
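
For anyone wanting to script the Scimap formatting step, a minimal sketch in Python (standard library only) could look like the following. It assumes the master CSV has a CellID column plus one column per clustering method named kmeans, leiden and phenograph, as described above; the actual column names in a given run may differ. The function name extract_masks and the output file naming are hypothetical.

```python
import csv
import os

def extract_masks(master_path, methods=("kmeans", "leiden", "phenograph"), id_column="CellID"):
    """Write one two-column CellID,State CSV per clustering column; return the paths written."""
    with open(master_path, newline="") as src:
        rows = list(csv.DictReader(src))
    base, _ = os.path.splitext(master_path)
    written = []
    for method in methods:
        if not rows or method not in rows[0]:
            continue  # skip clustering columns that are absent from this file
        out_path = f"{base}-{method}-states.csv"  # hypothetical naming scheme
        with open(out_path, "w", newline="") as dst:
            writer = csv.writer(dst, lineterminator="\n")
            writer.writerow(["CellID", "State"])
            for row in rows:
                writer.writerow([row[id_column], row[method]])
        written.append(out_path)
    return written
```

Each resulting file contains only the CellID and State columns, one file per clustering method, so the different clusterings can be loaded as separate masks and compared.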

ArtemSokolov commented 11 months ago

Hi @ciszew,

Some basic information about these clustering methods (what method it runs, what the parameters are, etc.) can be found at https://mcmicro.org/parameters/core.html#clustering

Naivestates will attempt to infer whether a given marker is expressed in each cell or not. The three lines in those plots are:

Black - univariate distribution of marker expression across all cells
Red - the estimated population of cells that expresses that marker
Blue - the estimated population of cells that doesn't express that marker

(So, for example, it was not able to distinguish the two populations for CD68, because it's not a bi-modal distribution.)

Each cell is then assigned a probability of belonging to the "red" population. If you provide a two-column file mapping markers to cell states (example: https://github.com/labsyspharm/naivestates/blob/master/typemap.csv), it will also combine the probabilities of individual marker expression to assign a cell type / state to individual cells. To specify this file, you can use the naivestates-model workflow parameter:

workflow:
  naivestates-model: marker-to-cellstate-map.csv
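
To give a feel for what combining per-marker probabilities through a marker-to-state map can mean, here is a purely illustrative Python sketch. It is not the rule naivestates actually implements: the combination used here (mean probability per state, highest mean wins) is an assumption made for illustration, and the function name assign_state is hypothetical.

```python
from collections import defaultdict

def assign_state(marker_probs, marker_to_state):
    """Return the state whose markers have the highest mean expression probability.

    NOTE: illustrative combination rule only; not the one naivestates uses.
    marker_probs:    {marker: probability that the cell expresses it}
    marker_to_state: {marker: cell state}, i.e. the two-column map
    """
    grouped = defaultdict(list)
    for marker, state in marker_to_state.items():
        if marker in marker_probs:
            grouped[state].append(marker_probs[marker])
    if not grouped:
        return None  # none of the mapped markers were measured for this cell
    scores = {state: sum(p) / len(p) for state, p in grouped.items()}
    return max(scores, key=scores.get)
```

Under this toy rule, a cell with high probabilities for the markers mapped to one state and low probabilities elsewhere is assigned that state.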

Lastly, MCMICRO only runs the clustering tools from Scimap, but Scimap also has a whole bunch of tools for additional analysis (https://scimap.xyz/). If you install that software directly, it should be compatible with the MCMICRO output. (@ajitjohnson can help if you run into any issues.)

ciszew commented 11 months ago

Thank you so much, this is exactly what I was looking for.

labsyspharm / mcmicro

Make the output of downstream modules Minerva-story ready #513