labsyspharm / mcmicro-flowSOM

An MCMICRO module for clustering cell types using the flowSOM algorithm

markers.txt #5

Open Elena983 opened 11 months ago

Elena983 commented 11 months ago

Hi, I created a text file markers.txt with one marker per row to use for clustering, but it permanently gives the error 'couldn't find the file'.

The relevant line in my params.yml:

options:
  flowsom: -m /Users/m/Desktop/flowsom/markers.txt -n 15

ArtemSokolov commented 11 months ago

Hi @Elena983,

It looks like you are running this through MCMICRO. Can you try changing your params.yml to the following:

workflow:
  flowsom-model: /Users/m/Desktop/flowsom/markers.txt
options:
  flowsom: -n 15
modules:
  downstream:
    -
      name: flowsom
      model: -m

Short explanation: Nextflow creates a separate work directory for each process and stages only the input files it knows about in that directory. There is nothing to tell Nextflow that /Users/m/Desktop/flowsom/markers.txt is another file and not just an arbitrary string, so that file never gets staged or exposed to the flowsom container.

We have a mechanism where users can provide their own models to ilastik (https://mcmicro.org/troubleshooting/faq.html#q-how-do-i-run-mcmicro-with-my-own-ilastik-model). We can make use of that same mechanism to specify additional files for other processes. In this case, we are saying that markers.txt is a model file for flowsom, and that flowsom should use its model files with -m. We can then remove the reference to this file from options.
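For reference, the ilastik version of this mechanism from the linked FAQ looks roughly like the following (the path is a placeholder, not an actual model file):

```yaml
workflow:
  ilastik-model: /path/to/mymodel.ilp
```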

MCMICRO should now properly stage this file in flowsom's work directory, but let me know if this doesn't work.

Elena983 commented 11 months ago

Many thanks for the answer, but I get an error:

ERROR ~ Unrecognized parameter flowsom-model

Also, if possible, I would like to run the downstream analysis with start-at: downstream, to avoid re-running segmentation each time.

ArtemSokolov commented 11 months ago

I just added flowsom-model to the list of valid parameters: https://github.com/labsyspharm/mcmicro/commit/b80ca4244ac55b99031290d21ce10f8ce6507e63

Can you pull the latest version with nextflow pull labsyspharm/mcmicro and try again?

Should be no problem to have start-at: downstream.

Elena983 commented 11 months ago

Could you please add the same model mechanism for fastpg? And how would I then specify both of them correctly in params.yml?

Also, the output I get from start-at: downstream is blank (screenshot attached).

Elena983 commented 11 months ago

Hi Artem, running with the new params (model) I received a new error (screenshot attached).

ArtemSokolov commented 11 months ago

Hi @Elena983,

I fixed a small bug and added fastpg-model and scanpy-model to the list of recognizable parameters. (Please use nextflow pull labsyspharm/mcmicro to get the latest update.) I was able to run all three methods (fastpg, scanpy and flowsom) using the following params.yml with exemplar-001:

workflow:
  stop-at: downstream
  downstream: [fastpg, scanpy, flowsom]
  fastpg-model: /workspace/markers.txt
  scanpy-model: /workspace/markers.txt
  flowsom-model: /workspace/markers.txt
options:
  fastpg: -k 20
  scanpy: -k 20
  flowsom: -n 10
modules:
  downstream:
  - name: fastpg
    model: -m
  - name: scanpy
    model: -m
  - name: flowsom
    model: -m

I verified that it correctly passes the file I made (/workspace/markers.txt) to all modules using -m.

I can't seem to replicate your other issue of start-at: downstream giving an empty run. When I add start-at: downstream to my workflow: parameters, MCMICRO only reruns these three modules and nothing else.

Elena983 commented 11 months ago

Could you please check starting from the downstream modules with my params and my '.csv' in the 'quantification' folder? Attached: markers.csv, CRC--mesmer_cell.csv, markers.txt

workflow:
  start-at: downstream 
  downstream: [flowsom, fastpg]
  fastpg-model: ./flowsom4/markers.txt
  flowsom-model: ./flowsom4/markers.txt
options:
  flowsom: -n 15
  fastpg: -k 25
modules:
  downstream:
  - name: fastpg
    model: -m
  - name: flowsom
    model: -m

Thank you

ArtemSokolov commented 11 months ago

@Elena983 I see what the issue is. By default, MCMICRO stops at quantification (https://github.com/labsyspharm/mcmicro/blob/master/config/defaults.yml#L3). Since downstream is the step after quantification, the pipeline decides that there is nothing to run.

Adding stop-at: downstream to params.yml will launch the jobs.
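Putting the two together, the relevant workflow section would look like this (a sketch assembled from the params in this thread):

```yaml
workflow:
  start-at: downstream   # skip everything before downstream
  stop-at: downstream    # override the default stop-at: quantification
  downstream: [flowsom, fastpg]
```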

I also noticed a small discrepancy: the marker S100 appears in lowercase (s100) in markers.csv and CRC--mesmer_cell.csv, which causes both methods to fail with "S100 not found". If you change S100 -> s100 in markers.txt, both modules run to completion.
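Mismatches like this are easy to catch up front. A minimal sketch (a hypothetical helper, not part of MCMICRO) that compares a markers.txt list against the quantification table's column names and flags case-only differences:

```python
# Hypothetical helper (not part of MCMICRO): check a marker list against
# the quantification table's columns, since matching is case-sensitive.
def find_mismatches(markers, columns):
    """Return a message for each marker absent from columns."""
    by_lower = {c.lower(): c for c in columns}
    problems = []
    for m in markers:
        if m in columns:
            continue  # exact match, nothing to report
        if m.lower() in by_lower:
            problems.append(f"{m}: case mismatch, table has {by_lower[m.lower()]!r}")
        else:
            problems.append(f"{m}: not found")
    return problems

# The S100 situation from this thread:
print(find_mismatches(["CD45", "S100"], ["CellID", "CD45", "s100"]))
# -> ["S100: case mismatch, table has 's100'"]
```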

Elena983 commented 11 months ago

Thank you, I will try. In our lab we have a COMET, so the files we obtain are already stitched and aligned, but we still need to flat-field correct the image. How may I use the BaSiC module within the pipeline?

ArtemSokolov commented 11 months ago

Hi Elena,

Unfortunately, BaSiC doesn't work with pre-stitched images, because it needs to see multiple tiles to accurately estimate the illumination profiles.

psenin-sanofi commented 11 months ago

Can we use the same "model file" trick with scimap? Thanks! (edit: I'd like to use a specific set of markers for the automatically produced clustering figures; I do understand that I can achieve my goals later via the API)

ArtemSokolov commented 11 months ago

Hi @psenin-sanofi,

I'm not sure that scimap has a way to limit the set of input markers. I'm looking through the code (https://github.com/labsyspharm/scimap/blob/master/scimap/cli/_scimap_mcmicro.py), and I'm not seeing a -m equivalent.

I'll ask @ajitjohnson about it, but maybe an alternative approach would be to add that functionality at the MCMICRO level, where the quantification table is clipped to the requested set of markers before getting passed to downstream modules. This would also simplify the above "parameter hacking".
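That MCMICRO-level clipping step could look roughly like this (a sketch only, not actual MCMICRO code; the non-marker column names are assumptions):

```python
# Sketch of the proposed clipping step (not actual MCMICRO code):
# keep bookkeeping columns plus the requested markers, drop everything else.
NON_MARKER_COLS = {"CellID", "X_centroid", "Y_centroid", "Area"}  # assumed names

def clip_header(header, markers):
    """Columns to keep when passing the quantification table downstream."""
    wanted = set(markers)
    return [c for c in header if c in NON_MARKER_COLS or c in wanted]

header = ["CellID", "CD45", "s100", "DNA1", "X_centroid"]
print(clip_header(header, ["CD45", "s100"]))  # DNA1 is dropped
```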

psenin-sanofi commented 11 months ago

Thank you for looking into this. You're right -- this needs more involvement to get the trick done.

The CSV-to-scimap function (https://scimap.xyz/Functions/pp/mcmicro_to_scimap/) takes the list of markers to be dropped (an inversion of the "markers to be used" convention in the other clustering methods); that list could then be used at lines 85 and 93 of the file you linked...

The rationale is that the default clustering results don't make much sense when DAPI and a few other channels are included.
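Deriving that drop list from a "markers to use" list is a one-line inversion; a sketch (the channel names are hypothetical):

```python
# Sketch: scimap's mcmicro_to_scimap expects markers to DROP, the inverse of
# the "markers to use" convention; derive one list from the other.
all_channels = ["DNA1", "DAPI", "CD45", "S100", "CD31"]  # hypothetical panel
markers_to_use = ["CD45", "S100", "CD31"]
drop_markers = [c for c in all_channels if c not in markers_to_use]
print(drop_markers)  # -> ['DNA1', 'DAPI']
```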