labsyspharm / mcmicro

Multiple-choice microscopy pipeline
https://mcmicro.org/
MIT License
98 stars 58 forks source link

ASHLAR/MCQUANT modules #534

Open MDHowe4 opened 5 months ago

MDHowe4 commented 5 months ago

Hello,

Our group is planning to use MCMICRO and Minerva analysis/gater as a workflow for multiplexed image analysis and had a couple issues and suggestions on these two modules. We have been able to get MCMICRO to work on our codex tiles, but it doesn't seem possible currently to integrate our ASHLAR step (which is currently being run outside MCMICRO) within MCMICRO. Our current ASHLAR command is as follows:

#!/bin/bash

# this script should be run inside the ashlar folder which contains two folders
# ashlar_input and ashlar_output
# the ashlar input has the regions in folders as reg001/t001
# the struture of the ashlar_input folder is generated by the python script  that 
# copies and renames the files so that they appear to come from the same cycle

for reg in $(ls ./ashlar_inputs)
do
    echo ${reg}
    docker run \
    -v './ashlar_inputs':/input \
    -v "./ashlar_outputs":/output \
    -it labsyspharm/ashlar:1.18.0b3 ashlar \
    --maximum-shift 0 \
    "filepattern|/input/${reg}/t001|pattern=${reg}_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454" \
    -o /output/${reg}.ome.tiff
done

The only function of ASHLAR we are using is the stitching step to produce single ome.tiff images that are compatible with downstream steps in MCMICRO, however, the images coming off the codex machine aren't bio-formats compatible and we have to preprocess the tiles into a structure that ASHLAR recognizes. This has worked well as we are able to produce an equivalent output to the Akoya processor for individual images, but since we are using the undocumented “filepattern” command we have used the separate ASHLAR container to run this step. It doesn't currently seem possible to recreate this within MCMICRO to parse our specific data formats. Any suggestions? It would be ideal to have as many steps handled by MCMICRO, but I understand dealing with all of these non-standard data formats is challenging and more options are probably planned in the nf-core pipeline.

MCQUANT currently only outputs mean_intensity and if specified, --intensity_props or --mask_props. This works perfectly when we also want the median_intensity outputs, but the only option is to concatenate this data to the right of mean_intensity within the CSV. This has poor compatibility in Minerva analysis/gater if you plan to gate on the median_intensity as Minerva_analysis/gater will default to the mean_intensity columns. Shuffling all of the channels isn't really feasible either with large numbers of channels. My suggestion would be to allow exact control of the output from MCQUANT for better compatibility. I suppose this could also be handled in a future update by Minerva analysis/gater. This is only a minor obstacle in our workflow that can be easily handled by post-processing, but could maybe be handled within MCMICRO as well.

ArtemSokolov commented 5 months ago

Thanks, @MDHowe4. I've been wanting to add support for ASHLAR's filepattern feature for quite a while, so I can work on that next. Can you tell me a little bit more your workflow? Your for loop produces multiple .ome.tiff files. Are these different regions of interest? How do you feed these files to MCMICRO? Do you put all of them into the same registration/ subfolder and do a single nextflow run, or do you create a separate "project" folder for each file and process them separately?

I'm not sure I follow the MCQUANT issue. Are you suggesting to make mean_intensity optional, or are you wanting more control over the order of columns in the output?

emmanuel-contreras commented 5 months ago

Hello, my name is Emmanuel and I am working with @MDHowe4 on creating a mutichannel immunofluorescence pipeline for Akoya/CODEX images. The issue we have run into is that newer Akoya CODEX software outputs qptiffs which combine all field of views into a single tiff file. This is great to save space but is not compatible with downstream analysis (including MCMICRO I believe) because we need to process each region separately. I was able to take in the output tiles from the CODEX machine using filepattern to recreate the OME.TIFF for each region that we need using ASHLAR but ran into issues with ASHLAR introducing a single pixel gaps between some of the tiles in certain cycles. (see image.sc forum below if interested)

I was able to overcome this by copying/renaming all the tiles so that ASHLAR recognizes them as a single cycle with n number of channels. I do this through a python script. This preprocessing is done for all the regions of an experiment.

Once all the tiles have been renamed, the bash script above iterates through each of the regions, runs ASHLAR for each region, producing a single ome.tiff for each region.

In our use case, it would be nice if MCMICRO would recognize the filepattern so that we could start our pipieline on the renamed tiles as opposed to having to stitch/run ashlar ourselves outside of MCMICRO.

Please let me know if anything is unclear.

https://forum.image.sc/t/ashlar-how-to-pass-multiple-images-to-be-stitched/49864/51?page=3

image

ArtemSokolov commented 5 months ago

Hi @emmanuel-contreras,

Thanks for providing additional context. My question was more on the output side. After you run ASHLAR, what do you do with the output .ome.tiff files to feed them to MCMICRO?

Do you structure them like this?

myproject/
  params.yml
  markers.csv
  registration/
    reg001.ome.tif
    reg002.ome.tif
    ...

or like this?

reg001/
  params.yml
  markers.csv
  registration/
    reg001.ome.tif

reg002/
  params.yml
  markers.csv
  registration/
    reg002.ome.tif

...
MDHowe4 commented 5 months ago

We have been structuring as example number 1 you gave. Sorry, was under the impression you meant on the ASHLAR side.

ArtemSokolov commented 5 months ago

Hi @MDHowe4 and @emmanuel-contreras,

I have a prototype solution in my own fork. Can you try running it on just one of your regions and let me know if it works?

Structure your data as follows:

reg001/
  params.yml
  markers.csv
  raw/
    reg001_X01_Y01_t001_z001_c001.tif
    reg001_X01_Y01_t001_z001_c002.tif
    ...
    reg001_X01_Y01_t001_z001_c010.tif

and add the following to your params.yml:

options:
  ashlar: --maximum-shift 0
    "filepattern|.|pattern=reg001_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454"

(NOTE that I replaced /input/${reg}/t001 with the current working directory .)

Use the following command to feed the above directory to my fork (fileseries branch):

nextflow run ArtemSokolov/mcmicro -r fileseries --in reg001/

If this works, I will merge the prototype into the main MCMICRO pipeline and work on supporting multiple images in the raw/ directory, which would allow you to place all your regions into a single project directory.

MDHowe4 commented 5 months ago

Thanks for all the help Artem,

I pulled and executed your branch pipeline as you described above and received this error:

Caused by:
  Process `registration:ashlar` terminated with an error exit status (1)

Command executed:

  ashlar   --maximum-shift 0 "filepattern|.|pattern=reg001_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454"  -o reg001.ome.tif

Command exit status:
  1

Command output:
  Stitching and registering input images
  Cycle 0:
      reading filepattern|.|pattern=reg001_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  WARNING: No tiles overlap, attempting alignment anyway.
  Traceback (most recent call last):
    File "/usr/local/bin/ashlar", line 8, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.10/dist-packages/ashlar/scripts/ashlar.py", line 212, in main
      return process_single(
    File "/usr/local/lib/python3.10/dist-packages/ashlar/scripts/ashlar.py", line 243, in process_single
      edge_aligner.run()
    File "/usr/local/lib/python3.10/dist-packages/ashlar/reg.py", line 484, in run
      self.compute_threshold()
    File "/usr/local/lib/python3.10/dist-packages/ashlar/reg.py", line 584, in compute_threshold
      _, errors[i] = utils.register(img1, img2, self.filter_sigma, upsample=1)
    File "/usr/local/lib/python3.10/dist-packages/ashlar/utils.py", line 23, in register
      shift = skimage.registration.phase_cross_correlation(
    File "/usr/local/lib/python3.10/dist-packages/skimage/registration/_phase_cross_correlation.py", line 225, in phase_cross_correlation
      raise ValueError("images must be same shape")
  ValueError: images must be same shape
  INFO:    Cleaning up image...

For posterity I did modify some of the inputs above to make it compatible with our cluster environment. I submitted this to our SLURM queue:

#!/bin/sh

#SBATCH --partition=cpu-short
#SBATCH -J=mcmicro_test
#SBATCH -o=mcmicro-%J.log
#SBATCH --time=2:00:00
#SBATCH --mem=64G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=
#SBATCH --chdir /MCMICRO/

module purge
module load nextflow
module load apptainer

export NXF_APPTAINER_CACHEDIR="/apptainer/containers"

nextflow -C mforge_settings.config run ArtemSokolov/mcmicro -r fileseries --in reg001/

And used the custom config that I've been using previously:

// Execution environment for miscellaneous tasks
params.roadie = 'labsyspharm/roadie:2023-03-08'

report.overwrite     = true
apptainer.enabled    = true
apptainer.autoMounts = true
apptainer.runOptions = '-C --unsquash -H "$PWD" -B "$HOME/.keras":"$PWD/.keras"'
params.contPfx       = 'docker://'

process{
  executor = 'slurm'
  queue    = 'cpu-short'
  cpus     = 4
  time     = '6h'
  memory   = '64GB'
}

and the params.yml file:

workflow:
  start-at: registration
  segmentation: mesmer
options:
  ashlar: --maximum-shift 0
    "filepattern|.|pattern=reg001_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454"
  mesmer: --membrane-channel 67
  mcquant: --masks cell.tif --intensity_props intensity_median
MDHowe4 commented 5 months ago

This is apparently an error that @emmanuel-contreras had seen previously [https://github.com/labsyspharm/ashlar/issues/170] that was solved by bumping ashlar to version: 1.18.0b3.

ArtemSokolov commented 5 months ago

Thanks for checking, @MDHowe4. You can tell MCMICRO to use a different version of ASHLAR by adding the following at the end of your params.yml:

modules:
  registration:
    version: 1.18.0b3

Let me know if that works.

I'll ask @jmuhlich if we should make 1.18.0b3 the new default in MCMICRO.

MDHowe4 commented 5 months ago

I went ahead and just did that once I realized I could update the version directly within MCMICRO. I have bumped the version on my side up and used this updated params.yml file:

workflow:
  start-at: registration
  segmentation: mesmer
options:
  ashlar: --maximum-shift 0
    "filepattern|.|pattern=reg001_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454"
  mesmer: --membrane-channel 67
  mcquant: --masks cell.tif --intensity_props intensity_median
modules:
    registration:
      name: ashlar
      container: labsyspharm/ashlar
      version: 1.18.0b3

Which resulted in this error:

Jan-30 15:58:06.518 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=registration:ashlar; work-dir=MCMICRO/work/e1/01da44b098428df7690fd29c787c16
  error [java.lang.IllegalArgumentException]: No enum constant nextflow.processor.PublishDir.Mode.NULL
Jan-30 15:58:06.609 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'registration:ashlar'

Caused by:
  No enum constant nextflow.processor.PublishDir.Mode.NULL

java.lang.IllegalArgumentException: No enum constant nextflow.processor.PublishDir.Mode.NULL
    at java.base/java.lang.Enum.valueOf(Enum.java:240)
    at nextflow.processor.PublishDir$Mode.valueOf(PublishDir.groovy)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1519)
    at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1010)
    at org.codehaus.groovy.runtime.StringGroovyMethods.asType(StringGroovyMethods.java:206)
    at nextflow.extension.Bolts.asType(Bolts.groovy:447)
    at jdk.internal.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
    at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:54)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at org.codehaus.groovy.runtime.InvokerHelper.invokePojoMethod(InvokerHelper.java:1024)
    at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1015)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodN(ScriptBytecodeAdapter.java:180)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.asType(ScriptBytecodeAdapter.java:603)
    at nextflow.processor.PublishDir.setMode(PublishDir.groovy:139)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at org.codehaus.groovy.runtime.metaclass.MultipleSetterProperty.setProperty(MultipleSetterProperty.java:54)
    at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:2785)
    at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:3838)
    at groovy.lang.GroovyObject.setProperty(GroovyObject.java:61)
    at org.codehaus.groovy.runtime.InvokerHelper.setProperty(InvokerHelper.java:213)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.setProperty(ScriptBytecodeAdapter.java:496)
    at nextflow.processor.PublishDir.create(PublishDir.groovy:180)
    at nextflow.processor.TaskConfig.getPublishDir(TaskConfig.groovy:366)
    at nextflow.processor.TaskProcessor.publishOutputs(TaskProcessor.groovy:1306)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:48)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:189)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
    at nextflow.processor.TaskProcessor.finalizeTask0(TaskProcessor.groovy:2282)
    at jdk.internal.reflect.GeneratedMethodAccessor211.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:48)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:189)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
    at nextflow.processor.TaskProcessor.finalizeTask(TaskProcessor.groovy:2253)
    at nextflow.processor.TaskPollingMonitor.checkTaskStatus(TaskPollingMonitor.groovy:619)
    at nextflow.processor.TaskPollingMonitor.checkAllTasks(TaskPollingMonitor.groovy:533)
    at nextflow.processor.TaskPollingMonitor.pollLoop(TaskPollingMonitor.groovy:408)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1036)
    at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1019)
    at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:97)
    at nextflow.processor.TaskPollingMonitor$_start_closure2.doCall(TaskPollingMonitor.groovy:292)
    at nextflow.processor.TaskPollingMonitor$_start_closure2.call(TaskPollingMonitor.groovy)
    at groovy.lang.Closure.run(Closure.java:493)
    at java.base/java.lang.Thread.run(Thread.java:834)

However, within the work directory I was able to find a stitched reg001.ome.tif. Upon inspection this appears to be the same as produced by the container outside MCMICRO. I believe the command is therefore functioning as expected, but there is a likely unrelated error with the PublishDir directive.

ArtemSokolov commented 5 months ago

Hi @MDHowe4,

Yes, this is unrelated. We recently parameterized how the files are "published" from work directories to the project directory. The default behavior is to "copy" the output files , which is consistent with how the pipeline has always operated. However, we found that copying large files was often slow and unnecessary, so we allowed users to switch to the "link" (i.e., hard link) mode instead.

Because you are supplying your config file with -C, it causes Nextflow to ignore the existence of params.publish_dir_mode, which is why it crashes with the "I don't know what mode to use for publishDir" error. If you add

// The mode for transfering output between work and project directories
params.publish_dir_mode = 'copy'

to your mforge_settings.config, everything should work.

I'll go ahead and merge my fork into main and look into supporting multiple images/regions in raw/ next.

MDHowe4 commented 5 months ago

Hi @ArtemSokolov,

Sorry for the delayed reply. I added the parameter you suggested and everything ran smoothly once re-ran. We look forward to any updates on supporting multiple images at once.

ArtemSokolov commented 4 months ago

Hi @MDHowe4,

I added support for multiple images in the raw/ subdirectory: https://github.com/labsyspharm/mcmicro/pull/540

Can you please let me know if it works as expected on your end?

MDHowe4 commented 4 months ago

Thanks a ton @ArtemSokolov for adding that functionality. I pulled the updated main branch and it ran perfectly on an input directory containing just a single region (reg001). However, when a new test directory containing two input tile directories reg001 and reg002 it fails to complete due to failing to recognize the reg002 pattern in the filenames for the second directory as seen here in the case where we match reg001:

options:
  ashlar: --maximum-shift 0
    "filepattern|.|pattern=reg001_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454"

This isn't surprising obviously but I am unable to find a pattern that matches for regXXX name cases. I have tried reg001 --> reg{series:3} which did not work in the pattern matching. Any ideas for how to handle our specific case here?

ArtemSokolov commented 4 months ago

Hi @MDHowe4,

This is actually very straightforward to do at the MCMICRO level, now that the pipeline keeps track of multiple samples.

Can you try nextflow pulling the latest version and running the following pattern:

options:
  ashlar: --maximum-shift 0
    "filepattern|.|pattern={samplename}_X{col:02}_Y{row:02}_t001_z001_c{channel}.tif|overlap=0.0|pixel_size=0.377454"

It should now replace {samplename} with the name of the directory containing the images (reg001, reg002, etc.) before passing the expression to ASHLAR.

Let me know if it's still giving you issues.