labsyspharm / mcmicro

Multiple-choice microscopy pipeline
https://mcmicro.org/
MIT License

Modules based on Tensorflow containers are failing on the M1 architecture #353

Open sailseem opened 2 years ago

sailseem commented 2 years ago

Dear developer, sorry to bother you. This is a really good pipeline for analyzing imaging data. Because I am quite new to this, I ran into the following error when repeating the examples.

(base) yangfan@huntsman-ve703-06621 test % nextflow run labsyspharm/mcmicro --in exemplar-001
N E X T F L O W ~ version 21.10.6
Launching labsyspharm/mcmicro [mad_bhabha] - revision: e451820623 [master]
executor > local (2)
[- ] process > illumination -
[4c/203cee] process > registration:ashlar [100%] 1 of 1 ✔
[- ] process > dearray:coreograph -
[80/19e6c4] process > segmentation:worker (unmicst-1) [ 0%] 0 of 1
[- ] process > segmentation:s3seg -
[- ] process > quantification:mcquant -
[- ] process > cellstates:worker -
Error executing process > 'segmentation:worker (unmicst-1)'

Caused by: Missing output file(s) *.tif expected by process segmentation:worker (unmicst-1) (note: input files are not included in the default matching set)

Command executed:

python /app/unmicstWrapper.py --stackOutput --outputPath . exemplar-001.ome.tif

Command exit status: 0

Command output:

WARNING! USING unmicst-solo AS DEFAULT. THIS MODEL HAS BEEN TRAINED ON MORE TISSUE TYPES. IF YOU WANT THE LEGACY MODEL, USE --tool unmicst-legacy

python /app/UnMicst1-5.py exemplar-001.ome.tif --channel 0 --outputPath . --mean -1 --std -1 --scalingFactor 1 --GPU -1 --outlier -1 --stackOutput

Command error:

8c3b70e39044: Pull complete
45d437916d57: Pull complete
d8f1569ddae6: Pull complete
85386706b020: Download complete
d6c0f989e873: Verifying Checksum
d6c0f989e873: Download complete
85386706b020: Pull complete
ee9b457b77d0: Pull complete
7a8e64f26211: Verifying Checksum
7a8e64f26211: Download complete
c33b03e4dd22: Verifying Checksum
c33b03e4dd22: Download complete
bca93af797c1: Verifying Checksum
bca93af797c1: Download complete
644140fd95a9: Verifying Checksum
644140fd95a9: Download complete
e5da48aa9554: Verifying Checksum
e5da48aa9554: Download complete
ca68d98a90c4: Verifying Checksum
ca68d98a90c4: Download complete
f2f977dcaf33: Verifying Checksum
f2f977dcaf33: Download complete
7b80ab25c2e3: Verifying Checksum
7b80ab25c2e3: Download complete
d722f2d958c7: Verifying Checksum
d722f2d958c7: Download complete
356d5de193f8: Verifying Checksum
356d5de193f8: Download complete
47f6c197be35: Verifying Checksum
47f6c197be35: Download complete
bebfcc1316f7: Download complete
bebfcc1316f7: Pull complete
644140fd95a9: Pull complete
d6c0f989e873: Pull complete
7a8e64f26211: Pull complete

executor > local (2)
[- ] process > illumination -
[4c/203cee] process > registration:ashlar [100%] 1 of 1 ✔
[- ] process > dearray:coreograph -
[80/19e6c4] process > segmentation:worker (unmicst-1) [100%] 1 of 1, failed: 1 ✘
[- ] process > segmentation:s3seg -
[- ] process > quantification:mcquant -
[- ] process > cellstates:worker -
Error executing process > 'segmentation:worker (unmicst-1)'

I am not sure whether the problem is with my computer (M1 Mac) or something else. Could you help me with that? Thanks a lot.

ArtemSokolov commented 2 years ago

Hi @sailseem

It looks like the unmicst step failed to generate its output files. Would you be able to provide more information?

There should be a work/ subdirectory generated by Nextflow that contains additional logs. Based on the log you posted, the work directory for unmicst begins with work/80/19e6c4.... You will find .command.out and .command.err files in that directory. Can you post the contents of those files? These are the logs that were generated by UnMicst, which can help identify why the tool failed to produce output .tif files.
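For anyone following along, the lookup described above could be sketched as a small shell helper. This is only an illustration, written for bash: the function name show_task_logs is made up here, and it assumes the short hash from the progress line (e.g. 80/19e6c4) is a prefix of exactly the directory you want.

```shell
# Print the .command.out and .command.err logs of a Nextflow task,
# given the work root and the short hash shown in the progress line.
# show_task_logs is a hypothetical helper name, not part of Nextflow.
show_task_logs() {
    work_root=$1      # e.g. ./work
    short_hash=$2     # e.g. 80/19e6c4
    for dir in "$work_root"/"$short_hash"*/; do
        # If nothing matched, the pattern stays literal and is skipped here.
        [ -d "$dir" ] || continue
        for log in .command.out .command.err; do
            echo "== ${dir}${log} =="
            # These files are hidden (leading dot), so a plain `ls` will not list them.
            if [ -f "${dir}${log}" ]; then
                cat "${dir}${log}"
            fi
        done
    done
}
```

Called as show_task_logs work 80/19e6c4 from the directory where the pipeline was launched, it should print both logs for the matching task, or nothing if no directory matches.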

-Artem

sailseem commented 2 years ago

Thanks for such a prompt reply. I did not see any files other than TIFF images.

I think it may be a problem with Docker on the M1, which I have no idea how to fix.

( WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested)
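The warning above compares the image's platform with the host's. A minimal sketch of that comparison in shell follows; the helper names normalize_arch and needs_emulation are invented for illustration, and the mapping covers only the two architectures mentioned in this thread.

```shell
# Map `uname -m` style names onto the names Docker uses in platform strings.
# Both helper names are hypothetical; unknown values pass through unchanged.
normalize_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        arm64|aarch64) echo arm64 ;;
        *)             echo "$1" ;;
    esac
}

# Succeeds (exit status 0) when the host and image architectures differ,
# i.e. when the container would have to run under emulation.
needs_emulation() {
    host_arch=$(normalize_arch "$1")
    image_arch=$(normalize_arch "$2")
    [ "$host_arch" != "$image_arch" ]
}
```

One way to feed it real values is needs_emulation "$(uname -m)" "$(docker image inspect --format '{{.Architecture}}' IMAGE)", where IMAGE is a placeholder for whichever container image the failing step pulled.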

Thanks

ArtemSokolov commented 2 years ago

@sailseem What folder is that?

To me, those look like raw files that get provided as input to ASHLAR (which performs stitching and registration) and should NOT appear in your work/80/19e6c4.... Instead, what you should see is a single exemplar-001.ome.tif, which was generated by ASHLAR and is now ready to be segmented.

Here's what it should look like. After downloading the exemplar, you should have raw/ and illumination/ directories in your exemplar-001 project directory:

$ tree exemplar-001
exemplar-001
├── illumination
│   ├── exemplar-001-cycle-01-dfp.tif
│   ├── exemplar-001-cycle-01-ffp.tif
│   ├── exemplar-001-cycle-02-dfp.tif
│   ├── exemplar-001-cycle-02-ffp.tif
│   ├── exemplar-001-cycle-03-dfp.tif
│   └── exemplar-001-cycle-03-ffp.tif
├── markers.csv
└── raw
    ├── exemplar-001-cycle-01.ome.tiff
    ├── exemplar-001-cycle-02.ome.tiff
    └── exemplar-001-cycle-03.ome.tiff

2 directories, 10 files

When running MCMICRO, the tags at the beginning of the line show which work directory corresponds to what step. For example, here's just ASHLAR and UnMicst:

$ nextflow run labsyspharm/mcmicro --in exemplar-001
N E X T F L O W  ~  version 21.10.3
Launching `labsyspharm/mcmicro` [nice_ride] - revision: e451820623 [master]
...
[8c/1f4461] process > registration:ashlar             [100%] 1 of 1 ✔
[20/fc59b9] process > segmentation:worker (unmicst-1) [100%] 1 of 1 ✔
...

On my machine, the work directories begin with 8c/1f4461 and 20/fc59b9. On your machine, it looks like they were beginning with 4c/203cee and 80/19e6c4. We can look inside those work directories to see what inputs and outputs were handled at each stage. Use -a to make sure that it shows all hidden files:

$ ls -la work/8c/1f4461f3e6dcab6feaae4f35ec8794/
total 417904
drwxrwxr-x 2 sokolov sokolov      4096 Feb 10 12:24 .
drwxrwxr-x 3 sokolov sokolov      4096 Feb 10 12:24 ..
-rw-rw-r-- 1 sokolov sokolov         0 Feb 10 12:24 .command.begin
-rw-rw-r-- 1 sokolov sokolov       219 Feb 10 12:24 .command.err
-rw-rw-r-- 1 sokolov sokolov      5049 Feb 10 12:24 .command.log
-rw-rw-r-- 1 sokolov sokolov      4830 Feb 10 12:24 .command.out
-rw-rw-r-- 1 sokolov sokolov      4573 Feb 10 12:24 .command.run
-rw-rw-r-- 1 sokolov sokolov       354 Feb 10 12:24 .command.sh
lrwxrwxrwx 1 sokolov sokolov        74 Feb 10 12:24 exemplar-001-cycle-01-dfp.tif -> /home/sokolov/test/exemplar-001/illumination/exemplar-001-cycle-01-dfp.tif
lrwxrwxrwx 1 sokolov sokolov        74 Feb 10 12:24 exemplar-001-cycle-01-ffp.tif -> /home/sokolov/test/exemplar-001/illumination/exemplar-001-cycle-01-ffp.tif
lrwxrwxrwx 1 sokolov sokolov        66 Feb 10 12:24 exemplar-001-cycle-01.ome.tiff -> /home/sokolov/test/exemplar-001/raw/exemplar-001-cycle-01.ome.tiff
lrwxrwxrwx 1 sokolov sokolov        74 Feb 10 12:24 exemplar-001-cycle-02-dfp.tif -> /home/sokolov/test/exemplar-001/illumination/exemplar-001-cycle-02-dfp.tif
lrwxrwxrwx 1 sokolov sokolov        74 Feb 10 12:24 exemplar-001-cycle-02-ffp.tif -> /home/sokolov/test/exemplar-001/illumination/exemplar-001-cycle-02-ffp.tif
lrwxrwxrwx 1 sokolov sokolov        66 Feb 10 12:24 exemplar-001-cycle-02.ome.tiff -> /home/sokolov/test/exemplar-001/raw/exemplar-001-cycle-02.ome.tiff
lrwxrwxrwx 1 sokolov sokolov        74 Feb 10 12:24 exemplar-001-cycle-03-dfp.tif -> /home/sokolov/test/exemplar-001/illumination/exemplar-001-cycle-03-dfp.tif
lrwxrwxrwx 1 sokolov sokolov        74 Feb 10 12:24 exemplar-001-cycle-03-ffp.tif -> /home/sokolov/test/exemplar-001/illumination/exemplar-001-cycle-03-ffp.tif
lrwxrwxrwx 1 sokolov sokolov        66 Feb 10 12:24 exemplar-001-cycle-03.ome.tiff -> /home/sokolov/test/exemplar-001/raw/exemplar-001-cycle-03.ome.tiff
-rw-r--r-- 1 root    root    427843658 Feb 10 12:24 exemplar-001.ome.tif
-rw-rw-r-- 1 sokolov sokolov         1 Feb 10 12:24 .exitcode

^This is the work directory for ASHLAR. Notice that the original files from exemplar-001/ are linked to in here, because they are used as inputs. Additionally, there is exemplar-001.ome.tif, which was generated by ASHLAR after it performed stitching and registration.

Similarly, we can look inside the work directory for UnMicst:

$ ls -la work/20/fc59b9325f90115ecfb415e4a943a6/
total 23112
drwxrwxr-x 3 sokolov sokolov     4096 Feb 10 12:27 .
drwxrwxr-x 3 sokolov sokolov     4096 Feb 10 12:24 ..
-rw-rw-r-- 1 sokolov sokolov        0 Feb 10 12:24 .command.begin
-rw-rw-r-- 1 sokolov sokolov      153 Feb 10 12:24 .command.err
-rw-rw-r-- 1 sokolov sokolov      580 Feb 10 12:27 .command.log
-rw-rw-r-- 1 sokolov sokolov      427 Feb 10 12:27 .command.out
-rw-rw-r-- 1 sokolov sokolov     3465 Feb 10 12:24 .command.run
-rw-rw-r-- 1 sokolov sokolov       97 Feb 10 12:24 .command.sh
lrwxrwxrwx 1 sokolov sokolov       78 Feb 10 12:24 exemplar-001.ome.tif -> /home/sokolov/test/work/8c/1f4461f3e6dcab6feaae4f35ec8794/exemplar-001.ome.tif
-rw-r--r-- 1 root    root    23620618 Feb 10 12:27 exemplar-001_Probabilities_1.tif
-rw-rw-r-- 1 sokolov sokolov        1 Feb 10 12:27 .exitcode
lrwxrwxrwx 1 sokolov sokolov       69 Feb 10 12:24 input.1 -> /home/sokolov/test/work/tmp/c9/f6bffc968a222c37f16d86f0f54d4d/input.1
drwxr-xr-x 2 root    root        4096 Feb 10 12:26 qc

Notice how exemplar-001.ome.tif from the previous step is being linked to here; it now serves as input to UnMicst. Your screenshot seems to show a different set of files. So, the first thing to double-check is that you are looking at the correct work directory.
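Since inputs show up as symlinks and outputs as regular files in the listings above, a quick check for "did this task produce anything?" could look like the following. It is a sketch only; list_task_outputs is a made-up name, and it deliberately ignores hidden files and subdirectories such as qc.

```shell
# List what a task produced in its work directory: regular, non-hidden files.
# Staged inputs are symlinks in the listings above, so `-type f` skips them.
# list_task_outputs is a hypothetical helper name.
list_task_outputs() {
    find "$1" -maxdepth 1 -type f ! -name '.*' | sort
}
```

Run against the UnMicst work directory listed above, this should print only exemplar-001_Probabilities_1.tif; an empty result suggests the tool exited before writing any output.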

Notice also the .command.out and .command.err files that I was asking you about. They contain the output logs generated by UnMicst. If everything works correctly, you should see the following in .command.out:

$ cat work/20/fc59b9325f90115ecfb415e4a943a6/.command.out
Using CPU
loading data
loading data
loading data
0.34
0.25
Model restored.
Using channel 1
Inference...
Inference...
Inference...

WARNING! USING unmicst-solo AS DEFAULT. THIS MODEL HAS BEEN TRAINED ON MORE TISSUE TYPES. IF YOU WANT THE LEGACY MODEL, USE --tool unmicst-legacy

python /app/UnMicst1-5.py  exemplar-001.ome.tif --channel 0 --outputPath . --mean -1 --std -1 --scalingFactor 1 --GPU -1 --outlier -1 --stackOutput

Can you please show me what your work directories (4c/203cee and 80/19e6c4) look like and what the content of .command.out is?

MargotCh commented 2 years ago

Hi @ArtemSokolov, first of all, thank you for this detailed guide; it is very helpful! I can show you what the files you asked for look like. I'm getting the same error and strongly suspect that it has the same root cause, as I've also just switched to an M1 Mac and never had this problem before. These are .command.out and .command.err:

margotchazotte@dhcp027b 5ebb204a264f33467cecc3e4fbd9ce % cat .command.out

WARNING! USING unmicst-solo AS DEFAULT. THIS MODEL HAS BEEN TRAINED ON MORE TISSUE TYPES. IF YOU WANT THE LEGACY MODEL, USE --tool unmicst-legacy

python /app/UnMicst1-5.py  exemplar-001.ome.tif --channel 0 --outputPath . --mean -1 --std -1 --scalingFactor 1 --GPU -1 --outlier -1 --stackOutput 

margotchazotte@dhcp027b 5ebb204a264f33467cecc3e4fbd9ce % cat .command.err
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
2022-02-15 12:06:56.739129: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
qemu: uncaught target signal 6 (Aborted) - core dumped

 Aborted
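The three messages in this .command.err are distinctive enough to search for in other failed tasks. A rough sketch, assuming the log wording matches what is quoted above (the function name and the labels it prints are invented for illustration):

```shell
# Report which of the failure signatures quoted in this thread appear
# in a .command.err file. diagnose_command_err and its labels are hypothetical.
diagnose_command_err() {
    err_file=$1
    # Docker warns when an amd64 image is started on an arm64 host.
    if grep -q "does not match the detected host platform" "$err_file"; then
        echo "platform-mismatch"
    fi
    # TensorFlow aborts when the (emulated) CPU does not expose AVX.
    if grep -q "compiled to use AVX instructions" "$err_file"; then
        echo "missing-avx"
    fi
    # The emulator reports that the process was killed by a signal.
    if grep -q "qemu: uncaught target signal" "$err_file"; then
        echo "emulator-abort"
    fi
}
```

For the log shown above, all three labels would be printed; a log from a healthy run should print nothing.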

This is what my unmicst work directory looks like:

@dhcp027b 5ebb204a264f33467cecc3e4fbd9ce % ls -la
total 24
drwxr-xr-x 11 margotchazotte staff  352 Feb 15 13:06 .
drwxr-xr-x  3 margotchazotte staff   96 Feb 15 13:06 ..
-rw-r--r--  1 margotchazotte staff    0 Feb 15 13:06 .command.begin
-rw-r--r--  1 margotchazotte staff  400 Feb 15 13:06 .command.err
-rw-r--r--  1 margotchazotte staff  697 Feb 15 13:06 .command.log
-rw-r--r--  1 margotchazotte staff  297 Feb 15 13:06 .command.out
-rw-r--r--  1 margotchazotte staff 3159 Feb 15 13:06 .command.run
-rw-r--r--  1 margotchazotte staff   97 Feb 15 13:06 .command.sh
-rw-r--r--  1 margotchazotte staff    1 Feb 15 13:06 .exitcode
lrwxr-xr-x  1 margotchazotte staff   81 Feb 15 13:06 exemplar-001.ome.tif -> /Users/margotchazotte/work/91/776d57b0d6db2dcbc5753f2358041d/exemplar-001.ome.tif
lrwxr-xr-x  1 margotchazotte staff   72 Feb 15 13:06 input.1 -> /Users/margotchazotte/work/tmp/ae/b30c3a306e5d57e3bc9e694ee47338/input.1

And this is the ashlar work directory:

@dhcp027b 776d57b0d6db2dcbc5753f2358041d % ls -la 
total 426024
drwxr-xr-x 19 margotchazotte staff       608 Feb 15 13:06 .
drwxr-xr-x  3 margotchazotte staff        96 Feb 15 13:01 ..
-rw-r--r--  1 margotchazotte staff         0 Feb 15 13:01 .command.begin
-rw-r--r--  1 margotchazotte staff       371 Feb 15 13:06 .command.err
-rw-r--r--  1 margotchazotte staff      5201 Feb 15 13:06 .command.log
-rw-r--r--  1 margotchazotte staff      4830 Feb 15 13:06 .command.out
-rw-r--r--  1 margotchazotte staff      4585 Feb 15 13:01 .command.run
-rw-r--r--  1 margotchazotte staff       354 Feb 15 13:01 .command.sh
-rw-r--r--  1 margotchazotte staff         1 Feb 15 13:06 .exitcode
lrwxr-xr-x  1 margotchazotte staff       110 Feb 15 13:01 exemplar-001-cycle-01-dfp.tif -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/illumination/exemplar-001-cycle-01-dfp.tif
lrwxr-xr-x  1 margotchazotte staff       110 Feb 15 13:01 exemplar-001-cycle-01-ffp.tif -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/illumination/exemplar-001-cycle-01-ffp.tif
lrwxr-xr-x  1 margotchazotte staff       102 Feb 15 13:01 exemplar-001-cycle-01.ome.tiff -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/raw/exemplar-001-cycle-01.ome.tiff
lrwxr-xr-x  1 margotchazotte staff       110 Feb 15 13:01 exemplar-001-cycle-02-dfp.tif -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/illumination/exemplar-001-cycle-02-dfp.tif
lrwxr-xr-x  1 margotchazotte staff       110 Feb 15 13:01 exemplar-001-cycle-02-ffp.tif -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/illumination/exemplar-001-cycle-02-ffp.tif
lrwxr-xr-x  1 margotchazotte staff       102 Feb 15 13:01 exemplar-001-cycle-02.ome.tiff -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/raw/exemplar-001-cycle-02.ome.tiff
lrwxr-xr-x  1 margotchazotte staff       110 Feb 15 13:01 exemplar-001-cycle-03-dfp.tif -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/illumination/exemplar-001-cycle-03-dfp.tif
lrwxr-xr-x  1 margotchazotte staff       110 Feb 15 13:01 exemplar-001-cycle-03-ffp.tif -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/illumination/exemplar-001-cycle-03-ffp.tif
lrwxr-xr-x  1 margotchazotte staff       102 Feb 15 13:01 exemplar-001-cycle-03.ome.tiff -> /Users/margotchazotte/Documents/uni/Master/SchapiroLab/exemplar-001/raw/exemplar-001-cycle-03.ome.tiff
-rw-r--r--  1 margotchazotte staff 427843658 Feb 15 13:06 exemplar-001.ome.tif

I hope this can be of help!

Best, Margot

ArtemSokolov commented 2 years ago

Thanks, @MargotCh. This is very helpful. We are now looking to reproduce the error on our side and design a possible workaround.

ArtemSokolov commented 2 years ago

Hi @MargotCh,

Just out of curiosity, have you been able to run any containers that are based on Tensorflow since you migrated to an M1 machine? If so, would you mind pointing us towards a container that works on your new M1 Mac?

No rush on this; just whenever you get a spare moment.

MargotCh commented 2 years ago

Hey @ArtemSokolov,

sorry for not getting back to you earlier. So far I've only tried Mesmer but that didn't work either. Do you have any suggestions on what else I should try?

ArtemSokolov commented 2 years ago

Hi @MargotCh,

We did some more digging into this. Unfortunately, none of the containers that are based on standard Tensorflow will work on the M1 architecture. We will need to build a parallel version of all of our containers that is tailored specifically to M1. We would then provide a separate configuration profile that would use those parallel builds (e.g., something like nextflow run labsyspharm/mcmicro --in exemplar-001 -profile M1) for Mac users.

The good news is that an M1-specific build of tensorflow seems to already exist (https://hub.docker.com/r/armswdev/tensorflow-arm-neoverse). The bad news is that GitHub Actions -- which we use to auto-build our containers on top of Tensorflow -- is currently unable to provision an M1 machine for a runtime environment. It is currently one of the most requested features (https://github.com/actions/virtual-environments/issues/2187), so we hope to see it made available in the near future. We are also looking around for possible alternatives to GitHub Actions, but haven't been able to identify a good one yet.

This is probably more detail than you wanted. Long story short: we want to add support for M1 chips to MCMICRO, but the limited availability of M1-compatible cloud resources and the non-trivial amount of work needed to set up parallel builds mean that M1 support in MCMICRO may not happen very soon.

ArtemSokolov commented 2 years ago

One more thought: As a possible workaround, consider using --probability-maps ilastik for now. We haven't tested our ilastik container on an M1 machine, but it doesn't use a Tensorflow base, so it's possible that it may work without any issues (similar to your experience with ASHLAR). The full command may look like:

nextflow run labsyspharm/mcmicro --in exemplar-001 --probability-maps ilastik

LucaMarconato commented 1 year ago

I am also experiencing the same problem due to TensorFlow on an M1 Mac. I am running the pipeline on exemplar-002 and the problem arises from dearray:coreograph. Here is the execution output:

(ome) macbook@MBP-2021 data % nextflow run labsyspharm/mcmicro --in exemplar-002 -resume
N E X T F L O W  ~  version 22.10.0
Launching `https://github.com/labsyspharm/mcmicro` [sick_kare] DSL2 - revision: b9bd8cc3cb [master]
executor >  local (1)
[-        ] process > illumination                -
executor >  local (1)
[-        ] process > illumination                -
[ef/feaa0b] process > registration:ashlar         [100%] 1 of 1, cached: 1 ✔
[-        ] process > background:backsub          -
[39/e6d937] process > dearray:coreograph (1)      [100%] 1 of 1, failed: 1 ✘
[-        ] process > dearray:roadie:runTask      -
[-        ] process > segmentation:roadie:runTask -
[-        ] process > segmentation:worker         -
[-        ] process > segmentation:s3seg          -
[-        ] process > quantification:mcquant      -
[-        ] process > downstream:worker           -
[-        ] process > viz:autominerva             -
Error executing process > 'dearray:coreograph (1)'

Caused by:
  Process `dearray:coreograph (1)` terminated with an error exit status (134)

Command executed:

  python /app/UNetCoreograph.py --outputPath . --imagePath exemplar-002.ome.tif

Command exit status:
  134

Command output:
  (empty)

Command error:
  WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
  2023-04-02 17:31:32.523036: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
  qemu: uncaught target signal 6 (Aborted) - core dumped
  .command.sh: line 2:     8 Aborted                 python /app/UNetCoreograph.py --outputPath . --imagePath exemplar-002.ome.tif

Work dir:
  /Users/macbook/embl/projects/basel/spatialdata-sandbox/mcmicro2_io/data/work/39/e6d937f62058ee8673c8bc5fd9c3c1

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

FloWuenne commented 1 year ago

One possible solution could be to add this line to your .bashrc (or whichever shell config you use): export DOCKER_DEFAULT_PLATFORM=linux/amd64

This solved some of my M1 related container issues with docker!
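If you want to try this without adding the same line twice, a small idempotent helper might look like the sketch below. The function name ensure_platform_export is invented, and the file to pass depends on your shell (for example ~/.bashrc or ~/.zshrc). Note that this only makes the amd64 platform request explicit; the image still runs under emulation, so it may not help with the TensorFlow AVX abort reported earlier in this thread.

```shell
# Append the DOCKER_DEFAULT_PLATFORM export to a shell rc file,
# but only if that exact line is not already present.
# ensure_platform_export is a hypothetical helper name.
ensure_platform_export() {
    rc_file=$1
    line='export DOCKER_DEFAULT_PLATFORM=linux/amd64'
    # Create the file if it does not exist yet.
    touch "$rc_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}
```

After running ensure_platform_export ~/.bashrc, open a new terminal (or source the file) so that the variable is set before launching nextflow.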

LucaMarconato commented 1 year ago

Thanks @FloWuenne, unfortunately it didn't work for me.