labsyspharm / mcmicro

Multiple-choice microscopy pipeline
https://mcmicro.org/
MIT License

Mesmer Process Fails in Nextflow with Singularity Container #515

Closed ShihongWu closed 1 year ago

ShihongWu commented 1 year ago

I am encountering an issue while running the mesmer process using Nextflow with a Singularity/Apptainer container. The mesmer process fails with exit status 1, and the command executed within the container produces the error listed below:

ERROR ~ Error executing process > 'segmentation:worker (mesmer-1)'

Caused by: Process segmentation:worker (mesmer-1) terminated with an error exit status (1)

Command executed:

python /usr/src/app/run_app.py mesmer --squeeze --output-directory . --output-name cell.tif --nuclear-image exemplar-001.ome.ti

Command exit status: 1

Command output: (empty)

Command error:
  INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  2023-08-22 20:28:21.290519: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/.singularity.d/libs
  2023-08-22 20:28:21.290567: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
  Traceback (most recent call last):
    File "/usr/src/app/run_app.py", line 60, in <module>
      run_application(dict(ARGS._get_kwargs()))
    File "/usr/src/app/deepcell_applications/app_runners.py", line 52, in run_application
      app = dca.utils.get_app(arg_dict['app'])
    File "/usr/src/app/deepcell_applications/utils.py", line 44, in get_app
      return app_map[name]'class'
    File "/usr/local/lib/python3.8/dist-packages/deepcell/applications/mesmer.py", line 223, in __init__
      with tarfile.open(archive_path, "r:gz") as archive:
    File "/usr/lib/python3.8/tarfile.py", line 1621, in open
      return func(name, filemode, fileobj, **kwargs)
    File "/usr/lib/python3.8/tarfile.py", line 1667, in gzopen
      fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
    File "/usr/lib/python3.8/gzip.py", line 173, in __init__
      fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
  FileNotFoundError: [Errno 2] No such file or directory: '/mounted_images/MultiplexSegmentation-9.tar.gz'

Work dir: /gpfs3/well/immune-rep/users/tma392/mcmicro/images/work/b5/232cd350517008765e9b4ef0c7115f

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details

The steps I have taken are listed below:

  1. Edited the mesmer.py script. Our High-Performance Computing (HPC) cluster is not connected to the internet, so we need to download the model manually. The change I made is:

    def __init__(self, model=None):

        if model is None:
            archive_path = '/mounted_images/MultiplexSegmentation-9.tar.gz'
            model_dir = os.path.splitext(os.path.basename(archive_path))[0]
            extraction_path = '/mounted_images'  # Change this to your desired extraction directory

            if not os.path.exists(model_dir):
                # Extract the model manually
                with tarfile.open(archive_path, "r:gz") as archive:
                    archive.extractall(path=extraction_path)

            model_path = os.path.join(extraction_path, model_dir)
            model = tf.keras.models.load_model(model_path)
  2. Rebuilt the mesmer image using my_image.def, which is this:

    # Use the base image
    FROM vanvalenlab/deepcell-applications:0.4.0

    # Replace mesmer.py with the modified version
    RUN rm /usr/local/lib/python3.8/dist-packages/deepcell/applications/mesmer.py
    COPY mesmer.py /usr/local/lib/python3.8/dist-packages/deepcell/applications/

  3. Pushed the new image back to the hub with docker.
  4. Pulled the image down with singularity on the HPC.
  5. Exported the SINGULARITY_BIND environment variable with the host path to be mounted inside the container:

    export SINGULARITY_BIND="/gpfs3/well/immune-rep/users/tma392/mcmicro/images:/mounted_images"

  6. Set SINGULARITY_USER_BIND_CONTROL=1:

    export SINGULARITY_USER_BIND_CONTROL=1

  7. My custom.config is listed below:

    singularity {
      enabled = true
      autoMounts = true
      cacheDir = '/gpfs3/well/immune-rep/users/tma392/mcmicro/images'
    }

  8. Ran Nextflow like this:

    nextflow run labsyspharm/mcmicro --in /gpfs3/well/immune-rep/users/tma392/mcmicro/exemplar-001 -profile singularity -c /gpfs3/well/immune-rep/users/tma392/mcmicro/custom.config

  9. The same error still occurs.

Error Details: The error message indicates that the segmentation process (mesmer-1) terminated with exit status 1. The command executed within the container fails because a file is missing: '/mounted_images/MultiplexSegmentation-9.tar.gz'.

Additional Information:

  1. After running step 5, I used singularity shell vanvalenlab-deepcell-applications-0.4.0.img to get inside the container, and I could find /mounted_images there.
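
A quick way to tell "the bind mount is not visible to the process" apart from "the directory is mounted but the file is absent" is a small check run with the container's Python. The sketch below is illustrative only: the helper name `describe_path` is hypothetical, and the path is the one from the traceback above.

```python
import os


def describe_path(path):
    """Return a short diagnostic string for a file expected inside the container."""
    parent = os.path.dirname(path) or "."
    if os.path.isfile(path):
        return f"ok: {path} ({os.path.getsize(path)} bytes)"
    if not os.path.isdir(parent):
        # The parent directory itself is absent, so the bind was likely not applied.
        return f"missing directory: {parent}"
    # The directory is there, but the expected file is not; list what is.
    return f"missing file: {path} (directory contains {sorted(os.listdir(parent))})"


if __name__ == "__main__":
    # Path taken from the FileNotFoundError in the traceback.
    print(describe_path("/mounted_images/MultiplexSegmentation-9.tar.gz"))
```

Running this from the process work dir (through `bash .command.run`, as the Nextflow tip suggests) rather than from an interactive `singularity shell` would show what the pipeline task actually sees, since the two may be launched with different bind options.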

Environment:

Nextflow version: 23.04.2
Singularity version: 1.1.9-1.el7
Operating System:

    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

Requested Help: I'm seeking assistance in identifying the root cause of the issue and finding a solution to successfully run the mesmer segmentation process using the Singularity container in Nextflow, especially considering the manual model download and changes made to mesmer.py.

ArtemSokolov commented 1 year ago

Hi @ShihongWu,

I wonder if Nextflow may be ignoring SINGULARITY_BIND and not be passing your images folder to the container correctly. Can you try adding the following runOptions to your custom.config:

singularity {
  enabled = true
  autoMounts = true
  cacheDir = '/gpfs3/well/immune-rep/users/tma392/mcmicro/images'
  runOptions = '-C -H "$PWD" -B /gpfs3/well/immune-rep/users/tma392/mcmicro/images:/mounted_images'
}

ShihongWu commented 1 year ago

Hi Artem @ArtemSokolov, thank you for your quick reply! Much appreciated! I added the runOptions to my custom.config as you suggested and ran the pipeline again. This time I got a different error, listed below. I also found that the MultiplexSegmentation model was extracted into the directory.

The error:

N E X T F L O W ~ version 23.04.2
Launching https://github.com/labsyspharm/mcmicro [amazing_heisenberg] DSL2 - revision: 5eac7773d7 [master]
executor > local (2)
[- ] process > illumination -
[08/f4fd63] process > registration:ashlar [100%] 1 of 1 ✔
[- ] process > background:backsub -
[- ] process > dearray:coreograph -
[- ] process > dearray:roadie:runTask -
[- ] process > segmentation:roadie:runTask -
[5c/33c60d] process > segmentation:worker (mesmer-1) [ 0%] 0 of 1
[- ] process > segmentation:s3seg -
[- ] process > quantification:mcquant -
[- ] process > downstream:worker -
[- ] process > viz:autominerva -
ERROR ~ Error executing process > 'segmentation:worker (mesmer-1)'

Caused by: Process segmentation:worker (mesmer-1) terminated with an error exit status (1)

Command executed:

python /usr/src/app/run_app.py mesmer --squeeze --output-directory . --output-name cell.tif --nuclear-image exemplar-001.ome.tif

Command exit status: 1

Command output: (empty)

Command error:
  INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  2023-08-23 09:32:26.530562: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/.singularity.d/libs
  2023-08-23 09:32:26.530609: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
  Traceback (most recent call last):
    File "/usr/src/app/run_app.py", line 60, in <module>
      run_application(dict(ARGS._get_kwargs()))
    File "/usr/src/app/deepcell_applications/app_runners.py", line 52, in run_application

executor > local (2)
[- ] process > illumination -
[08/f4fd63] process > registration:ashlar [100%] 1 of 1 ✔
[- ] process > background:backsub -
[- ] process > dearray:coreograph -
[- ] process > dearray:roadie:runTask -
[- ] process > segmentation:roadie:runTask -
[5c/33c60d] process > segmentation:worker (mesmer-1) [100%] 1 of 1, failed: 1 ✘
[- ] process > segmentation:s3seg -
[- ] process > quantification:mcquant -
[- ] process > downstream:worker -
[- ] process > viz:autominerva -
ERROR ~ Error executing process > 'segmentation:worker (mesmer-1)'

Caused by: Process segmentation:worker (mesmer-1) terminated with an error exit status (1)

Command executed:

python /usr/src/app/run_app.py mesmer --squeeze --output-directory . --output-name cell.tif --nuclear-image exemplar-001.ome.tif

Command exit status: 1

Command output: (empty)

Command error:
  INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  2023-08-23 09:32:26.530562: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/.singularity.d/libs
  2023-08-23 09:32:26.530609: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
  Traceback (most recent call last):
    File "/usr/src/app/run_app.py", line 60, in <module>
      run_application(dict(ARGS._get_kwargs()))
    File "/usr/src/app/deepcell_applications/app_runners.py", line 52, in run_application
      app = dca.utils.get_app(arg_dict['app'])
    File "/usr/src/app/deepcell_applications/utils.py", line 44, in get_app
      return app_map[name]'class'
    File "/usr/local/lib/python3.8/dist-packages/deepcell/applications/mesmer.py", line 227, in __init__
      model = tf.keras.models.load_model(model_path)
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
      raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/saving/save.py", line 204, in load_model
      raise IOError(f'No file or directory found at {filepath_str}')
  OSError: No file or directory found at /mounted_images/MultiplexSegmentation-9.tar

Work dir: /gpfs3/well/immune-rep/users/tma392/mcmicro/images/work/5c/33c60db118ec05fb21f14859121453

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details

Before running the pipeline:

    [tma392@rescomp1 images]$ ls -lh
    total 5.6G
    -rw-r--r-- 1 tma392 immune-rep 93M May 20 2022 MultiplexSegmentation-9.tar.gz

After running the pipeline:

    [tma392@compe031 images]$ ls -lh
    total 5.6G
    drwxrwxr-x 4 tma392 immune-rep 4.0K May 20 2022 MultiplexSegmentation
    -rw-r--r-- 1 tma392 immune-rep 93M May 20 2022 MultiplexSegmentation-9.tar.gz

Thanks for your time and assistance!

ShihongWu commented 1 year ago

I was wondering if it's because of the model_path or model_dir, so I decided to run the container interactively. Here is the content:

    [tma392@compe031 images]$ singularity shell vanvalenlab-deepcell-applications-0.4.0.img
    INFO: Environment variable SINGULARITY_BIND is set, but APPTAINER_BIND is preferred
    Apptainer> cd /usr/local/lib/python3.8/dist-packages/deepcell/applications/
    Apptainer> python
    Python 3.8.10 (default, Nov 26 2021, 20:14:08)
    [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> import tarfile
    >>> archive_path = '/mounted_images/MultiplexSegmentation-9.tar.gz'
    >>> model_dir = os.path.splitext(os.path.basename(archive_path))[0]
    >>> extraction_path = '/mounted_images'
    >>> print("archive_path:", archive_path)
    archive_path: /mounted_images/MultiplexSegmentation-9.tar.gz
    >>> print("model_dir:", model_dir)
    model_dir: MultiplexSegmentation-9.tar
    >>> with tarfile.open(archive_path, "r:gz") as archive:
    ...     archive.extractall(path=extraction_path)
    ...
    >>> model_path = os.path.join(extraction_path, model_dir)
    >>> print("model_path:", model_path)
    model_path: /mounted_images/MultiplexSegmentation-9.tar
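
The session above shows the cause: `os.path.splitext` removes only the last extension, so a `.tar.gz` name keeps its `.tar`. A minimal sketch of that behaviour follows; the helper `strip_archive_suffix` and its suffix list are hypothetical, added only for illustration.

```python
import os


def strip_archive_suffix(filename, suffixes=(".tar.gz", ".tgz", ".tar")):
    """Remove a known archive suffix in one step; otherwise fall back to splitext."""
    for suffix in suffixes:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return os.path.splitext(filename)[0]


name = "MultiplexSegmentation-9.tar.gz"
print(os.path.splitext(name))      # ('MultiplexSegmentation-9.tar', '.gz')
print(strip_archive_suffix(name))  # MultiplexSegmentation-9
```

Even with the full suffix removed, the result still carries the `-9` version tag, while the directory listing earlier in the thread shows the archive extracting to `MultiplexSegmentation`, so the name needs one more adjustment before it can be used as the model path.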

Then I modified that part in mesmer.py to get the right model_dir:

    archive_path = '/mounted_images/MultiplexSegmentation-9.tar.gz'
    archive_name = os.path.basename(archive_path)
    model_dir = os.path.splitext(archive_name)[0].rsplit('-', 1)[0]
    extraction_path = '/mounted_images'
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=extraction_path)
    model_path = os.path.join(extraction_path, model_dir)

Then I went back to rebuild the image and ran it again. Problem solved! Thanks a lot!
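
An alternative to parsing the file name is to ask the archive which top-level directory it contains. The sketch below is not part of deepcell or mcmicro; the function names `top_level_names` and `extract_model` are hypothetical, and it assumes the archive holds exactly one top-level directory, which matches the single `MultiplexSegmentation` directory in the listing above but is not otherwise verified.

```python
import os
import posixpath
import tarfile


def top_level_names(archive_path):
    """Return the sorted top-level entry names stored in a .tar.gz archive."""
    names = set()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Normalise names such as "./dir/file" before taking the first component.
            first = posixpath.normpath(member.name).split("/", 1)[0]
            if first not in ("", "."):
                names.add(first)
    return sorted(names)


def extract_model(archive_path, extraction_path):
    """Extract the archive if needed and return the path of its single top-level directory."""
    roots = top_level_names(archive_path)
    if len(roots) != 1:
        raise ValueError(f"expected one top-level entry, found {roots}")
    model_path = os.path.join(extraction_path, roots[0])
    if not os.path.isdir(model_path):
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(path=extraction_path)
    return model_path
```

With this approach the returned path could be handed to `tf.keras.models.load_model` directly, and the existence check is made against the full extraction path rather than a bare directory name.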

ShihongWu commented 1 year ago

@ArtemSokolov Thanks a lot for helping me! The problem has been solved!

ArtemSokolov commented 1 year ago

Excellent! Great to hear it.