ShihongWu closed this issue 1 year ago
Hi @ShihongWu,
I wonder if Nextflow may be ignoring SINGULARITY_BIND and not passing your images folder to the container correctly. Can you try adding the following runOptions to your custom.config:
singularity {
    enabled = true
    autoMounts = true
    cacheDir = '/gpfs3/well/immune-rep/users/tma392/mcmicro/images'
    runOptions = '-C -H "$PWD" -B /gpfs3/well/immune-rep/users/tma392/mcmicro/images:/mounted_images'
}
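One way to confirm that the bind mount took effect is to run a short check with the container's own Python. The sketch below is only an illustration: the helper name `describe_mount` is hypothetical, and the mount point and archive name are the ones used in this thread.

```python
import os


def describe_mount(mount_point, expected_file):
    """Return a short diagnostic string about a bind-mounted directory."""
    # The directory is absent when the -B option was not applied.
    if not os.path.isdir(mount_point):
        return f"{mount_point} is not visible inside the container"
    target = os.path.join(mount_point, expected_file)
    # The directory exists, but the expected archive is not in it.
    if not os.path.isfile(target):
        return f"{mount_point} is mounted, but {expected_file} is missing"
    return f"found {target}"


if __name__ == "__main__":
    # Assumed values, copied from the runOptions above and the model archive name.
    print(describe_mount("/mounted_images", "MultiplexSegmentation-9.tar.gz"))
```

If the first message is printed, the bind option did not reach the container; if the second is printed, the folder is mounted but the archive is not where the script expects it.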
Hi Artem @ArtemSokolov, thank you for your quick reply! Much appreciated! I just added the runOptions to my custom.config as you suggested and ran the pipeline again. This time I got a different error, listed below. I also found that the MultiplexSegmentation model was extracted into the directory.
The error:
N E X T F L O W ~ version 23.04.2
Launching https://github.com/labsyspharm/mcmicro
[amazing_heisenberg] DSL2 - revision: 5eac7773d7 [master]
executor > local (2)
[- ] process > illumination -
[08/f4fd63] process > registration:ashlar [100%] 1 of 1 ✔
[- ] process > background:backsub -
[- ] process > dearray:coreograph -
[- ] process > dearray:roadie:runTask -
[- ] process > segmentation:roadie:runTask -
[5c/33c60d] process > segmentation:worker (mesmer-1) [ 0%] 0 of 1
[- ] process > segmentation:s3seg -
[- ] process > quantification:mcquant -
[- ] process > downstream:worker -
[- ] process > viz:autominerva -
ERROR ~ Error executing process > 'segmentation:worker (mesmer-1)'
Caused by:
Process segmentation:worker (mesmer-1)
terminated with an error exit status (1)
Command executed:
python /usr/src/app/run_app.py mesmer --squeeze --output-directory . --output-name cell.tif --nuclear-image exemplar-001.ome.tif
Command exit status: 1
Command output: (empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
2023-08-23 09:32:26.530562: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/.singularity.d/libs
2023-08-23 09:32:26.530609: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "/usr/src/app/run_app.py", line 60, in <module>
Work dir: /gpfs3/well/immune-rep/users/tma392/mcmicro/images/work/5c/33c60db118ec05fb21f14859121453
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
Before running the pipeline:
[tma392@rescomp1 images]$ ls -lh
total 5.6G
-rw-r--r-- 1 tma392 immune-rep 93M May 20 2022 MultiplexSegmentation-9.tar.gz
After running the pipeline:
[tma392@compe031 images]$ ls -lh
total 5.6G
drwxrwxr-x 4 tma392 immune-rep 4.0K May 20 2022 MultiplexSegmentation
-rw-r--r-- 1 tma392 immune-rep 93M May 20 2022 MultiplexSegmentation-9.tar.gz
Thanks for your time and assistance!
I wondered whether the problem was with model_path or model_dir, so I ran the container interactively. Here is the session:
[tma392@compe031 images]$ singularity shell vanvalenlab-deepcell-applications-0.4.0.img
INFO: Environment variable SINGULARITY_BIND is set, but APPTAINER_BIND is preferred
Apptainer> cd /usr/local/lib/python3.8/dist-packages/deepcell/applications/
Apptainer> python
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import tarfile
>>> archive_path = '/mounted_images/MultiplexSegmentation-9.tar.gz'
>>> model_dir = os.path.splitext(os.path.basename(archive_path))[0]
>>> extraction_path = '/mounted_images'
>>> print("archive_path:", archive_path)
archive_path: /mounted_images/MultiplexSegmentation-9.tar.gz
>>> print("model_dir:", model_dir)
model_dir: MultiplexSegmentation-9.tar
>>> with tarfile.open(archive_path, "r:gz") as archive:
...     archive.extractall(path=extraction_path)
...
>>> model_path = os.path.join(extraction_path, model_dir)
>>> print("model_path:", model_path)
model_path: /mounted_images/MultiplexSegmentation-9.tar
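The session above shows where the wrong name comes from: `os.path.splitext` removes only the last suffix, so a `.tar.gz` name keeps its `.tar` part. A minimal illustration, using only the archive name from this thread:

```python
import os

archive_name = "MultiplexSegmentation-9.tar.gz"

# splitext strips only the final suffix (".gz"), so ".tar" is left behind.
stem, ext = os.path.splitext(archive_name)
print(stem, ext)

# Stripping the full ".tar.gz" suffix explicitly gives the archive stem.
suffix = ".tar.gz"
full_stem = archive_name[: -len(suffix)] if archive_name.endswith(suffix) else stem
print(full_stem)
```

Neither result matches the extracted folder name (MultiplexSegmentation), which is why the version suffix also has to be handled.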
Then I modified that part in mesmer.py to get the right model_dir:
archive_path = '/mounted_images/MultiplexSegmentation-9.tar.gz'
archive_name = os.path.basename(archive_path)
# Drop the ".gz" suffix, then everything after the last "-" (the "-9.tar" part).
model_dir = os.path.splitext(archive_name)[0].rsplit('-', 1)[0]
extraction_path = '/mounted_images'
with tarfile.open(archive_path, "r:gz") as archive:
    archive.extractall(path=extraction_path)
model_path = os.path.join(extraction_path, model_dir)
Then I went back, rebuilt the image, and ran it again. Problem solved! Thanks a lot!
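For anyone hitting the same problem, an alternative to parsing the archive file name is to read the folder name from the archive itself. This is only a sketch, not the code that ships in deepcell: the helper name `extract_model` is hypothetical, and it assumes the archive contains exactly one top-level folder, as the MultiplexSegmentation archive in this thread does.

```python
import os
import tarfile


def extract_model(archive_path, extraction_path):
    """Extract a .tar.gz model archive and return the path of its top-level folder."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Normalise member names (e.g. "./model/x" -> "model/x") and keep
        # only the first path component of each one.
        names = [os.path.normpath(name) for name in archive.getnames()]
        top_levels = {name.split(os.sep, 1)[0] for name in names if name != "."}
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(top_levels)}")
        archive.extractall(path=extraction_path)
    return os.path.join(extraction_path, top_levels.pop())


if __name__ == "__main__":
    # Assumed paths, taken from this thread.
    archive = "/mounted_images/MultiplexSegmentation-9.tar.gz"
    if os.path.isfile(archive):
        print(extract_model(archive, "/mounted_images"))
```

The returned path does not depend on how the archive file is named, so a different version suffix would not break it.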
@ArtemSokolov Thanks a lot for helping me! The problem has been solved!
Excellent! Great to hear it.
I am encountering an issue while running the mesmer process using Nextflow with a Singularity/Apptainer container. The mesmer process fails with an error exit status (1), and the command executed within the container produces an error. The error is listed below:
ERROR ~ Error executing process > 'segmentation:worker (mesmer-1)'
Caused by:
Process segmentation:worker (mesmer-1) terminated with an error exit status (1)
Command executed:
python /usr/src/app/run_app.py mesmer --squeeze --output-directory . --output-name cell.tif --nuclear-image exemplar-001.ome.ti
Command exit status: 1
Command output: (empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
2023-08-22 20:28:21.290519: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/.singularity.d/libs
2023-08-22 20:28:21.290567: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "/usr/src/app/run_app.py", line 60, in <module>
    run_application(dict(ARGS._get_kwargs()))
  File "/usr/src/app/deepcell_applications/app_runners.py", line 52, in run_application
    app = dca.utils.get_app(arg_dict['app'])
  File "/usr/src/app/deepcell_applications/utils.py", line 44, in get_app
    return app_map[name]['class']()
  File "/usr/local/lib/python3.8/dist-packages/deepcell/applications/mesmer.py", line 223, in __init__
    with tarfile.open(archive_path, "r:gz") as archive:
  File "/usr/lib/python3.8/tarfile.py", line 1621, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python3.8/tarfile.py", line 1667, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/usr/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/mounted_images/MultiplexSegmentation-9.tar.gz'
Work dir: /gpfs3/well/immune-rep/users/tma392/mcmicro/images/work/b5/232cd350517008765e9b4ef0c7115f
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
The steps I have taken are listed below:
Edited the mesmer.py script to incorporate the necessary changes. The modification is needed because our High-Performance Computing (HPC) cluster is not connected to the internet, so we have to download the model manually. The change I made is in: def __init__(self, model=None):
Rebuilt the mesmer image using my_image.def, which has the following contents:
# Use the base image
FROM vanvalenlab/deepcell-applications:0.4.0
# Replace mesmer.py with the modified version
RUN rm /usr/local/lib/python3.8/dist-packages/deepcell/applications/mesmer.py
COPY mesmer.py /usr/local/lib/python3.8/dist-packages/deepcell/applications/
Additional Information:
Environment:
Nextflow version: 23.04.2
Singularity version: 1.1.9-1.el7
Operating System:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Requested Help: I'm seeking assistance in identifying the root cause of the issue and finding a solution to successfully run the mesmer segmentation process using the Singularity container in Nextflow, especially considering the manual model download and changes made to mesmer.py.