haniffalab / webatlas-pipeline

A data pipeline built in Nextflow to process spatial and single-cell experiment data for visualisation in WebAtlas
MIT License

Error while running the pipeline using singularity #135

Closed ashishjain1988 closed 3 months ago

ashishjain1988 commented 3 months ago
          Hi @dannda,

While running the pipeline using the updated code, I am again getting an error. It seems like the scanpy package is not installed correctly in the singularity image, and the pipeline is trying to load it locally instead. Here is the error I got:

[ec/0cb7d3] Ful…les:route_file (h5ad, 230828_CZIgutage_seqPooled230801_reducedMeta.h5ad) [100%] 1 of 1, failed: 1 ✘
[-        ] Full_pipeline:Process_images:Generate_image -
[-        ] Full_pipeline:Process_images:image_to_zarr -
[-        ] Full_pipeline:Process_images:ome_zarr_metadata -
[-        ] Full_pipeline:Output_to_config:Build_config -
Pulling Singularity image docker://haniffalab/webatlas-pipeline-build-config:0.5.1 [cache /lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/work/singularity/haniffalab-webatlas-pipeline-build-config-0.5.1.img]
ERROR ~ Error executing process > 'Full_pipeline:Process_files:route_file (h5ad, 230828_CZIgutage_seqPooled230801_reducedMeta.h5ad)'

Caused by: Process Full_pipeline:Process_files:route_file (h5ad, 230828_CZIgutage_seqPooled230801_reducedMeta.h5ad) terminated with an error exit status (1)

Command executed:

router.py --file_type h5ad --path 230828_CZIgutage_seqPooled230801_reducedMeta.h5ad --stem p262-scRNAseq --args '{"compute_embeddings":false}'

Command exit status: 1

Command output: (empty)

Command error:

Traceback (most recent call last):
  File "/lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/bin/router.py", line 12, in <module>
    from process_h5ad import h5ad_to_zarr
  File "/lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/bin/process_h5ad.py", line 12, in <module>
    import scanpy as sc
  File "/usr/local/lib/python3.10/site-packages/scanpy/__init__.py", line 14, in <module>
    from . import tools as tl
  File "/usr/local/lib/python3.10/site-packages/scanpy/tools/__init__.py", line 1, in <module>
    from ..preprocessing import pca
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/__init__.py", line 1, in <module>
    from ._recipes import recipe_zheng17, recipe_weinreb17, recipe_seurat
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_recipes.py", line 8, in <module>
    from ._deprecated.highly_variable_genes import (
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 11, in <module>
    from .._utils import _get_mean_var
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py", line 46, in <module>
    def sparse_mean_var_minor_axis(data, indices, major_len, minor_len, dtype):
  File "/usr/local/lib/python3.10/site-packages/numba/core/decorators.py", line 234, in wrapper
    disp.enable_caching()
  File "/usr/local/lib/python3.10/site-packages/numba/core/dispatcher.py", line 863, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/usr/local/lib/python3.10/site-packages/numba/core/caching.py", line 601, in __init__
    self._impl = self._impl_class(py_func)
  File "/usr/local/lib/python3.10/site-packages/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'sparse_mean_var_minor_axis': no locator available for file '/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py'

Work dir: /lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/work/ec/0cb7d34d45d8a79589ac7c4ffcdb88

Regards, Ashish Jain

Originally posted by @ashishjain1988 in https://github.com/haniffalab/webatlas-pipeline/issues/132#issuecomment-2273727173

dannda commented 3 months ago

Sorry, @ashishjain1988, I was not able to reproduce this when running the example visium workflow with the singularity profile:

Pulling Singularity image docker://haniffalab/webatlas-pipeline-build-config:0.5.1 [cache /lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/work/singularity/haniffalab-webatlas-pipeline-build-config-0.5.1.img]

From your output it seems it is using release 0.5.1. Could you test with the latest release, 0.5.2? It might also be worth clearing cached data.

ashishjain1988 commented 3 months ago

Hi @dannda,

I checked with the example visium workflow and I am still getting the same error (attached below). I am using Nextflow 24.04.3, Singularity 3.8.3, and the latest webatlas-pipeline release, 0.5.2. Can you please let me know if there is any other change I can make to run the docker image?

[ch233592@compute-10-1 webatlas-pipeline]$ ./nextflow run main.nf -params-file CytAssist_FFPE_Human_Breast_Cancer.yaml -entry Full_pipeline -profile singularity

N E X T F L O W ~ version 24.04.3

Launching main.nf [maniac_waddington] DSL2 - revision: de5e3fc555

unknown recognition error type: groovyjarjarantlr4.v4.runtime.LexerNoViableAltException

executor > local (3)
[-        ] Full_pipeline:Process_files:route_file (spaceranger, CytAssist_FFPE_Human_Breast_Cancer) -
[b4/e4fee1] Ful…:Generate_image ([visium, breast-cancer], label, CytAssist_FFPE_Human_Breast_Cancer) [100%] 1 of 1, failed: 1 ✘
[-        ] Full_pipeline:Process_images:image_to_zarr (tissue_image.tif) -
[-        ] Full_pipeline:Process_images:ome_zarr_metadata -
[-        ] Full_pipeline:Output_to_config:Build_config -
Pulling Singularity image docker://haniffalab/webatlas-pipeline:0.5.2 [cache /lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/work/singularity/haniffalab-webatlas-pipeline-0.5.2.img]
WARN: Singularity cache directory has not been defined -- Remote image will be stored in the path: /lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/work/singularity -- Use the environment variable NXF_SINGULARITY_CACHEDIR to specify a different location
ERROR ~ Error executing process > 'Full_pipeline:Process_images:Generate_image ([visium, breast-cancer], label, CytAssist_FFPE_Human_Breast_Cancer)'

Caused by: Process Full_pipeline:Process_images:Generate_image ([visium, breast-cancer], label, CytAssist_FFPE_Human_Breast_Cancer) terminated with an error exit status (1)

Command executed:

generate_image.py --stem visium-breast-cancer --img_type label --file_type visium --file_path CytAssist_FFPE_Human_Breast_Cancer --ref_img tissue_image.tif --args {}

Command exit status: 1

Command output: (empty)

Command error:

INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
Traceback (most recent call last):
  File "/lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/bin/generate_image.py", line 12, in <module>
    from process_spaceranger import visium_label
  File "/lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/bin/process_spaceranger.py", line 14, in <module>
    import scanpy as sc
  File "/usr/local/lib/python3.10/site-packages/scanpy/__init__.py", line 14, in <module>
    from . import tools as tl
  File "/usr/local/lib/python3.10/site-packages/scanpy/tools/__init__.py", line 1, in <module>
    from ..preprocessing import pca
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/__init__.py", line 1, in <module>
    from ._recipes import recipe_zheng17, recipe_weinreb17, recipe_seurat
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_recipes.py", line 8, in <module>
    from ._deprecated.highly_variable_genes import (
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 11, in <module>
    from .._utils import _get_mean_var
  File "/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py", line 46, in <module>
    def sparse_mean_var_minor_axis(data, indices, major_len, minor_len, dtype):
  File "/usr/local/lib/python3.10/site-packages/numba/core/decorators.py", line 234, in wrapper
    disp.enable_caching()
  File "/usr/local/lib/python3.10/site-packages/numba/core/dispatcher.py", line 863, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/usr/local/lib/python3.10/site-packages/numba/core/caching.py", line 601, in __init__
    self._impl = self._impl_class(py_func)
  File "/usr/local/lib/python3.10/site-packages/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'sparse_mean_var_minor_axis': no locator available for file '/usr/local/lib/python3.10/site-packages/scanpy/preprocessing/_utils.py'

Work dir: /lab-share/RC-Data-Science-e2/Public/Ashish/p262_scRNASeq_Thiagarajah/webatlas-pipeline/work/b4/e4fee14ea6199c20ee36bc4263d8e6

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details

dannda commented 3 months ago

I see, thanks for the update. I've managed to reproduce the error. It's caused by numba's caching being unable to write to its cache directory. It seems to me there are two ways of solving this:

  1. Specifying a bind path so the singularity container can access the cache dir. You can set it up in the nextflow.config file by adding

    singularity.runOptions = '--bind /path'

    However, I wouldn't know what this path would be on your system; I'm unsure if it'd be /lab-share, for example.

  2. Specifying another path for numba's cache that the container can access and write to. You can set this up in the nextflow.config file by adding

    env {
        NUMBA_CACHE_DIR = "/path/to/dir"
    }
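Putting the two suggestions together, a minimal nextflow.config sketch might look like the following; note that both paths here are placeholders, not values verified for this cluster:

```groovy
// Sketch only -- adjust both paths to your system.
singularity {
    enabled    = true
    // Option 1: bind the host directory holding your data and work dir
    runOptions = '--bind /path/to/host/dir'
}

env {
    // Option 2: point numba's cache at a location writable inside the container
    NUMBA_CACHE_DIR = '/path/to/writable/dir'
}
```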

cc @BioinfoTongLI @prete if they have any other input on this

BioinfoTongLI commented 3 months ago

Yeah, that numba thing is new in the recent version. I've hard-coded it to /tmp for some other projects. We might want to do that here as well.
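If the hard-coding were done in the pipeline's Python scripts rather than in the image, a sketch could look like this (hypothetical; the key point is that NUMBA_CACHE_DIR must be set before numba, or any library that imports it such as scanpy, is imported):

```python
import os

# Hypothetical hard-coding sketch: point numba's on-disk cache at /tmp,
# which is writable inside most containers. numba reads this environment
# variable at import time, so it must be set before importing numba/scanpy.
os.environ.setdefault("NUMBA_CACHE_DIR", "/tmp")

# imports such as `import scanpy as sc` would then come after this point
print(os.environ["NUMBA_CACHE_DIR"])
```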

ashishjain1988 commented 3 months ago

Hi @dannda, Thank you for the fix. It worked for me!