This project is a very exciting development for reproducible and standardized single-cell analysis. However, I'm having some difficulty getting GPU acceleration to work for some of the tasks.
When running the pipeline with the GPU profile on AWS Batch, all of the scanpy GPU processes (SCANPY_SCRUBLET, SCANPY_HVGS, SCANPY_HARMONY, SCANPY_LEIDEN, SCANPY_NEIGHBORS) fail with the error:
ImportError: /opt/conda/lib/python3.11/lib-dynload/_sqlite3.cpython-311-x86_64-linux-gnu.so: undefined symbol: sqlite3_deserialize
This looks like a problem with the Docker container missing software rather than with the executor, but I don't have on-prem resources to test a local run (and if it were the container, presumably others would have hit the same issue).
Other GPU tasks complete without a problem after I modified my nextflow.config file to use a dedicated GPU compute environment (see attached).
Deleting the 'process_GPU' label from each of these processes allows the pipeline to run to completion (though without GPU acceleration for these processes).
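To narrow down whether the `sqlite3_deserialize` symbol from the ImportError above is really absent from the SQLite library inside the container, a small check like the following could be run in the image. This is only a sketch: the helper names `has_symbol` and `check_sqlite` are made up for illustration, and the library that `ctypes.util.find_library` locates may not be the one that `_sqlite3` links against in a conda environment, so an explicit path can be passed instead (for example a `libsqlite3` under `/opt/conda/lib`, which is an assumption based on the path in the error).

```python
import ctypes
import ctypes.util


def has_symbol(lib, name):
    """Return True if the loaded shared library exports `name`."""
    try:
        getattr(lib, name)  # ctypes raises AttributeError for a missing symbol
        return True
    except AttributeError:
        return False


def check_sqlite(path=None, symbol="sqlite3_deserialize"):
    """Report the version of a SQLite shared library and whether it exports `symbol`.

    `path` is optional; when omitted, the library is looked up with
    ctypes.util.find_library, which may pick a system-wide copy.
    Returns None if no library can be found or loaded.
    """
    path = path or ctypes.util.find_library("sqlite3")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # sqlite3_libversion() returns a C string such as "3.45.1".
    lib.sqlite3_libversion.restype = ctypes.c_char_p
    return {
        "library": path,
        "version": lib.sqlite3_libversion().decode(),
        symbol: has_symbol(lib, symbol),
    }


if __name__ == "__main__":
    print(check_sqlite())
```

If the report shows the symbol as missing for the library the container actually loads, that would point at the image (an older or differently built SQLite than the Python build expects) rather than at the AWS Batch executor.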
Command used and terminal output
Command (run from parent directory of local clone):
nextflow run scdownstream/ -profile docker,gpu -config /tmp/nextflow.config --input scdownstream/assets/samplesheet.csv --outdir {private s3 bucket}
Terminal Output:
N E X T F L O W ~ version 24.04.4
Launching `scdownstream/main.nf` [friendly_shirley] DSL2 - revision: 456139e594
------------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/scdownstream 0.0.1dev
------------------------------------------------------
Input/output options
input : scdownstream/assets/samplesheet.csv
outdir : s3://katlas/nf_scdownstream_outs/test_run
Institutional config options
config_profile_description: AWSBATCH Cloud Profile
config_profile_contact : Alexander Peltzer (@apeltzer)
config_profile_url : https://aws.amazon.com/batch/
Core Nextflow options
runName : friendly_shirley
containerEngine : docker
launchDir : /home/robert/repos
workDir : /ktmp/nextflow-work/scrnaseq_work
projectDir : /home/robert/repos/scdownstream
userName : robert
profile : docker,gpu
configFiles :
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/scdownstream/blob/master/CITATIONS.md
executor > awsbatch (18)
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_READH5 -
[b1/44af75] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READRDS (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READCSV -
[12/b6da9a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_UNIFY (SAMN14430801) [100%] 4 of 4 ✔
[81/633761] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_UNFILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:CELLBENDER_REMOVEBACKGROUND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:ADATA_BARCODES -
[3a/e9f636] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_FILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[a8/06eb8a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_RAW (SAMN14430801) [100%] 2 of 2 ✔
[e2/1b9140] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (SAMN14430801) [100%] 2 of 2 ✔
[3c/f9170e] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_FILTER (SAMN14430801) [ 50%] 1 of 2
[00/31bc89] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_THRESHOLDED_SIZE (SAMN14430799) [100%] 1 of 1
[29/823820] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799) [ 0%] 0 of 1
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:DOUBLET_REMOVAL -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_DEDOUBLETED_SIZE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_FILTERED -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:COLLECT_SIZES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_MERGE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_UPSETGENES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCANPY_HVGS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCVITOOLS_SCVI -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_NEIGHBORS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_UMAP -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_LEIDEN -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_PAGA -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_RANKGENESGROUPS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_EXTEND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_TORDS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:MULTIQC -
ERROR ~ Error executing process > 'NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799)'
Caused by:
Essential container in task exited
Command executed [/home/robert/repos/scdownstream/./workflows/../subworkflows/local/./../../modules/local/scanpy/scrublet/templates/scrublet.py]:
#!/usr/bin/env python3
import scanpy as sc
import platform
from threadpoolctl import threadpool_limits
threadpool_limits(int("6"))
sc.settings.n_jobs = int("6")
def format_yaml_like(data: dict, indent: int = 0) -> str:
"""Formats a dictionary to a YAML-like string.
Args:
data (dict): The dictionary to format.
indent (int): The current indentation level.
Returns:
str: A string formatted as YAML.
"""
yaml_str = ""
for key, value in data.items():
spaces = " " * indent
if isinstance(value, dict):
yaml_str += f"{spaces}{key}:\n{format_yaml_like(value, indent + 1)}"
else:
yaml_str += f"{spaces}{key}: {value}\n"
return yaml_str
adata = sc.read_h5ad("SAMN14430799_filtered.h5ad")
prefix = "SAMN14430799_scrublet"
use_gpu = "true" == "true"
if use_gpu:
import rapids_singlecell as rsc
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator
import cupy as cp
rmm.reinitialize(
managed_memory=True,
pool_allocator=False,
)
cp.cuda.set_allocator(rmm_cupy_allocator)
rsc.get.anndata_to_GPU(adata)
executor > awsbatch (18)
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_READH5 -
[b1/44af75] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READRDS (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READCSV -
[12/b6da9a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_UNIFY (SAMN14430801) [100%] 4 of 4 ✔
[81/633761] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_UNFILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:CELLBENDER_REMOVEBACKGROUND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:ADATA_BARCODES -
[3a/e9f636] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_FILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[a8/06eb8a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_RAW (SAMN14430801) [100%] 2 of 2 ✔
[e2/1b9140] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (SAMN14430801) [100%] 2 of 2 ✔
[3c/f9170e] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_FILTER (SAMN14430801) [ 50%] 1 of 2
[00/31bc89] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_THRESHOLDED_SIZE (SAMN14430799) [100%] 1 of 1
[29/823820] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799) [100%] 1 of 1, failed: 1
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:DOUBLET_REMOVAL -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_DEDOUBLETED_SIZE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_FILTERED -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:COLLECT_SIZES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_MERGE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_UPSETGENES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCANPY_HVGS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCVITOOLS_SCVI -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_NEIGHBORS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_UMAP -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_LEIDEN -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_PAGA -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_RANKGENESGROUPS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_EXTEND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_TORDS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:MULTIQC -
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799)'
Caused by:
Essential container in task exited
Command executed [/home/robert/repos/scdownstream/./workflows/../subworkflows/local/./../../modules/local/scanpy/scrublet/templates/scrublet.py]:
#!/usr/bin/env python3
import scanpy as sc
import platform
from threadpoolctl import threadpool_limits
threadpool_limits(int("6"))
sc.settings.n_jobs = int("6")
def format_yaml_like(data: dict, indent: int = 0) -> str:
"""Formats a dictionary to a YAML-like string.
Args:
data (dict): The dictionary to format.
indent (int): The current indentation level.
Returns:
str: A string formatted as YAML.
"""
yaml_str = ""
for key, value in data.items():
spaces = " " * indent
if isinstance(value, dict):
yaml_str += f"{spaces}{key}:\n{format_yaml_like(value, indent + 1)}"
else:
yaml_str += f"{spaces}{key}: {value}\n"
return yaml_str
adata = sc.read_h5ad("SAMN14430799_filtered.h5ad")
prefix = "SAMN14430799_scrublet"
use_gpu = "true" == "true"
if use_gpu:
import rapids_singlecell as rsc
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator
import cupy as cp
rmm.reinitialize(
managed_memory=True,
pool_allocator=False,
)
cp.cuda.set_allocator(rmm_cupy_allocator)
rsc.get.anndata_to_GPU(adata)
rsc.pp.scrublet(adata, batch_key="batch")
rsc.get.anndata_to_CPU(adata)
else:
executor > awsbatch (18)
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_READH5 -
[b1/44af75] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READRDS (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READCSV -
[12/b6da9a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_UNIFY (SAMN14430801) [100%] 4 of 4 ✔
[81/633761] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_UNFILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:CELLBENDER_REMOVEBACKGROUND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:ADATA_BARCODES -
[3a/e9f636] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_FILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[a8/06eb8a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_RAW (SAMN14430801) [100%] 2 of 2 ✔
[e2/1b9140] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (SAMN14430801) [100%] 2 of 2 ✔
[3c/f9170e] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_FILTER (SAMN14430801) [ 50%] 1 of 2 ✔
[00/31bc89] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_THRESHOLDED_SIZE (SAMN14430799) [100%] 1 of 1
[29/823820] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799) [100%] 1 of 1, failed: 1
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:DOUBLET_REMOVAL -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_DEDOUBLETED_SIZE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_FILTERED -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:COLLECT_SIZES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_MERGE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_UPSETGENES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCANPY_HVGS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCVITOOLS_SCVI -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_NEIGHBORS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_UMAP -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_LEIDEN -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_PAGA -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_RANKGENESGROUPS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_EXTEND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_TORDS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:MULTIQC -
ERROR ~ Error executing process > 'NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799)'
Caused by:
Essential container in task exited
Command executed [/home/robert/repos/scdownstream/./workflows/../subworkflows/local/./../../modules/local/scanpy/scrublet/templates/scrublet.py]:
#!/usr/bin/env python3
import scanpy as sc
import platform
from threadpoolctl import threadpool_limits
threadpool_limits(int("6"))
sc.settings.n_jobs = int("6")
def format_yaml_like(data: dict, indent: int = 0) -> str:
"""Formats a dictionary to a YAML-like string.
Args:
data (dict): The dictionary to format.
indent (int): The current indentation level.
Returns:
str: A string formatted as YAML.
"""
yaml_str = ""
for key, value in data.items():
spaces = " " * indent
if isinstance(value, dict):
yaml_str += f"{spaces}{key}:\n{format_yaml_like(value, indent + 1)}"
else:
yaml_str += f"{spaces}{key}: {value}\n"
return yaml_str
adata = sc.read_h5ad("SAMN14430799_filtered.h5ad")
prefix = "SAMN14430799_scrublet"
use_gpu = "true" == "true"
if use_gpu:
import rapids_singlecell as rsc
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator
import cupy as cp
rmm.reinitialize(
managed_memory=True,
pool_allocator=False,
)
cp.cuda.set_allocator(rmm_cupy_allocator)
rsc.get.anndata_to_GPU(adata)
rsc.pp.scrublet(adata, batch_key="batch")
rsc.get.anndata_to_CPU(adata)
else:
sc.pp.scrublet(adata, batch_key="batch")
executor > awsbatch (18)
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_READH5 -
[b1/44af75] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READRDS (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_READCSV -
[12/b6da9a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:ADATA_UNIFY (SAMN14430801) [100%] 4 of 4 ✔
[81/633761] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_UNFILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:CELLBENDER_REMOVEBACKGROUND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:EMPTY_DROPLET_REMOVAL:ADATA_BARCODES -
[3a/e9f636] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_FILTERED_SIZE (SAMN14430801) [100%] 2 of 2 ✔
[a8/06eb8a] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_RAW (SAMN14430801) [100%] 2 of 2 ✔
[e2/1b9140] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (SAMN14430801) [100%] 2 of 2 ✔
[3c/f9170e] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:SCANPY_FILTER (SAMN14430801) [100%] 2 of 2 ✔
[00/31bc89] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_THRESHOLDED_SIZE (SAMN14430799) [100%] 1 of 1
[29/823820] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799) [100%] 1 of 1, failed: 1
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:DOUBLET_REMOVAL -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:GET_DEDOUBLETED_SIZE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:QC_FILTERED -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:COLLECT_SIZES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_MERGE -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:ADATA_UPSETGENES -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCANPY_HVGS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:COMBINE:INTEGRATE:SCVITOOLS_SCVI -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_NEIGHBORS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_UMAP -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_LEIDEN -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_PAGA -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:CLUSTER:SCANPY_RANKGENESGROUPS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_EXTEND -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:FINALIZE:ADATA_TORDS -
[- ] process > NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:MULTIQC -
-[nf-core/scdownstream] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET (SAMN14430799)'
Caused by:
Essential container in task exited
Command executed [/home/robert/repos/scdownstream/./workflows/../subworkflows/local/./../../modules/local/scanpy/scrublet/templates/scrublet.py]:
#!/usr/bin/env python3
import scanpy as sc
import platform
from threadpoolctl import threadpool_limits
threadpool_limits(int("6"))
sc.settings.n_jobs = int("6")
def format_yaml_like(data: dict, indent: int = 0) -> str:
    """Formats a dictionary to a YAML-like string.
    Args:
        data (dict): The dictionary to format.
        indent (int): The current indentation level.
    Returns:
        str: A string formatted as YAML.
    """
    yaml_str = ""
    for key, value in data.items():
        spaces = " " * indent
        if isinstance(value, dict):
            yaml_str += f"{spaces}{key}:\n{format_yaml_like(value, indent + 1)}"
        else:
            yaml_str += f"{spaces}{key}: {value}\n"
    return yaml_str
adata = sc.read_h5ad("SAMN14430799_filtered.h5ad")
prefix = "SAMN14430799_scrublet"
use_gpu = "true" == "true"
if use_gpu:
    import rapids_singlecell as rsc
    import rmm
    from rmm.allocators.cupy import rmm_cupy_allocator
    import cupy as cp
    rmm.reinitialize(
        managed_memory=True,
        pool_allocator=False,
    )
    cp.cuda.set_allocator(rmm_cupy_allocator)
    rsc.get.anndata_to_GPU(adata)
    rsc.pp.scrublet(adata, batch_key="batch")
    rsc.get.anndata_to_CPU(adata)
else:
    sc.pp.scrublet(adata, batch_key="batch")
df = adata.obs[["predicted_doublet"]]
df.columns = ["SAMN14430799_scrublet"]
df.to_pickle("SAMN14430799_scrublet.pkl")
adata = adata[~adata.obs["predicted_doublet"]].copy()
adata.write_h5ad(f"{prefix}.h5ad")
# Versions
versions = {
    "NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:DOUBLET_DETECTION:SCANPY_SCRUBLET": {
        "python": platform.python_version(),
        "scanpy": sc.__version__
    }
}
with open("versions.yml", "w") as f:
    f.write(format_yaml_like(versions))
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
  File "/opt/conda/bin/ipython", line 6, in <module>
    from IPython import start_ipython
  File "/opt/conda/lib/python3.11/site-packages/IPython/__init__.py", line 55, in <module>
    from .terminal.embed import embed
  File "/opt/conda/lib/python3.11/site-packages/IPython/terminal/embed.py", line 15, in <module>
    from IPython.core.interactiveshell import DummyMod, InteractiveShell
  File "/opt/conda/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 110, in <module>
    from IPython.core.history import HistoryManager
  File "/opt/conda/lib/python3.11/site-packages/IPython/core/history.py", line 10, in <module>
    import sqlite3
  File "/opt/conda/lib/python3.11/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/opt/conda/lib/python3.11/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ImportError: /opt/conda/lib/python3.11/lib-dynload/_sqlite3.cpython-311-x86_64-linux-gnu.so: undefined symbol: sqlite3_deserialize
Work dir:
s3://ktmp/nextflow-work/scrnaseq_work/29/82382049c57c69b77be4ff4f368fd6
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
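Note that the traceback is raised while /opt/conda/bin/ipython is starting up (IPython's history module imports sqlite3), so the failure happens before any line of scrublet.py runs. One plausible reading, which I have not confirmed, is that the _sqlite3 extension in the container is being loaded against a libsqlite3 that does not export sqlite3_deserialize. Below is a minimal diagnostic sketch that could be run with the container's Python to check this. The function names are my own, and the library located by ctypes may not be the same one that _sqlite3 actually links against, so treat the second result as a hint only.

```python
#!/usr/bin/env python3
"""Diagnostic sketch: can this interpreter import sqlite3, and does the
libsqlite3 found on the library search path export sqlite3_deserialize?"""
import ctypes
import ctypes.util


def check_sqlite3_import():
    """Return (ok, detail): the SQLite version on success, the error text on failure."""
    try:
        import sqlite3
    except ImportError as exc:
        return False, str(exc)
    return True, sqlite3.sqlite_version


def library_exports_deserialize():
    """Return True/False if a libsqlite3 could be loaded, or None if none was found."""
    path = ctypes.util.find_library("sqlite3")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute access on a CDLL triggers a symbol lookup; hasattr maps a miss to False.
    return hasattr(lib, "sqlite3_deserialize")


if __name__ == "__main__":
    ok, detail = check_sqlite3_import()
    print(f"sqlite3 import ok: {ok} ({detail})")
    print(f"libsqlite3 exports sqlite3_deserialize: {library_exports_deserialize()}")
```

If the import check fails with the same undefined-symbol message when run with plain python3 inside the GPU container, that would point at the container's conda environment rather than at AWS Batch or the pipeline code.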
Relevant files
nextflow_log_and_config.zip
System information