vib-singlecell-nf / vsn-pipelines

A repository of pipelines for single-cell data in Nextflow DSL2
GNU General Public License v3.0

[BUG] loom with non-standard CellID and Gene attributes [SCENIC] #279

Open cflerin opened 3 years ago

cflerin commented 3 years ago

**Describe the bug**
With the SCENIC workflow and a loom input whose cell and gene attributes do not use the standard names (`CellID`/`Gene`), the workflow fails to complete.

**To Reproduce**
Steps to reproduce the behavior:

1. Use a loom with the following column and row attributes (as an example):

    In [3]: lf.ca.keys()
    Out[3]: ['CellID_renamed', 'nGene', 'nUMI']

    In [4]: lf.ra.keys()
    Out[4]: ['Gene_renamed']


2. Configure with these options:

    nextflow pull vib-singlecell-nf/vsn-pipelines -r v0.23.0
    nextflow config vib-singlecell-nf/vsn-pipelines -profile scenic,test__scenic,singularity > test_scenic.config

   The cell and gene attributes are set in the SCENIC config section:

    cell_id_attribute = 'CellID_renamed'
    gene_attribute = 'Gene_renamed'

3. Run using this entry point:

    nextflow -C test_scenic.config run vib-singlecell-nf/vsn-pipelines -entry scenic -r v0.23.0

4. See error:

    N E X T F L O W  ~  version 20.04.1
    Launching vib-singlecell-nf/vsn-pipelines [cheesy_mcnulty] - revision: 0a585c246f [v0.23.0]
    WARN: It appears you have never run this project before -- Option -resume is ignored
    WARN: DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
    executor >  local (5)
    [27/feab5a] process > scenic:SCENIC:ARBORETO_WITH_MULTIPROCESSING (1) [100%] 1 of 1 ✔
    [2d/a61d5c] process > scenic:SCENIC:ADD_PEARSON_CORRELATION (1)       [100%] 1 of 1 ✔
    [1c/7488bd] process > scenic:SCENIC:CISTARGETMOTIF (1)                [100%] 1 of 1 ✔
    [b3/7d308b] process > scenic:SCENIC:AUCELLMOTIF (1)                   [100%] 1 of 1 ✔
    [0f/08d00c] process > scenic:SCENIC:VISUALIZE (1)                     [100%] 1 of 1, failed: 1 ✘
    [-        ] process > scenic:SCENIC:PUBLISH_LOOM                      -
    [-        ] process > scenic:PUBLISH_SCENIC:COMPRESS_HDF5             -
    [-        ] process > scenic:PUBLISH_SCENIC:SC__PUBLISH               -
    [-        ] process > scenic:PUBLISH_SCENIC:SC__PUBLISH_PROXY         -
    WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
    Error executing process > 'scenic:SCENIC:VISUALIZE (1)'

    Caused by:
      Process scenic:SCENIC:VISUALIZE (1) terminated with an error exit status (1)

    Command executed:

      /user/leuven/325/vsc32528/.nextflow/assets/vib-singlecell-nf/vsn-pipelines/src/scenic/bin/add_visualization.py --loom_input scenic_CI__auc_mtf.loom --loom_output scenic_visualize.loom --num_workers 4

    Command exit status:
      1

    Command output:
      (empty)

    Command error:
      Traceback (most recent call last):
        File "/usr/local/lib/python3.7/site-packages/loompy/attribute_manager.py", line 115, in __getattr__
          vals = self.__dict__["storage"][name]
      KeyError: 'CellID'

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/user/leuven/325/vsc32528/.nextflow/assets/vib-singlecell-nf/vsn-pipelines/src/scenic/bin/add_visualization.py", line 86, in <module>
          visualize_AUCell(args)
        File "/user/leuven/325/vsc32528/.nextflow/assets/vib-singlecell-nf/vsn-pipelines/src/scenic/bin/add_visualization.py", line 53, in visualize_AUCell
          auc_mtx = pd.DataFrame(lf.ca.RegulonsAUC, index=lf.ca.CellID)
        File "/usr/local/lib/python3.7/site-packages/loompy/attribute_manager.py", line 123, in __getattr__
          raise AttributeError(f"'{type(self)}' object has no attribute '{name}'")
      AttributeError: '<class 'loompy.attribute_manager.AttributeManager'>' object has no attribute 'CellID'

    Work dir:
      /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/testruns/scenic-nf_testing/cellid_attr/work/0f/08d00c9f4029a57c19516d122be1a6

    Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
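A pre-flight check along these lines would surface the mismatch before any SCENIC process starts, instead of failing in VISUALIZE after the expensive steps have finished. This is a minimal sketch, not code from vsn-pipelines: `check_attributes` is a hypothetical helper, and the key lists stand in for the `lf.ca.keys()` and `lf.ra.keys()` output from step 1.

```python
from typing import Iterable


def check_attributes(col_keys: Iterable[str], row_keys: Iterable[str],
                     cell_id_attribute: str, gene_attribute: str) -> None:
    """Raise ValueError if the configured attribute names are not in the loom.

    Hypothetical helper: col_keys and row_keys would come from lf.ca.keys()
    and lf.ra.keys() of an open loompy connection.
    """
    col_keys, row_keys = list(col_keys), list(row_keys)
    problems = []
    if cell_id_attribute not in col_keys:
        problems.append(f"column attribute {cell_id_attribute!r} not found (available: {col_keys})")
    if gene_attribute not in row_keys:
        problems.append(f"row attribute {gene_attribute!r} not found (available: {row_keys})")
    if problems:
        raise ValueError("; ".join(problems))


if __name__ == "__main__":
    ca_keys = ["CellID_renamed", "nGene", "nUMI"]  # lf.ca.keys() from step 1
    ra_keys = ["Gene_renamed"]                     # lf.ra.keys() from step 1
    # Passes: the configured names match the loom.
    check_attributes(ca_keys, ra_keys, "CellID_renamed", "Gene_renamed")
    # Fails with a readable message, as a hard-coded 'CellID' lookup would.
    try:
        check_attributes(ca_keys, ra_keys, "CellID", "Gene_renamed")
    except ValueError as err:
        print(err)
```

Collecting every missing name before raising means one run reports both a wrong cell attribute and a wrong gene attribute.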



**Expected behavior**
The pipeline should run with arbitrary cell/gene attribute names, as set by `cell_id_attribute` and `gene_attribute` in the SCENIC config.
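One way to get there is to pass the configured names through to each script rather than relying on built-in defaults. The sketch below reuses the arguments visible in the failing VISUALIZE command (`--loom_input`, `--loom_output`, `--num_workers`) and adds two hypothetical flags, `--cell_id_attribute` and `--gene_attribute`; these flags do not exist in v0.23.0, and their names and defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Add SCENIC visualization to a loom file.")
    # Arguments seen in the failing VISUALIZE command.
    parser.add_argument("--loom_input", required=True)
    parser.add_argument("--loom_output", required=True)
    parser.add_argument("--num_workers", type=int, default=1)  # default is an assumption
    # Hypothetical flags: default to the standard names so existing runs keep working.
    parser.add_argument("--cell_id_attribute", default="CellID")
    parser.add_argument("--gene_attribute", default="Gene")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--loom_input", "scenic_CI__auc_mtf.loom",
        "--loom_output", "scenic_visualize.loom",
        "--num_workers", "4",
        "--cell_id_attribute", "CellID_renamed",
        "--gene_attribute", "Gene_renamed",
    ])
    print(args.cell_id_attribute, args.gene_attribute)
```

Keeping `CellID`/`Gene` as defaults means looms with standard names need no extra arguments.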

**Screenshots**
NA

**Please complete the following information:**
- OS: CentOS Linux release 7.8.2003 (Core)
- Nextflow Version: 20.04.1
- vsn-pipelines Version: v0.23.0

**Additional context**
This particular error is caused by the hard-coded `lf.ca.CellID` lookup in `add_visualization.py`:
https://github.com/vib-singlecell-nf/vsn-pipelines/blob/58137baa31e580e82e5e2add59e2b536d9754bd0/src/scenic/bin/add_visualization.py#L53
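The failing line reads cell IDs via attribute access (`lf.ca.CellID`), which only works when the column attribute is literally named `CellID`. loompy's attribute managers also support item access (`lf.ca[name]`), so the name can come from configuration instead. A minimal sketch, assuming `RegulonsAUC` holds a NumPy structured array with one field per regulon, as the failing line implies; `build_auc_matrix` is a hypothetical helper, and a plain dict stands in for `lf.ca`.

```python
from typing import Mapping

import numpy as np
import pandas as pd


def build_auc_matrix(col_attrs: Mapping[str, np.ndarray],
                     cell_id_attribute: str = "CellID") -> pd.DataFrame:
    """Return the cells x regulons AUC matrix indexed by the configured cell ID attribute.

    col_attrs is anything supporting item access by name, e.g. lf.ca from loompy.
    """
    # Item access with a configurable key replaces the hard-coded lf.ca.CellID.
    return pd.DataFrame(col_attrs["RegulonsAUC"], index=col_attrs[cell_id_attribute])


if __name__ == "__main__":
    auc = np.array([(0.1, 0.0), (0.5, 0.2)],
                   dtype=[("RegA(+)", "f8"), ("RegB(+)", "f8")])
    col_attrs = {"CellID_renamed": np.array(["cell_1", "cell_2"]), "RegulonsAUC": auc}
    print(build_auc_matrix(col_attrs, cell_id_attribute="CellID_renamed"))
```

pandas turns each field of the structured array into a column, so the regulon names become the column labels.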

There are also a few other places where the cell and gene attribute names are hard-coded, which will cause similar failures:
- https://github.com/vib-singlecell-nf/vsn-pipelines/blob/58137baa31e580e82e5e2add59e2b536d9754bd0/src/scenic/bin/aggregate_multi_runs_regulons.py#L36
- https://github.com/vib-singlecell-nf/vsn-pipelines/blob/58137baa31e580e82e5e2add59e2b536d9754bd0/src/scenic/bin/export_to_loom.py#L128
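To find any remaining hard-coded lookups, a text search for attribute-style access to `CellID` or `Gene` on `.ca`/`.ra` is a reasonable starting point. This is a heuristic sketch: the regex and `find_hardcoded_attributes` are assumptions, and it will miss indirect access such as `getattr(lf.ca, "CellID")` or `lf.ca["CellID"]`. The sample source line in the check comes from the traceback in this issue.

```python
import re
import sys
from typing import List, Tuple

# Matches e.g. lf.ca.CellID or loom.ra.Gene, but not lf.ca.CellID_renamed.
HARDCODED = re.compile(r"\.(ca|ra)\.(CellID|Gene)\b")


def find_hardcoded_attributes(source: str) -> List[Tuple[int, str]]:
    """Return (line number, matched text) for each hard-coded attribute access."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HARDCODED.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Usage: python find_hardcoded.py src/scenic/bin/*.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for lineno, text in find_hardcoded_attributes(handle.read()):
                print(f"{path}:{lineno}: {text}")
```

The trailing `\b` is what keeps renamed attributes such as `Gene_renamed` from being reported, because `_` counts as a word character.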

---
Also important to note: this is related to aertslab/pySCENIC/issues/235; the same problem caused a failure in the AUCell step when using pySCENIC 0.10.4. After fixing that bug in pySCENIC and using the pySCENIC dev version here (`container = 'aertslab/pyscenic:dev'`), we get the error above.

GreyRockIQ commented 2 years ago

Hello @cflerin, I have come across the same error at scenic:SCENIC:VISUALIZE (1). Is there a way to proceed past this step in the pipeline? Or can the output of the previous steps be used for further analysis in R or Python?

Thanks,
GreyRock