nf-core / scdownstream

A single cell transcriptomics pipeline for QC, integration and making the data presentable
https://nf-co.re/scdownstream
MIT License
19 stars 6 forks

Minimal sample sheet with both filtered and unfiltered produced by scrnaseq triggers celda decontx runtime error #64

Open smoe opened 1 month ago

smoe commented 1 month ago

Description of the bug

Hello, a previous run that provided only the unfiltered data worked fine. I have now added the filtered data from a rerun of scrnaseq to increase my options. Adding the filtered data triggers the decontx error below for all samples. Removing the pointer to the filtered data makes the run work again.

Command used and terminal output

:CELDA_DECONTX (T0); status: COMPLETED; exit: 1; error: -; workDir: /..../nextflow_scdownstream/work/5c/5d5e958a472843c607852979ad9211 started: 1721123832785; exited: 2024-07-16T09:58:09Z; ]
Juli-16 11:58:12.796 [TaskFinalizer-4] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (T0); work-dir=/....../nextflow_scdownstream/work/5c/5d5e958a472843c607852979ad9211
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (T0)` terminated with an error exit status (1)
Juli-16 11:58:12.867 [TaskFinalizer-4] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (T0)'

Caused by:
  Process `NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX (T0)` terminated with an error exit status (1)

Command executed [....nextflow/assets/nf-core/scdownstream/./workflows/../subworkflows/local/./../../modules/local/celda/decontx/templates/decontx.py]:

  #!/usr/bin/env python3

  import anndata as ad
  import anndata2ri
  import rpy2
  import rpy2.robjects as ro
  import platform
  import os
  celda = ro.packages.importr('celda')

  def format_yaml_like(data: dict, indent: int = 0) -> str:
      """Formats a dictionary to a YAML-like string.

      Args:
          data (dict): The dictionary to format.
          indent (int): The current indentation level.

      Returns:
          str: A string formatted as YAML.
      """
      yaml_str = ""
      for key, value in data.items():
          spaces = "  " * indent
          if isinstance(value, dict):
              yaml_str += f"{spaces}{key}:\n{format_yaml_like(value, indent + 1)}"
          else:
              yaml_str += f"{spaces}{key}: {value}\n"
      return yaml_str

  adata = ad.read_h5ad("T0_filtered_unified.h5ad")
  sce = anndata2ri.py2rpy(adata)

  kwargs = {}

  if len(adata.obs['batch'].unique()) > 1:
      kwargs['batch'] = adata.obs['batch'].tolist()

  raw_path = "T0_unfiltered_unified.h5ad"
  if os.path.exists(raw_path):
      raw = ad.read_h5ad(raw_path)
      if "counts" not in raw.layers:
          raw.layers["counts"] = raw.X.copy() 
      kwargs["background"] = anndata2ri.py2rpy(raw)

  corrected = celda.decontX(sce, **kwargs)
  counts = celda.decontXcounts(corrected)

  adata.layers['ambient'] = anndata2ri.rpy2py(counts).T
  adata.write_h5ad("T0_decontx.h5ad")

  # Versions

  versions = {
      "NFCORE_SCDOWNSTREAM:SCDOWNSTREAM:PREPROCESS:AMBIENT_RNA_REMOVAL:CELDA_DECONTX": {
          "python": platform.python_version(),
          "anndata": ad.__version__,
          "anndata2ri": anndata2ri.__version__,
          "rpy2": rpy2.__version__,
          "celda": celda.__version__,
      }
  }

  with open("versions.yml", "w") as f:
      f.write(format_yaml_like(versions))

Command exit status:
  1

Command output:
  (empty)

Command error:
  R[write to console]: Tue Jul 16 11:58:03 2024 ..  3340  cells in the background matrix were removed as they were found in  the filtered matrix.

  R[write to console]: --------------------------------------------------

  R[write to console]: Starting DecontX

  R[write to console]: --------------------------------------------------

  R[write to console]: Tue Jul 16 11:58:03 2024 .. Analyzing all cells

  R[write to console]: Tue Jul 16 11:58:03 2024 .... Generating UMAP and estimating cell types

  R[write to console]: Error in .local(x, ...) : size factors should be positive

  Traceback (most recent call last):
    File ".command.sh", line 45, in <module>  
      corrected = celda.decontX(sce, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/rpy2/robjects/functions.py", line 208, in __call__
      return (super(SignatureTranslatedFunction, self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/rpy2/robjects/functions.py", line 131, in __call__
      res = super(Function, self).__call__(*new_args, **new_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
      cdata = function(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/rpy2/rinterface.py", line 817, in __call__
      raise embedded.RRuntimeError(_rinterface._geterrmessage())
  rpy2.rinterface_lib.embedded.RRuntimeError: Error in .local(x, ...) : size factors should be positive

Work dir:
  .... nextflow_scdownstream/work/5c/5d5e958a472843c607852979ad9211

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Relevant files

No response

System information

  N E X T F L O W
  version 24.04.3 build 5916
  created 09-07-2024 19:35 UTC (21:35 MESZ)
  cite doi:10.1038/nbt.3820
  http://nextflow.io

Executor: slurm; Hardware: HPC; Container engine: Singularity; OS: CentOS Linux; scdownstream version: -r dev

nictru commented 1 month ago

Could you investigate the differences between the filtered file that comes from scrnaseq and the one that scdownstream creates when only the unfiltered file is provided? I am especially interested in empty barcodes and genes.
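
One way to run this comparison is to count barcodes and genes whose total count is zero in each of the two filtered files. The snippet below is a minimal sketch, not part of the pipeline: `summarize_empty` is a hypothetical helper, it assumes a cells x genes matrix of non-negative counts (the AnnData convention for `adata.X`), and it is shown on a toy matrix so it runs without any input files.

```python
import numpy as np


def summarize_empty(matrix):
    """Count all-zero rows (barcodes) and all-zero columns (genes).

    Hypothetical helper for illustration. Accepts a dense NumPy array or a
    SciPy sparse matrix laid out as cells x genes. Assumes non-negative
    counts, so a zero sum means every entry in that row or column is zero.
    """
    # np.asarray(...).ravel() also flattens the 2-D result a sparse sum returns.
    counts_per_cell = np.asarray(matrix.sum(axis=1)).ravel()
    counts_per_gene = np.asarray(matrix.sum(axis=0)).ravel()
    return {
        "n_cells": int(counts_per_cell.size),
        "n_genes": int(counts_per_gene.size),
        "empty_cells": int((counts_per_cell == 0).sum()),
        "empty_genes": int((counts_per_gene == 0).sum()),
    }


# Toy cells x genes matrix: the second cell and the third gene have no counts.
toy = np.array([
    [3, 0, 0],
    [0, 0, 0],
    [1, 2, 0],
])
toy_summary = summarize_empty(toy)
print(toy_summary)
```

On real data the same helper could be applied to `ad.read_h5ad("T0_filtered_unified.h5ad").X` from the failing run and to the corresponding file from the run that only used unfiltered input. Whether zero-count barcodes are what actually triggers `size factors should be positive` here is an assumption to verify, not a confirmed diagnosis.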