nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License

cellranger count => Your reference does not contain the expected files #345

Open nick-youngblut opened 3 months ago

nick-youngblut commented 3 months ago

Description of the bug

  nextflow run main.nf \
   -ansi-log false \
   -profile docker,gcp \
   -work-dir gs://path/to/bucket/scrnaseq/work \
   --input samples.csv \
   --protocol 10XV3 \
   --aligner cellranger \
   --fasta gs://path/to/bucket/cellranger/refdata-gex-GRCh38-2024-A/fasta/genome.fa \
   --gtf gs://path/to/bucket/cellranger/refdata-gex-GRCh38-2024-A/genes/genes.gtf \
   --outdir gs://path/to/bucket/scrnaseq/SspArc0144_LL_BAT_ENPP1_H362A 

Note that I'm using the standard GRCh38-2024-A 10X Genomics reference, which has worked many times before with the scrnaseq pipeline.

The gcp profile added to the nextflow.config:

    gcp {
        process.executor       = "google-batch"
        process.errorStrategy  = "retry"
        process.maxRetries     = 3
        params.max_cpus        = 16
        params.max_memory      = "128.GB"
        params.max_time        = "96.h"
        google.project         = "my_gcp_project"
        google.location        = "us-west1"
        fusion.enabled         = true
        wave.enabled           = true
        process.scratch        = false
    }

Cellranger Count job

The .command.out:

Martian Runtime - v4.0.12
Serving UI at http://fc840b44f790:40909?auth=XTavSLnSF_0F2x4rhyCrWKC2Dga2ZbHj6QpFZ73OHgE

Running preflight checks (please wait)...
Checking sample info...
Checking FASTQ folder...
Checking reference...
Checking reference_path (/fusion/gs/arc-genomics-nextflow/scrnaseq/work/8a/304d5f879a5cdf3910639a6c0f7d44/cellranger_reference) on fc840b44f790...

[error] Your reference does not contain the expected files, or they are not readable. Please check your reference folder on fc840b44f790.

2024-07-11 15:22:33 Shutting down.
Saving pipestance info to "20240607_10X_3HT_Murine_BAT_SC1/20240607_10X_3HT_Murine_BAT_SC1.mri.tgz"
For assistance, upload this file to 10x Genomics by running:

cellranger upload <your_email> "20240607_10X_3HT_Murine_BAT_SC1/20240607_10X_3HT_Murine_BAT_SC1.mri.tgz"

The .command.err:

Traceback (most recent call last):
  File "/fusion/gs/arc-genomics-nextflow/scrnaseq/work/fb/86b5f819356e025fa94ba3cc60cf47/.command.sh", line 57, in <module>
    run(
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cellranger', 'count', '--id', '20240607_10X_3HT_Murine_BAT_SC1', '--fastqs', 'fastq_all', '--transcriptome', 'cellranger_reference', '--localcores', '12', '--localmem', '72', '--chemistry', 'SC3Pv3', '--create-bam', 'true']' returned non-zero exit status 1.

The cellranger_reference file just contains the path /fusion/gs/arc-genomics-nextflow/scrnaseq/work/8a/304d5f879a5cdf3910639a6c0f7d44/cellranger_reference
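
A quick check along these lines, run from inside the failing task's work directory, can help tell a dangling link or stub file apart from an incomplete reference. This is only a sketch: `check_cellranger_ref` is a hypothetical helper, and the list of expected entries (`reference.json`, `fasta/genome.fa`, `star`) is an assumption based on the usual refdata layout, not something taken from the pipeline or from Cell Ranger's preflight code.

```shell
# Hypothetical helper: report whether a path looks like a usable Cell Ranger reference.
# The expected entries below are an assumption about the usual refdata layout.
check_cellranger_ref() {
    ref="$1"
    # A dangling symlink is still a link (-L) but fails the existence test (-e).
    if [ -L "$ref" ] && [ ! -e "$ref" ]; then
        echo "broken symlink: $ref -> $(readlink "$ref")"
        return 2
    fi
    # A plain file (for example, one that only holds a path) is not a reference.
    if [ ! -d "$ref" ]; then
        echo "not a directory: $ref"
        return 2
    fi
    status=0
    for entry in reference.json fasta/genome.fa star; do
        if [ ! -r "$ref/$entry" ]; then
            echo "missing or unreadable: $entry"
            status=1
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "reference looks complete: $ref"
    fi
    return "$status"
}
```

Calling `check_cellranger_ref cellranger_reference` returns 2 when the path is a dangling link or not a directory, 1 when entries are missing, and 0 otherwise.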

Command used and terminal output

See above

grst commented 2 months ago

Have you tried to specify --cellranger_index gs://path/to/bucket/cellranger/refdata-gex-GRCh38-2024-A/ instead of --fasta and --gtf? That way the index wouldn't be recalculated.

That said, it should totally work to specify the fasta and gtf file from the reference folder.
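
For illustration, the `--cellranger_index` suggestion applied to the command from the report might look like the sketch below. It only assembles and prints the command (a dry run); the bucket paths are the placeholders from the original post, and whether this avoids the error on Google Batch with Fusion is untested here.

```shell
# Dry-run sketch: the original command with --cellranger_index
# replacing --fasta and --gtf. Paths are placeholders from the report.
ref="gs://path/to/bucket/cellranger/refdata-gex-GRCh38-2024-A/"

set -- run main.nf \
    -ansi-log false \
    -profile docker,gcp \
    -work-dir gs://path/to/bucket/scrnaseq/work \
    --input samples.csv \
    --protocol 10XV3 \
    --aligner cellranger \
    --cellranger_index "$ref" \
    --outdir gs://path/to/bucket/scrnaseq/SspArc0144_LL_BAT_ENPP1_H362A

# Print the assembled command; to launch for real, run: nextflow "$@"
cmd="nextflow $*"
echo "$cmd"
```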

nick-youngblut commented 2 months ago

> That said, it should totally work to specify the fasta and gtf file from the reference folder.

Does it work on the cloud provider that you use for testing (I'm guessing AWS)?