nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License

Crashes due to trying to unzip file that is not zipped #380

Open esrice opened 3 weeks ago

esrice commented 3 weeks ago

Description of the bug

I specified an un-gzipped whitelist file with the --barcode_whitelist parameter. In the STAR_ALIGN step, it tries to unzip this file, which causes gzip to crash, which causes the step to fail. This is the offending line of .command.sh:

--soloCBwhitelist <(gzip -cdf 3M-february-2018.txt)

I will try to fix and submit a PR in the next day or two.

Command used and terminal output

$ nextflow run nf-core/scrnaseq \
    -profile singularity \
    --input ../samples.csv \
    --fasta: ../../ref/bGalGal1b_modified.fa \
    --gtf: ../../ref/bGalGal1b_modified_filtered.gtf \
    --protocol 10XV3 \
    --aligner star \
    --outdir out \
    --barcode_whitelist /mnt/pixstor/data/esrbhb/3M-february-2018.txt \
    --save_reference

ERROR ~ Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN (D2)'

Caused by: Process NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN (D2) terminated with an error exit status (104)

Command executed:

STAR \
    --genomeDir star \
    --readFilesIn D2_S2_L001_R2_001.fastq.gz D2_S2_L001_R1_001.fastq.gz \
    --runThreadN 16 \
    --outFileNamePrefix D2. \
    --soloCBwhitelist <(gzip -cdf 3M-february-2018.txt) \
    --soloType CB_UMI_Simple \
    --soloFeatures Gene \
    --soloUMIlen 12 \
    \
    --sjdbGTFfile bGalGal1b_modified_genes.gtf \
    --outSAMattrRGline ID:D2 'SM:D2' \
    \
    --readFilesCommand zcat --runDirPerm All_RWX --outWigType bedGraph --twopassMode Basic --outSAMtype BAM SortedByCoordinate \

if [ -f D2.Unmapped.out.mate1 ]; then
    mv D2.Unmapped.out.mate1 D2.unmapped_1.fastq
    gzip D2.unmapped_1.fastq
fi
if [ -f D2.Unmapped.out.mate2 ]; then
    mv D2.Unmapped.out.mate2 D2.unmapped_2.fastq
    gzip D2.unmapped_2.fastq
fi

if [ -d D2.Solo.out ]; then
    # Backslashes still need to be escaped (https://github.com/nextflow-io/nextflow/issues/67)
    find D2.Solo.out \( -name "*.tsv" -o -name "*.mtx" \) -exec gzip {} \;
fi

cat <<-END_VERSIONS > versions.yml
"NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN":
    star: $(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS

Command exit status: 104

Command output: (empty)

Command error:
  INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  gzip: invalid magic

EXITING because of FATAL ERROR: CB whitelist file /dev/fd/63 is empty. SOLUTION: provide non-empty whitelist.

Oct 11 07:18:04 ...... FATAL ERROR, exiting

Work dir: /mnt/pixstor/warrenwc-lab/users/edward/nxf_work/1a/6d24d1b3b8d570f7e134a16d877d51

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details
-[nf-core/scrnaseq] Pipeline completed with errors-
WARN: Killing running tasks (1)

Relevant files

No response

System information

grst commented 5 days ago

Hi,

thanks for reporting. Have you validated this by running gzip -cdf 3M-february-2018.txt on your file manually? I ask because the -f flag of gzip should already deal with non-compressed files.

esrice commented 5 days ago

Oh, weird. As you predicted, running that command manually works just fine. So I don't understand why the same command appears to fail inside the pipeline, leaving it with an empty whitelist, or why my attempted fix (see PR) of only running gzip if the filename ends in ".gz" prevents this from happening. Do you have any ideas?
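On the "empty whitelist" part of the question, one likely piece of the puzzle is how bash process substitution behaves: the command inside <(...) runs on its own, and its exit status is not passed on to the command that reads from it. If the inner command fails before writing anything, the reader simply sees an empty stream, which would match STAR's "CB whitelist file /dev/fd/63 is empty" message. A minimal sketch, assuming bash; failing_decompress and consumer are hypothetical stand-ins for the failing gzip and for STAR:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a decompressor that fails before writing any output.
failing_decompress() {
    echo "gzip: invalid magic" >&2
    return 1
}

# Hypothetical stand-in for STAR: counts the bytes in the "file" it is handed.
consumer() {
    local n
    n=$(wc -c < "$1")
    echo "consumer saw $((n)) bytes"
}

# The failure inside <(...) is invisible to the consumer, which just reads an empty pipe.
consumer <(failing_decompress)
echo "consumer exit status: $?"
```

Run under bash, this reports zero bytes and an exit status of 0 for the consumer, even though the inner command returned 1.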

grst commented 5 days ago

Can you try running it inside the cellranger container? Maybe it has a different version of gzip...

esrice commented 5 days ago

Ah yup that's the problem:

$ gzip -cdf /mnt/pixstor/data/esrbhb/3M-february-2018.txt # this works
$ singularity exec -B /mnt https://depot.galaxyproject.org/singularity/star:2.7.10b--h9ee0642_0 gzip -cdf /mnt/pixstor/data/esrbhb/3M-february-2018.txt
gzip: invalid magic

My system gzip is v1.9 but the container gzip is BusyBox v1.32.1.
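Since the two gzip implementations disagree on what -f does with plain input, one way to sidestep the difference is to check the file's first two bytes before deciding whether to decompress at all. The following is only a sketch, not code from the pipeline or the PR: is_gzipped and cat_maybe_gzipped are hypothetical helper names, and it assumes an od that accepts the POSIX -An -tx1 -N2 options (I have not checked this against the BusyBox build in the container).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the pipeline or the PR.
# Succeeds when the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzipped() {
    local magic
    # Dump the first two bytes as hex and strip the padding od adds.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Stream a file to stdout, decompressing only when the magic bytes match.
cat_maybe_gzipped() {
    if is_gzipped "$1"; then
        gzip -cd "$1"
    else
        cat "$1"
    fi
}
```

With helpers like these, the whitelist argument could be written as <(cat_maybe_gzipped 3M-february-2018.txt), which never hands plain text to gzip regardless of which gzip is installed.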

grst commented 5 days ago

ok, then your PR should fix this. Many thanks for checking!
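For readers who do not have the PR open, the approach described earlier in the thread (only run gzip when the filename ends in ".gz") could look roughly like the sketch below. This is an illustration of the idea, not the PR's actual code; emit_whitelist is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the extension-based approach; not the PR's code.
# Streams the whitelist to stdout, decompressing only for names ending in ".gz".
emit_whitelist() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;  # compressed: decompress to stdout
        *)    cat "$1" ;;       # anything else: pass through unchanged
    esac
}
```

The STAR argument would then read --soloCBwhitelist <(emit_whitelist 3M-february-2018.txt). The trade-off is that the decision rests on the file name alone, so a gzipped file without a ".gz" suffix would be passed through still compressed.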