nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License

STAR always uses V2 chemistry #60

Closed: jeremyadamsfisher closed this issue 2 years ago

jeremyadamsfisher commented 3 years ago

Description of the bug

STARsolo is always run with the 10X V2 barcode settings, regardless of the value given for the chemistry parameter.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Run nextflow run nf-core/scrnaseq -r 1.1.0 -params-file nf-params.json

nf-params.json

{
    "chemistry": "V3",
    "input": "./data/*_{1,2}.fastq.gz",
    "fasta": "./data/genome.fa",
    "gtf": "./data/genes.gtf",
    "aligner": "star"
}

Where: genome.fa is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/GRCm39.genome.fa.gz; genes.gtf is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/gencode.vM27.annotation.gtf.gz; and the fastq files are from https://www.ncbi.nlm.nih.gov/sra/?term=SRR14597268

[13/99028a] process > get_software_versions     [100%] 1 of 1 ✔
[a5/5fcfb4] process > unzip_10x_barcodes (V3)   [100%] 1 of 1 ✔
[-        ] process > extract_transcriptome     -
[-        ] process > build_salmon_index        -
[8e/00790d] process > makeSTARindex (genome.fa) [100%] 1 of 1 ✔
[-        ] process > build_kallisto_index      -
[-        ] process > build_gene_map            -
[-        ] process > build_txp2gene            -
[-        ] process > alevin                    -
[-        ] process > alevin_qc                 -
[bd/676904] process > star (SRR14597268_1)      [100%] 2 of 2, failed: 2, retri..
[-        ] process > kallisto                  -
[-        ] process > bustools_correct_sort     -
[-        ] process > bustools_count            -
[-        ] process > bustools_inspect          -
[-        ] process > multiqc                   -
[ae/de0382] process > output_documentation      [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/scrnaseq] Pipeline completed with errors-
[8a/4103a7] NOTE: Process `star (SRR14597268_1)` terminated with an error exit status (104) -- Execution is retried (1)
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'star (SRR14597268_1)'

Caused by:
  Process requirement exceed available memory -- req: 128 GB; avail: 124.4 GB

Command executed:

  STAR --genomeDir star \
        --sjdbGTFfile genes.gtf \
        --readFilesIn SRR14597268_2.fastq.gz SRR14597268_1.fastq.gz  \
        --runThreadN 10 \
        --twopassMode Basic \
        --outWigType bedGraph \
        --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 137338953472 \
        --readFilesCommand zcat \
        --runDirPerm All_RWX \
        --outFileNamePrefix SRR14597268_1  \
        --soloType Droplet \
        --soloCBwhitelist 10x_V3_barcode_whitelist

  samtools index SRR14597268_1Aligned.sortedByCoord.out.bam

Command exit status:
  -

Command output:
  Jun 29 21:52:14 ..... started STAR run
  Jun 29 21:52:15 ..... loading genome
  Jun 29 21:52:30 ..... processing annotations GTF
  Jun 29 21:52:39 ..... inserting junctions into the genome indices
  Jun 29 21:54:04 ..... started 1st pass mapping
  Jun 29 21:54:05 ..... finished 1st pass mapping
  Jun 29 21:54:05 ..... inserting junctions into the genome indices
  Jun 29 21:55:34 ..... started mapping

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

  EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
  Read ID=@SRR14597268.1 1 N 0   Sequence=CAGGCNAGTCCAACGCCCTTCTGCCTTT
  SOLUTION: make sure that the barcode read is the second in --readFilesIn and check that is has the correct formatting
            If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength

  Jun 29 21:55:35 ...... FATAL ERROR, exiting
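
The length check behind this error can be reproduced outside STAR. The following is a minimal Python sketch, assuming the check amounts to comparing the barcode read length with CB length + UMI length (16 + 10, the defaults the STAR readme gives for 10X Chromium V2). The function name is hypothetical and is not part of STAR or the pipeline.

```python
# Hypothetical helper for illustration; not part of STAR or nf-core/scrnaseq.
DEFAULT_CB_LEN = 16   # STARsolo default cell barcode length (10X V2)
DEFAULT_UMI_LEN = 10  # STARsolo default UMI length (10X V2)


def barcode_length_mismatch(barcode_seq, cb_len=DEFAULT_CB_LEN, umi_len=DEFAULT_UMI_LEN):
    """Return (observed, expected) if the lengths differ, otherwise None."""
    observed = len(barcode_seq)
    expected = cb_len + umi_len
    return None if observed == expected else (observed, expected)


# Barcode read reported in the error message above.
seq = "CAGGCNAGTCCAACGCCCTTCTGCCTTT"
print(barcode_length_mismatch(seq))              # (28, 26)
print(barcode_length_mismatch(seq, umi_len=12))  # None
```

With the default UMI length the read is two bases longer than expected; with a UMI length of 12 the lengths agree.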

Expected behaviour

According to the STAR readme,

The default barcode lengths (CB=16b, UMI=10b) work for 10X Chromium V2. For V3, specify:

--soloUMIlen 12

The pipeline does not pass this option to STAR. The STAR command should vary with the chemistry, as described at https://www.biostars.org/p/462568/

10x v1
  Whitelist: 737K-april-2014_rc.txt
  CB length: 14
  UMI start: 15
  UMI length: 10
  (courtesy ATpoint)

10x v2
  Whitelist: 737K-august-2016.txt
  CB length: 16
  UMI start: 17
  UMI length: 10

10x v3
  Whitelist: 3M-Feb_2018_V3.txt
  CB length: 16
  UMI start: 17
  UMI length: 12
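
The per-chemistry values listed above could be turned into STARsolo arguments roughly as follows. This is only a sketch in Python, not the pipeline's actual code: the dictionary and function names are hypothetical, the numbers are copied from the Biostars post, and `--soloCBlen`, `--soloUMIstart` and `--soloUMIlen` are the STARsolo options that set the barcode layout.

```python
# Layout values copied from the list above; names are hypothetical.
CHEMISTRY_LAYOUT = {
    "V1": {"cb_len": 14, "umi_start": 15, "umi_len": 10},
    "V2": {"cb_len": 16, "umi_start": 17, "umi_len": 10},
    "V3": {"cb_len": 16, "umi_start": 17, "umi_len": 12},
}


def starsolo_barcode_args(chemistry):
    """Build the STARsolo barcode-layout arguments for a 10x chemistry."""
    key = chemistry.upper()
    if key not in CHEMISTRY_LAYOUT:
        raise ValueError(f"Unsupported chemistry: {chemistry!r}")
    layout = CHEMISTRY_LAYOUT[key]
    return [
        "--soloCBlen", str(layout["cb_len"]),
        "--soloUMIstart", str(layout["umi_start"]),
        "--soloUMIlen", str(layout["umi_len"]),
    ]


print(" ".join(starsolo_barcode_args("V3")))
# --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12
```

Passing all three options explicitly, rather than relying on the V2 defaults, keeps the command correct for every chemistry in the table.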

Log files

nextflow.log

Additional context

Would be happy to write a PR

grst commented 2 years ago

This should be fixed in the latest dev version.

grst commented 2 years ago

This is actually NOT fixed in the latest dev version, as I get

EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
Read ID=@A01174:218:HKWM7DSX2:4:1101:1036:1063 ;  Sequence=CTCATTACACGTACATGCGGGTTTGCCG
SOLUTION: check the formatting of input read files.
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
To avoid checking of barcode read length, specify --soloBarcodeReadLength 0
Jun 14 13:07:28 ...... FATAL ERROR, exiting

with a v3 library.

(v3 has a 28 nt barcode+UMI read, compared to 26 nt in v2)
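
To make the 28 vs 26 difference concrete, here is a small Python sketch that splits the barcode read from the error above into cell barcode and UMI. It assumes the v3 layout quoted earlier in this issue (CB length 16, UMI start 17 counted from 1, UMI length 12); the function name is hypothetical.

```python
def split_barcode_read(seq, cb_len, umi_start, umi_len):
    """Split a barcode read into (cell barcode, UMI).

    umi_start is 1-based, following the convention of --soloUMIstart.
    """
    umi_begin = umi_start - 1  # convert to a 0-based index
    if len(seq) < umi_begin + umi_len:
        raise ValueError("read is shorter than the requested CB+UMI layout")
    return seq[:cb_len], seq[umi_begin:umi_begin + umi_len]


# Barcode read from the error message above (28 nt).
seq = "CTCATTACACGTACATGCGGGTTTGCCG"

cb, umi = split_barcode_read(seq, cb_len=16, umi_start=17, umi_len=12)
print(cb, umi)  # CTCATTACACGTACAT GCGGGTTTGCCG

# With the v2 UMI length of 10, the last two bases of the read are left over.
_, umi_v2 = split_barcode_read(seq, cb_len=16, umi_start=17, umi_len=10)
print(len(seq) - (16 + len(umi_v2)))  # 2
```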

apeltzer commented 2 years ago

We should get this fix into 2.0.0 too in my opinion :-(

grst commented 2 years ago

The default barcode lengths (CB=16b, UMI=10b) work for 10X Chromium V2. For V3, specify:

--soloUMIlen 12

(https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#running-starsolo-for-10x-chromium-scrna-seq-data)

I think this parameter is missing from the STAR command. It should probably be generated by the Java code in the lib folder.

apeltzer commented 2 years ago

Should be fixed in #113 now