sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Problem with star index #136

Closed: sbg-dalibor closed this issue 5 years ago

sbg-dalibor commented 5 years ago

Hello,

I was using zUMIs with 10x data and got the following error:

2019-08-20T15:38:09.108345849Z [W::bam_merge_core2] No @HD tag found.
2019-08-20T15:38:09.419192470Z
2019-08-20T15:38:09.419220585Z EXITING because of fatal PARAMETERS error: present --sjdbOverhang=90 is not equal to the value at the genome generation step =100
2019-08-20T15:38:09.419234299Z SOLUTION:
2019-08-20T15:38:09.419242566Z
2019-08-20T15:38:09.419250618Z Aug 20 15:38:09 ...... FATAL ERROR, exiting
2019-08-20T15:43:49.204564419Z Error in (function (cl, name, valueClass) :
2019-08-20T15:43:49.204607061Z assignment of an object of class "numeric" is not valid for @'Dim' in an object of class "dgTMatrix"; is(value, "integer") is not TRUE
2019-08-20T15:43:49.204619179Z Calls: convert2countM -> .makewide -> ->
2019-08-20T15:43:49.204630077Z Execution halted
2019-08-20T15:43:51.413449838Z Error in gzfile(file, "rb") : cannot open the connection
2019-08-20T15:43:51.413487050Z Calls: readRDS -> gzfile
2019-08-20T15:43:51.413497996Z In addition: Warning message:
2019-08-20T15:43:51.413507096Z In gzfile(file, "rb") :
2019-08-20T15:43:51.413516730Z cannot open compressed file '/sbgenomics/workspaces/3cf0c125-2ba6-4ac5-a348-be0ec4d5ae7f/tasks/99795684-f8c2-4e38-a078-cfd3a3ab3eae/zumis/zUMIs_output/expression/test_project.dgecounts.rds', probable reason 'No such file or directory'
2019-08-20T15:43:51.413527345Z Execution halted

I generated my STAR index with sjdbOverhang=100, using STAR version 2.5.4b. What should I do here?

Here is the information on the files I was using:

https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3

Here is the content of my config YAML file:

`###########################################
# Welcome to zUMIs
# below, please fill the mandatory inputs
# We expect full paths for all files.
###########################################

# define a project name that will be used to name output files
project: test_project

# Sequencing File Inputs:
# For each input file, make one list object & define path and barcode ranges
# base definition vocabulary: BC(n) UMI(n) cDNA(n).
# Barcode range definition needs to account for all ranges. You can give several comma-separated ranges for BC & UMI sequences, eg. BC(1-6,20-26)
# you can specify between 1 and 4 input files
sequence_files:
  file1:
    name: /sbgenomics/workspaces/3cf0c125-2ba6-4ac5-a348-be0ec4d5ae7f/tasks/99795684-f8c2-4e38-a078-cfd3a3ab3eae/zumis/zummis_inputs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz
    base_definition:

# reference genome setup
reference:
  STAR_index: /sbgenomics/workspaces/3cf0c125-2ba6-4ac5-a348-be0ec4d5ae7f/tasks/99795684-f8c2-4e38-a078-cfd3a3ab3eae/zumis/star_reference_dir #path to STAR genome index
  GTF_file: /sbgenomics/workspaces/3cf0c125-2ba6-4ac5-a348-be0ec4d5ae7f/tasks/99795684-f8c2-4e38-a078-cfd3a3ab3eae/zumis/zummis_inputs/Homo_sapiens.GRCh38.84.gtf #path to gene annotation file in GTF format
  additional_files: ##additional_files #Optional parameter. It is possible to give additional reference sequences here, eg ERCC.fa
  additional_STAR_params: #Optional parameter. you may add custom mapping parameters to STAR here

# output directory
out_dir: /sbgenomics/workspaces/3cf0c125-2ba6-4ac5-a348-be0ec4d5ae7f/tasks/99795684-f8c2-4e38-a078-cfd3a3ab3eae/zumis #specify the full path to the output directory

###########################################
# below, you may optionally change default parameters
###########################################

# number of processors to use
num_threads: 6
mem_limit: null #Memory limit in Gigabytes, null meaning unlimited RAM usage.

# barcode & UMI filtering options
# number of bases under the base quality cutoff that should be filtered out.
# Phred score base-cutoff for quality control.
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20

# Options for Barcode handling
# You can give either number of top barcodes to use or give an annotation of cell barcodes.
# If you leave both barcode_num and barcode_file empty, zUMIs will perform automatic cell barcode selection for you!
barcodes:
  barcode_num: null
  barcode_file: null
  automatic: yes #Give yes/no to this option. If the cell barcodes should be detected automatically. If the barcode file is given in combination with automatic barcode detection, the list of given barcodes will be used as whitelist.
  BarcodeBinning: 1 #Hamming distance binning of close cell barcode sequences.
  nReadsperCell: 100 #Keep only the cell barcodes with atleast n number of reads.
  demultiplex: no #produce per-cell demultiplexed bam files.

# Options related to counting of reads towards expression profiles
counting_opts:
  introns: yes #can be set to no for exon-only counting.
  downsampling: 0 #Number of reads to downsample to. This value can be a fixed number of reads (e.g. 10000) or a desired range (e.g. 10000-20000) Barcodes with less than will not be reported. 0 means adaptive downsampling. Default: 0.
  strand: 0 #Is the library stranded? 0 = unstranded, 1 = positively stranded, 2 = negatively stranded
  Ham_Dist: 0 #Hamming distance collapsing of UMI sequences.
  write_ham: no #If hamming distance collapse of UMI sequences is performed, write out mapping tables & UB corrected bam files.
  velocyto: no #Would you like velocyto to do counting of intron-exon spanning reads
  primaryHit: yes #Do you want to count the primary Hits of multimapping reads towards gene expression levels?
  twoPass: yes #perform basic STAR twoPass mapping

# produce stats files and plots?
make_stats: yes

# Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.
which_Stage: Filtering

# define dependencies program paths
samtools_exec: samtools #samtools executable
Rscript_exec: Rscript #Rscript executable
STAR_exec: /opt/STAR-2.5.4b/bin/Linux_x86_64/STAR #STAR executable
pigz_exec: /opt/pigz-2.4/pigz #pigz executable

# below, fqfilter will add a read_layout flag defining SE or PE
zUMIs_directory: /opt/zUMIs/
read_layout: SE`

Thanks in advance, Dalibor

cziegenhain commented 5 years ago

Hi Dalibor!

Thanks for giving zUMIs a go!

The issue arises because zUMIs expects a STAR index built without a predefined splice-junction overhang value. That way, you can use the same index for any dataset and read length!
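To see whether an existing index will run into this mismatch, one option is to read the overhang recorded in the index directory. The sketch below is only illustrative: it assumes the index contains STAR's `genomeParameters.txt` with whitespace-separated `key value` lines including an `sjdbOverhang` entry, and it assumes that a missing entry or a value of 0 means no splice junctions were inserted at generation time. The helper names `read_sjdb_overhang` and `overhang_conflicts` are hypothetical and not part of zUMIs or STAR.

```python
from pathlib import Path
from typing import Optional


def read_sjdb_overhang(index_dir: str) -> Optional[int]:
    """Return the sjdbOverhang recorded in a STAR index, or None if absent.

    Assumption: the index directory holds a genomeParameters.txt file with
    whitespace-separated "key value" lines.
    """
    params = Path(index_dir) / "genomeParameters.txt"
    for line in params.read_text().splitlines():
        fields = line.split()
        # Ignore comment lines and unrelated keys.
        if len(fields) >= 2 and fields[0] == "sjdbOverhang":
            return int(fields[1])
    return None


def overhang_conflicts(index_overhang: Optional[int], mapping_overhang: int) -> bool:
    """True if the index fixes a nonzero overhang that differs from the mapping one.

    Assumption: None or 0 means the index was built without inserted junctions.
    """
    return index_overhang not in (None, 0) and index_overhang != mapping_overhang


# Values taken from the error message in this issue: index built with 100,
# mapping step requesting 90.
print(overhang_conflicts(100, 90))  # prints True
```

With a real index, `read_sjdb_overhang("/path/to/star_index")` (a placeholder path) would supply the first argument.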

Simply rerun the genome generation without the --sjdbOverhang attribute.
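A minimal sketch of what that rerun could look like, written as a small Python wrapper that assembles the STAR call. The genome directory and FASTA paths are placeholders, the helper name `star_index_command` is hypothetical, and the thread count simply mirrors `num_threads` from the config above. The sketch also leaves out `--sjdbGTFfile`, on the assumption that zUMIs supplies the annotation from the `GTF_file` entry at mapping time; if that does not hold for your version, adjust accordingly.

```python
import shlex
from typing import List


def star_index_command(star_exec: str, genome_dir: str, fasta: str,
                       threads: int = 6) -> List[str]:
    """Assemble a STAR genomeGenerate call with no --sjdbOverhang option."""
    return [
        star_exec,
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
    ]


cmd = star_index_command(
    "/opt/STAR-2.5.4b/bin/Linux_x86_64/STAR",  # STAR_exec from the config above
    "/path/to/star_index_no_overhang",         # placeholder output directory
    "/path/to/genome.fa",                      # placeholder genome FASTA
)
# Print the command for inspection; to actually run it, one could pass the
# list to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Afterwards, `STAR_index` in the YAML would need to point at the newly generated directory.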

Best, Christoph

sbg-dalibor commented 5 years ago

It works now, thanks a lot!!!

Best, Dalibor

cziegenhain commented 5 years ago

Great to hear!