sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

sjdbOverhang Error when trying to run zUMIs #341

Closed. stkgo closed this issue 1 year ago.

stkgo commented 1 year ago

Describe the bug

When trying to run zUMIs, I receive the following error: EXITING because of fatal PARAMETERS error: present --sjdbOverhang=15 is not equal to the value at the genome generation step =100

To Reproduce

YAML:

project: zUMIs-test

sequence_files:
  file1:
    name: /home/ubuntu/zumis_test/sequence.fastq.gz
    base_definition:
      - cDNA(1-150)

reference:
  STAR_index: /home/ubuntu/reference/hg38/star_index_2.7.3a 
  GTF_file: /home/ubuntu/reference/genes/gencode.v38.annotation.gtf
  #additional_files:
  additional_STAR_params: '--clip3pAdapterSeq CTGTCTCTTATACACATCT'

out_dir: /home/ubuntu/zumis_test/results/

num_threads: 6
mem_limit: null

filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20

barcodes:
  barcode_num: null
  barcode_file: null
  automatic: yes
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: no

counting_opts:
  introns: yes
  downsampling: 0
  strand: 0
  Ham_Dist: 0
  write_ham: no
  velocyto: no
  primaryHit: yes
  twoPass: yes

make_stats: yes

which_Stage: Filtering

samtools_exec: samtools
Rscript_exec: Rscript
STAR_exec: /home/ubuntu/STAR
pigz_exec: pigz
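The base_definition entries in this config describe where each kind of sequence sits in the read, using 1-based, inclusive positions (cDNA(1-150) means bases 1 through 150). As an illustration only, the sketch below parses entries of this form and slices the matching bases out of a read. parse_base_definition and extract are hypothetical helpers, not zUMIs functions, and only a single start-end range per entry is handled. Note that with just cDNA(1-150) declared, no bases are assigned to BC or UMI.

```python
import re
from typing import Tuple

# Hypothetical parser for zUMIs-style base_definition entries such as
# "cDNA(1-150)" or "BC(1-8)". Positions are 1-based and inclusive.
# Only one start-end range per entry is supported in this sketch.
_ENTRY = re.compile(r"^(cDNA|BC|UMI)\((\d+)-(\d+)\)$")


def parse_base_definition(entry: str) -> Tuple[str, int, int]:
    """Return (kind, start, end) for a single base_definition entry."""
    m = _ENTRY.match(entry.strip())
    if m is None:
        raise ValueError(f"unsupported base_definition entry: {entry!r}")
    kind, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if not 1 <= start <= end:
        raise ValueError(f"invalid range in {entry!r}")
    return kind, start, end


def extract(read: str, entry: str) -> str:
    """Return the bases of `read` covered by `entry`."""
    _, start, end = parse_base_definition(entry)
    # Convert 1-based inclusive coordinates to a Python slice.
    return read[start - 1:end]


if __name__ == "__main__":
    read = "ACGTACGTTTTTGGGGCCCC"
    print(parse_base_definition("cDNA(1-150)"))  # ('cDNA', 1, 150)
    print(extract(read, "BC(1-8)"))              # ACGTACGT
```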

Output:

~/software/zUMIs/zUMIs.sh -c -y my_config.yaml
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!

 You provided these parameters:
 YAML file: my_config.yaml
 zUMIs directory:       /home/ubuntu/software/zUMIs
 STAR executable        STAR
 samtools executable        samtools
 pigz executable        pigz
 Rscript executable     Rscript
 RAM limit:   null
 zUMIs version 2.9.7c 

Tue Dec 20 21:48:11 UTC 2022
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Tue Dec 20 21:48:13 UTC 2022
Error in uik(bccount$cellindex, bccount$cs/1000) : 
  Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_unknown -> .FindBCcut -> uik
Execution halted
Mapping...
[1] "2022-12-20 21:48:14 UTC"
Dec 20 21:48:19 ..... started STAR run
Dec 20 21:48:19 ..... loading genome
[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory
Dec 20 21:48:19 ..... started STAR run
Dec 20 21:48:19 ..... loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=15 is not equal to the value at the genome generation step =100
SOLUTION: 

Dec 20 21:48:19 ...... FATAL ERROR, exiting
Dec 20 21:48:19 ..... started STAR run
Dec 20 21:48:19 ..... loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=15 is not equal to the value at the genome generation step =100
SOLUTION: 

Dec 20 21:48:19 ...... FATAL ERROR, exiting

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=15 is not equal to the value at the genome generation step =100
SOLUTION: 

Dec 20 21:48:19 ...... FATAL ERROR, exiting
[main_cat] ERROR: input is not BAM or CRAM
[main_cat] ERROR: input is not BAM or CRAM
Tue Dec 20 21:48:19 UTC 2022
Counting...
[1] "2022-12-20 21:48:25 UTC"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) : 
  File '/home/ubuntu/zumis_test/results//zUMIs_output/zUMIs-testkept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/home/ubuntu/zumis_test/results'
Execution halted
Tue Dec 20 21:48:25 UTC 2022
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/ubuntu/zumis_test/results//zUMIs_output/expression/zUMIs-test.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Dec 20 21:48:26 UTC 2022
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2022-12-20 21:48:26 UTC"
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project,  : 
  File '/home/ubuntu/zumis_test/results//zUMIs_output/zUMIs-testkept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/ubuntu/zumis_test/results'
Execution halted
Tue Dec 20 21:48:30 UTC 2022


Additional context

The warning says that STAR version 2.7.1a was used for indexing, but this is not the case: 2.7.3a was built from source and used. I suspect that STAR does not properly write the version to the file that zUMIs checks.
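To see what zUMIs is presumably reading, one can inspect genomeParameters.txt in the index directory (here /home/ubuntu/reference/hg38/star_index_2.7.3a). The sketch below assumes that file consists of whitespace-separated key/value lines plus comment lines starting with #, including versionGenome and sjdbOverhang entries. Our understanding, not confirmed in this thread, is that versionGenome records the index format version rather than the exact STAR release, which could explain 2.7.1a appearing for an index built with 2.7.3a. read_genome_parameters and SAMPLE are hypothetical, and the sample is abbreviated.

```python
from typing import Dict


def read_genome_parameters(text: str) -> Dict[str, str]:
    """Parse genomeParameters.txt-style text into a {key: value} dict.

    Comment lines (starting with '#') and blank lines are skipped; every
    other line is split at the first run of whitespace into key and value.
    """
    params: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


# Abbreviated, illustrative excerpt (hypothetical); real files have more keys.
SAMPLE = (
    "### STAR   --runMode genomeGenerate   --sjdbOverhang 100\n"
    "versionGenome\t2.7.1a\n"
    "genomeType\tFull\n"
    "sjdbOverhang\t100\n"
)

if __name__ == "__main__":
    p = read_genome_parameters(SAMPLE)
    print(p["versionGenome"], p["sjdbOverhang"])  # 2.7.1a 100
```

With a real index, pass the file contents instead, for example read_genome_parameters(open(path).read()). The sjdbOverhang value found there is the one STAR compares against at mapping time.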

cziegenhain commented 1 year ago

Hi there,

I don't think zUMIs will work for your use case. If you just have plain cDNA reads to map, you could run STAR directly. zUMIs becomes useful when you are handling cell barcodes and, optionally, UMIs.

Best, Christoph
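For plain cDNA reads, a direct STAR run could look roughly like the command built below. This is a minimal sketch assuming the index, FASTQ path, and adapter from the YAML in the original report, single-end gzip-compressed input readable with zcat, and an output directory that already exists (star_only/ is a hypothetical name). star_command is a hypothetical helper; the flags themselves are standard STAR options.

```python
import shlex
from typing import List, Optional


def star_command(genome_dir: str, fastq_gz: str, out_prefix: str,
                 threads: int = 6,
                 clip_adapter: Optional[str] = None) -> List[str]:
    """Build a plain single-end STAR mapping command (no barcodes/UMIs)."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_gz,
        "--readFilesCommand", "zcat",  # input is gzip-compressed
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]
    if clip_adapter:
        # Same adapter clipping as additional_STAR_params in the YAML.
        cmd += ["--clip3pAdapterSeq", clip_adapter]
    return cmd


if __name__ == "__main__":
    cmd = star_command(
        "/home/ubuntu/reference/hg38/star_index_2.7.3a",
        "/home/ubuntu/zumis_test/sequence.fastq.gz",
        "/home/ubuntu/zumis_test/star_only/",
        clip_adapter="CTGTCTCTTATACACATCT",
    )
    # Print the command; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```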

stkgo commented 1 year ago

Do you have any advice on how to determine from a sample whether it contains barcodes or UMIs? I was under the impression that this sample did have some, and when I add BC(1-8) to the base_definition, I get the following output at the Filtering step:

Wed Dec 21 15:47:32 UTC 2022
[1] "17 barcodes detected."
[1] "22001 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 35 daughter barcodes that can be binned into 16 parent barcodes."
[1] "Binned barcodes correspond to 6949 reads."

However, I still have the same sjdbOverhang errors.
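One rough way to probe the question from the data itself is to tabulate the bases at positions 1-8 (the same positions as BC(1-8)) for a sample of reads. If those positions hold a cell barcode, a small set of sequences should account for a large share of reads; if they are ordinary cDNA starts, counts should be spread much more evenly. This is a heuristic under that assumption, not a definitive test; fastq_sequences and top_prefixes are hypothetical helpers.

```python
import gzip
import sys
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line of every 4-line FASTQ record)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.rstrip("\n")


def top_prefixes(seqs: Iterable[str], start: int = 1, end: int = 8,
                 n_reads: int = 100_000,
                 top: int = 10) -> List[Tuple[str, int]]:
    """Most common subsequences at 1-based inclusive positions start..end."""
    counts = Counter(s[start - 1:end] for s in islice(seqs, n_reads))
    return counts.most_common(top)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py sequence.fastq.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        seqs = list(islice(fastq_sequences(fh), 100_000))
    for prefix, count in top_prefixes(seqs):
        print(f"{prefix}\t{count}\t{count / len(seqs):.1%}")
```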

cziegenhain commented 1 year ago

That's difficult to guess, but you could look at the FastQC report and check whether anything stands out. Regarding the sjdbOverhang error: did you create the STAR index without a GTF file, as recommended here?

https://github.com/sdparekh/zUMIs/wiki/Usage#preparing-star-index-for-mapping
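The reasoning behind this recommendation, as we understand it: an index built with a GTF stores the overhang used at that time (100 here), and STAR rejects a different --sjdbOverhang at mapping time, whereas an index built without annotations leaves zUMIs free to insert the GTF junctions on the fly with its own value (the "processing annotations GTF" and "inserting junctions" lines in the later log show zUMIs doing this). The function below is a simplified model of that rule, not STAR's code; the name check_sjdb_overhang and the convention that 0 means "no annotations in the index" are assumptions.

```python
from typing import Optional


def check_sjdb_overhang(index_overhang: int, requested: Optional[int]) -> int:
    """Simplified model (not STAR's code) of the sjdbOverhang consistency check.

    index_overhang: value recorded when the index was built; 0 is taken to
                    mean the index was built without annotations.
    requested:      --sjdbOverhang given at mapping time, or None if unset.
    Returns the overhang that would be used, or raises ValueError.
    """
    if index_overhang > 0:
        # Junctions are already in the index, so the value is fixed.
        if requested is not None and requested != index_overhang:
            raise ValueError(
                f"present --sjdbOverhang={requested} is not equal to the "
                f"value at the genome generation step ={index_overhang}"
            )
        return index_overhang
    # No annotations in the index: the mapping-time value can be chosen freely.
    return requested if requested is not None else 0


if __name__ == "__main__":
    print(check_sjdb_overhang(0, 15))  # 15: index built without a GTF
    try:
        check_sjdb_overhang(100, 15)   # index built with a GTF
    except ValueError as err:
        print(err)
```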

stkgo commented 1 year ago

I re-indexed without the GTF and am now getting this error when mapping:

Filtering...
Wed Dec 21 17:23:27 UTC 2022
[1] "17 barcodes detected."
[1] "22001 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 35 daughter barcodes that can be binned into 16 parent barcodes."
[1] "Binned barcodes correspond to 6949 reads."
Mapping...
[1] "2022-12-21 17:23:29 UTC"
Dec 21 17:23:34 ..... started STAR run
Dec 21 17:23:34 ..... loading genome
Dec 21 17:23:34 ..... started STAR run
Dec 21 17:23:34 ..... loading genome
[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory
Dec 21 17:23:34 ..... started STAR run
Dec 21 17:23:34 ..... loading genome
Dec 21 17:30:07 ..... processing annotations GTF
Dec 21 17:30:07 ..... processing annotations GTF
Dec 21 17:30:29 ..... inserting junctions into the genome indices
Dec 21 17:30:29 ..... inserting junctions into the genome indices
Dec 21 17:32:53 ..... started 1st pass mapping
Dec 21 17:32:53 ..... finished 1st pass mapping
Dec 21 17:32:53 ..... inserting junctions into the genome indices
Dec 21 17:32:54 ..... started 1st pass mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 0
Read Name=@SRR5659912.1.1
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Dec 21 17:32:54 ...... FATAL ERROR, exiting
[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory
Dec 21 17:33:56 ..... started mapping
Dec 21 17:33:56 ..... finished mapping
Dec 21 17:33:57 ..... finished successfully
[main_cat] ERROR: input is not BAM or CRAM
[main_cat] ERROR: input is not BAM or CRAM

It's not clear to me which file failed to open.
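STAR's message says the read it stopped on (@SRR5659912.1.1) had a zero-length sequence. A quick sanity check is to scan the input FASTQ for empty or malformed records; the sketch below does that, assuming standard four-line records. It only inspects the original file, so a clean result would point away from the FASTQ itself rather than identify the cause. find_bad_records is a hypothetical helper.

```python
import gzip
import sys
from typing import Iterable, Iterator, Tuple


def find_bad_records(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (record_number, read_name, problem) for suspicious FASTQ records."""
    it = iter(lines)
    record = 0
    while True:
        header = next(it, None)
        if header is None:
            return  # end of input
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        record += 1
        name = header.rstrip("\n")
        if not name.startswith("@") or not plus.startswith("+"):
            yield record, name, "malformed record"
        elif len(seq) == 0:
            yield record, name, "empty sequence"
        elif len(seq) != len(qual):
            yield record, name, "sequence/quality length mismatch"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py sequence.fastq.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        for rec, name, problem in find_bad_records(fh):
            print(rec, name, problem)
```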

cziegenhain commented 1 year ago

I'm pretty sure it doesn't agree with the SRA-style read IDs in the input BAM file. Give this a shot: https://github.com/sdparekh/zUMIs/wiki/Reprocessing-of-public-data
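To check whether this advice applies to a given file, a look at the read names is enough. The sketch below flags headers that look SRA-style, such as @SRR5659912.1.1 from the log above. The regular expression and the function name sra_accession are assumptions; they only detect the pattern and do not implement the reprocessing steps described on the wiki page.

```python
import re
from typing import Optional

# Assumed pattern: "@" + SRA/ENA/DDBJ run accession (SRR/ERR/DRR + digits)
# + "." + spot number, e.g. "@SRR5659912.1.1" or "@SRR5659912.1 1 length=150".
SRA_NAME = re.compile(r"^@([SED]RR\d+)\.\d+")


def sra_accession(header: str) -> Optional[str]:
    """Return the run accession if a FASTQ header looks SRA-style, else None."""
    m = SRA_NAME.match(header.strip())
    return m.group(1) if m else None


if __name__ == "__main__":
    for h in ["@SRR5659912.1.1",
              "@NB501234:12:HXXXXBGX3:1:11101:1000:1000 1:N:0:1"]:
        print(h, "->", sra_accession(h))
```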