sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Error in uik() - Method is not applicable for such a small vector - inDrops Protocol #220

Closed saisomesh2594 closed 3 years ago

saisomesh2594 commented 3 years ago

Hi Christoph,

I am attempting to run zUMIs v2.9 on an inDrops protocol sample on CentOS Linux release 7.6 with the following YAML configuration file -

project: inDropszumi

sequence_files:
  file1:
    name: /path/to/inDrops_S2_R1_001.fastq.gz
    base_definition:
      - cDNA(1-60)

  file2:
    name: /path/to/inDrops_S2_R2_001.fastq.gz
    base_definition:
      - BC(1-8,31-38)
      - UMI(39-44)
    correct_frameshift: GAGTGATTGCTTGTGACGCCTT

reference:
  STAR_index: /path/to/refdata-hg19/
  GTF_file: /path/to/gencode.v27.annotation.gtf
  exon_extension: no
  extension_length: 0
  scaffold_length_min: 0

out_dir: /path/to/inDropszumi

num_threads: 6
mem_limit: 16

filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20

barcodes:
  barcode_num: null
  barcode_file: null
  barcode_sharing: null
  automatic: yes
  BarcodeBinning: 0
  nReadsperCell: 100
  demultiplex: no

#Options related to counting of reads towards expression profiles
counting_opts:
  introns: yes
  intronProb: no
  downsampling: 0
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  multi_overlap: no
  twoPass: yes

#produce stats files and plots?
make_stats: yes

#Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.
which_Stage: Filtering

#define dependencies program paths
samtools_exec: samtools
Rscript_exec: Rscript #Rscript executable
STAR_exec: STAR
pigz_exec: pigz
zUMIs_directory: /path/to/zUMIs/

#below, fqfilter will add a read_layout flag defining SE or PE
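
For readers unfamiliar with the `base_definition` entries in the configuration above, here is a minimal Python sketch of how a definition such as `BC(1-8,31-38)` could be read. It assumes the ranges are 1-based and inclusive and that multiple ranges are concatenated in order; the helper names `parse_ranges` and `extract` are hypothetical and are not part of zUMIs, and the read is made up.

```python
import re


def parse_ranges(definition):
    """Parse a string like 'BC(1-8,31-38)' into ('BC', [(1, 8), (31, 38)])."""
    match = re.fullmatch(r"(\w+)\(([\d,\-]+)\)", definition)
    if match is None:
        raise ValueError(f"unrecognised base definition: {definition!r}")
    name = match.group(1)
    ranges = [
        tuple(int(pos) for pos in part.split("-"))
        for part in match.group(2).split(",")
    ]
    return name, ranges


def extract(read, ranges):
    """Concatenate the 1-based, inclusive ranges taken from the read (assumption)."""
    return "".join(read[start - 1:end] for start, end in ranges)


# Made-up 44-nt read: 8 nt barcode part 1, the 22 nt correct_frameshift
# sequence from the YAML, 8 nt barcode part 2, 6 nt UMI.
read = "A" * 8 + "GAGTGATTGCTTGTGACGCCTT" + "C" * 8 + "T" * 6

bc_name, bc_ranges = parse_ranges("BC(1-8,31-38)")
umi_name, umi_ranges = parse_ranges("UMI(39-44)")

barcode = extract(read, bc_ranges)
umi = extract(read, umi_ranges)
print(bc_name, barcode)
print(umi_name, umi)
```

With this layout the fixed positions only line up when the first barcode part is exactly 8 nt long, which is presumably why a frameshift correction exists for protocols with a variable-length first part.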

The job output file looks as follows -

------------- 

 Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs 

-------------
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!

 You provided these parameters:
 YAML file:                 adenoma-080.yaml
 zUMIs directory:       /path/to/zUMIs/
 STAR executable        STAR
 samtools executable    samtools
 pigz executable        pigz
 Rscript executable     Rscript
 RAM limit:   16
 zUMIs version 2.9.4c 

Thu 15 Oct 14:59:30 CEST 2020
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Thu 15 Oct 15:02:39 CEST 2020
Mapping...
[1] "2020-10-15 15:02:40 CEST"
Oct 15 15:02:42 ..... started STAR run
Oct 15 15:02:42 ..... loading genome
Oct 15 15:03:09 ..... processing annotations GTF
Oct 15 15:03:20 ..... inserting junctions into the genome indices
Oct 15 15:05:46 ..... started 1st pass mapping
Oct 15 15:05:47 ..... finished 1st pass mapping
Oct 15 15:05:47 ..... inserting junctions into the genome indices
Oct 15 15:07:01 ..... started mapping
Oct 15 15:07:02 ..... finished mapping
Oct 15 15:07:02 ..... finished successfully
Thu 15 Oct 15:07:02 CEST 2020
Counting...
[1] "2020-10-15 15:07:10 CEST"
Thu 15 Oct 15:07:10 CEST 2020
[1] "loomR found"
Thu 15 Oct 15:07:11 CEST 2020
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2020-10-15 15:07:12 CEST"
Thu 15 Oct 15:07:23 CEST 2020

and the job error file as follows -

Error in uik(bccount$cellindex, bccount$cs/1000) : 
  Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_unknown -> .FindBCcut -> uik
Execution halted
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes.txt")) : 
  File '/path/to/inDropszumi/zUMIs_output/inDropszumikept_barcodes.txt' does not exist or is non-readable. getwd()=='/path/to/inDropszumi'
Execution halted
Loading required package: yaml
Loading required package: Matrix
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/path/to/inDropszumi/zUMIs_output/expression/inDropszumi.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project,  : 
  File '/path/to/inDropszumi/zUMIs_output/inDropszumikept_barcodes.txt' does not exist or is non-readable. getwd()=='/path/to/inDropszumi'
Execution halted

I am unable to understand the cause of this. Do you think I will have to set automatic barcode detection to no and provide a whitelist of barcodes? Or is it something else?

Could you help with this?

Do let me know if you need more information.

Thanks, Somesh

cziegenhain commented 3 years ago

Hi,

Sounds like an issue with too few BC observations. I can take a look at the BCstats.txt file from your run directory to see what is going on if you upload it.

saisomesh2594 commented 3 years ago

Hey,

Here you go! That would be of great help. Thanks

inDropszumi.BCstats.txt

Somesh

cziegenhain commented 3 years ago

So there are only two observed barcodes with read counts reaching your cutoff of at least 100 reads (nReadsperCell: 100). The automatic BC detection needs at least 5 barcodes to work.
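
A quick way to check this before running the pipeline is to count how many barcodes reach the read cutoff. The Python sketch below is only an illustration: it assumes BCstats.txt is a whitespace-separated table of barcode and read count, that the cutoff is applied as "at least `nReadsperCell` reads", and it uses made-up example data rather than the uploaded file. The function name `barcodes_above_cutoff` is hypothetical.

```python
import io

# Taken from the uik() error message: "Please give at least a 5 numbers vector"
MIN_BARCODES_FOR_AUTO = 5


def barcodes_above_cutoff(bcstats_text, min_reads):
    """Return (barcode, reads) pairs that have at least min_reads reads."""
    kept = []
    for line in io.StringIO(bcstats_text):
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip a header line or anything malformed
        barcode, reads = fields[0], int(fields[1])
        if reads >= min_reads:
            kept.append((barcode, reads))
    return kept


# Made-up example data, NOT the contents of inDropszumi.BCstats.txt
example = (
    "GAGTGATTACGTACGT\t5200\n"
    "GAGTGATTTTGCAAGC\t340\n"
    "GAGTGATTCCATGGAA\t12\n"
    "GAGTGATTGGTTAACC\t3\n"
)

kept = barcodes_above_cutoff(example, min_reads=100)
too_few = len(kept) < MIN_BARCODES_FOR_AUTO
print(f"{len(kept)} barcodes with >= 100 reads")
if too_few:
    print("Too few barcodes for automatic detection")
```

If the count comes out below 5, the automatic detection would be expected to fail in the way shown in the error file.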

saisomesh2594 commented 3 years ago

Hi Christoph,

That seems weird. We had managed to run zUMIs 2.0.6 on this set of files without this error.

One thing that we did observe is that in the inDropszumi.BCstats.txt file from the newer zUMIs version (uploaded above), the first 8 nucleotides are identical for every BC, and they match the first 8 nucleotides of the correct_frameshift sequence. In the BCstats.txt file obtained with zUMIs 2.0.6, the barcodes are all different, although the input fastq files are the same.
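
This kind of observation can be checked programmatically. The Python sketch below is only an illustration with made-up barcodes: it reports the most common 8-nt prefix among a list of barcodes and flags the case where every barcode shares a prefix that is also the start of the `correct_frameshift` sequence. The function name `prefix_summary` is hypothetical.

```python
from collections import Counter

# correct_frameshift value from the YAML configuration in this thread
FRAMESHIFT_PATTERN = "GAGTGATTGCTTGTGACGCCTT"


def prefix_summary(barcodes, prefix_len=8):
    """Return the most common prefix and the fraction of barcodes that carry it."""
    counts = Counter(bc[:prefix_len] for bc in barcodes)
    prefix, n = counts.most_common(1)[0]
    return prefix, n / len(barcodes)


# Made-up barcodes, NOT taken from the uploaded BCstats.txt
barcodes = ["GAGTGATTACGTACGT", "GAGTGATTTTGCAAGC", "GAGTGATTCCATGGAA"]

prefix, fraction = prefix_summary(barcodes)
suspicious = fraction == 1.0 and FRAMESHIFT_PATTERN.startswith(prefix)
print(f"most common prefix: {prefix} ({fraction:.0%} of barcodes)")
print(f"prefix matches start of correct_frameshift: {suspicious}")
```

A shared prefix equal to the start of the frameshift sequence suggests that part of the extracted "barcode" is actually the constant sequence rather than cell-specific bases.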

So, we think that the error comes either from the YAML file or the barcode detection.

What are your thoughts on this?

Thanks, Somesh

cziegenhain commented 3 years ago

Yes, it sounds to me like some barcode settings are wrong!

saisomesh2594 commented 3 years ago

Hey,

So, we were able to figure this out. The issue was with specifying the correct_frameshift parameter in the YAML configuration file. I suppose this was absolutely required in the older version, zUMIs 2.0.6, but the current version is able to handle the barcodes automatically, even for the inDrops v3 protocol.

Thanks for all your help again!

Somesh

cziegenhain commented 3 years ago

Great!

shangyf-stu commented 3 years ago

Hi, sorry, I have run into the same problem. Does this mean that when I use zUMIs 2.9.4 to process an inDrop dataset, I should not set the correct_frameshift parameter? But the first part of the barcode is variable, and when I don't set this parameter, I find 40,000 barcodes, which is crazy! Please give me some help, thanks a lot!! Shang