sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0
269 stars, 67 forks

BUG: stuck at 'Descriptive statistics...' #307

Closed: pchiang5 closed this issue 2 years ago

pchiang5 commented 2 years ago

Describe the bug

Hello,

Everything ran fine until the last step, 'Descriptive statistics...'. The pipeline then stayed at that step overnight without any progress. Although I could still obtain the counts (loom and rds), the hang made it impossible to run analyses of multiple YAML configs one after another. Thank you.

To Reproduce

```yaml
project: T35
sequence_files:
  file1:
    name: /mnt/c/Users/pc/Downloads/R1_001.fastq.gz
    base_definition: cDNA(1-50)
  file2:
    name: /mnt/c/Users/pc/Downloads/R2_001.fastq.gz
    base_definition:
    - BC(2-5,50-53,93-96)
    - UMI(6-9,46-49,89-92)
reference:
  STAR_index: /mnt/c/Users/pc/Downloads/indexHM/
  GTF_file: /home/pc/GoogleDrive/databases/fasta_annot/GRChm38.99.chr.gtf
  additional_STAR_params: --outTmpDir /home/pc/temp
  additional_files: ~
out_dir: /mnt/c/Users/pc/Downloads/STAR_T35
num_threads: 20
mem_limit: 0
filter_cutoffs:
  BC_filter:
    num_bases: 12
    phred: 0
  UMI_filter:
    num_bases: 12
    phred: 0
barcodes:
  barcode_num: ~
  barcode_file: /mnt/c/Users/pc/Downloads/whitelist_T35.csv
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 0
counting_opts:
  introns: yes
  downsampling: '2500, 5000, 10000, 20000, 50000 '
  strand: 1
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  twoPass: no
make_stats: yes
which_Stage: Filtering
Rscript_exec: Rscript
STAR_exec: STAR
pigz_exec: pigz
samtools_exec: samtools
```

Screenshots

```
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2022-03-17 23:23:19 CST"
[1] "1e+09 Reads per chunk"
```

Desktop: Windows 11 (WSL2)


pchiang5 commented 2 years ago

It worked once I changed the threshold from 1e+09 to 1e+03 in the following condition in statsFUN.R, so that bad barcodes (BCs) are not summarized:

```r
if(sum(bccount$n)>1e+09){ #for huge datasets, don't summarise "bad BCs"
  bccount <- bccount[XC %in% bc$XC]
}
```
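To make the effect of that threshold concrete, here is a minimal base-R sketch of the same condition on toy data. It is an illustration only, not the zUMIs code: the function name `drop_bad_bcs`, its `threshold` argument and the barcodes are hypothetical, it assumes `bc$XC` holds the accepted cell barcodes, and it uses plain data frames instead of the data.table syntax (`bccount[XC %in% bc$XC]`) used in statsFUN.R.

```r
# Hypothetical stand-alone version of the statsFUN.R condition (not zUMIs code).
# If the total read count exceeds `threshold`, keep only barcodes that appear
# in the accepted set `bc$XC`, i.e. "bad BCs" are dropped before summarising.
drop_bad_bcs <- function(bccount, bc, threshold = 1e+09) {
  if (sum(bccount$n) > threshold) {
    bccount <- bccount[bccount$XC %in% bc$XC, , drop = FALSE]
  }
  bccount
}

# Toy read counts per barcode (1,100 reads in total); assume only two are accepted cells
bccount <- data.frame(XC = c("AAAA", "CCCC", "GGGG"), n = c(600, 300, 200))
bc <- data.frame(XC = c("AAAA", "CCCC"))

# Default threshold (1e+09): the total is far below it, so all 3 barcodes are kept
print(nrow(drop_bad_bcs(bccount, bc)))                     # 3
# Lowered threshold (1e+03): 1,100 > 1,000, so the unaccepted barcode "GGGG" is dropped
print(nrow(drop_bad_bcs(bccount, bc, threshold = 1e+03)))  # 2
```

With the threshold at 1e+03, practically any real dataset exceeds it, so the filter always applies and only the accepted barcodes go into the statistics step.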