sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Error in the counting stage #370

Open russellxie opened 10 months ago

russellxie commented 10 months ago

Hello,

I got the following error in the Counting stage.

[bam_sort_core] merging from 60 files and 20 in-memory blocks...
Error in alldt[[i]][[2]] : subscript out of bounds

When I trace this back to the original code, the error seems to occur when the pipeline combines the barcode chunks (loop shown below).

########################## assign reads to UB & GENE

for(i in unique(bccount$chunkID)){
     print( paste( "Working on barcode chunk", i, "out of",length(unique(bccount$chunkID)) ))
     print( paste( "Processing",length(bccount[chunkID==i]$XC), "barcodes in this chunk..." ))
     reads <- reads2genes_new(featfile = sortbamfile,
                              bccount  = bccount,
                              inex     = opt$counting_opts$introns,
                              chunk    = i,
                              cores    = opt$num_threads)

     tmp<-collectCounts(  reads =reads,
                          bccount=bccount[chunkID==i],
                          subsample.splits=subS[which(max(bccount[chunkID==i]$n) >= subS[,1]), , drop = FALSE],
                          mapList=mapList
                        )

     if(i==1){
       allC<-tmp
    }else{
       allC<-bindList(alldt=allC,newdt=tmp)
    }
}
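For readers unfamiliar with this error class, here is a minimal base R sketch of how `[[i]][[2]]` can fail with "subscript out of bounds". It is not the zUMIs `bindList` implementation: the nested-list layout, the toy objects `full_chunk` and `short_chunk`, and the helper `safe_get` are all hypothetical and only illustrate what happens when one chunk's result has fewer elements than the code that merges it expects.

```r
# Toy stand-ins for per-chunk results (hypothetical structure, not zUMIs output).
full_chunk  <- list(exon = list(all = 1:3, downsampled = 4:6))
short_chunk <- list(exon = list(all = 7:9))   # second element is missing

# Indexing a position that does not exist with [[ ]] raises an error.
msg <- tryCatch(short_chunk[["exon"]][[2]],
                error = function(e) conditionMessage(e))
print(msg)

# Hypothetical defensive accessor: return NULL instead of failing.
safe_get <- function(x, i) if (length(x) >= i) x[[i]] else NULL

# c() drops NULL, so only the chunk that has the element contributes.
combined <- c(safe_get(full_chunk$exon, 2), safe_get(short_chunk$exon, 2))
print(combined)
```

If the real failure follows this pattern, the thing to look for is a chunk whose `collectCounts` result is shorter than the accumulated `allC`.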

I am using a known barcode list, so a few barcodes received very few reads. Could that be the problem?
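One way to test this hypothesis is to check, per barcode chunk, how many subsampling rows survive the filter `max(bccount[chunkID==i]$n) >= subS[,1]` from the loop above. The sketch below uses base R rather than the data.table syntax of the pipeline. The read counts are made up, and the layout of `subS` (one row per downsampling depth, with the minimum read requirement in the first column) is an assumption, not confirmed zUMIs behaviour.

```r
# Toy per-barcode read counts; column names follow the loop above (XC, n, chunkID).
bccount <- data.frame(
  XC      = c("AAAA", "CCCC", "GGGG", "TTTT"),
  n       = c(52000, 18000, 40, 12),
  chunkID = c(1, 1, 2, 2)
)

# Hypothetical subsampling table: first column is the minimum reads required.
subS <- matrix(c(10000, 10000), nrow = 1,
               dimnames = list("downsampled_10000", c("minR", "maxR")))

# For each chunk, count the subsampling rows that pass the same filter
# as in the loop: max(n in chunk) >= subS[, 1].
rows_kept <- sapply(split(bccount$n, bccount$chunkID),
                    function(n) sum(max(n) >= subS[, 1]))
print(rows_kept)
```

In this toy data, chunk 2 contains only low-read barcodes and keeps zero subsampling rows. If a real chunk behaves the same way, its result could lack the downsampling element that the merge step expects, which would be consistent with the error reported here.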

russellxie commented 10 months ago

I am also pasting my YAML file here:


sequence_files:
  file1:
    name: /gne/data/lab-shares/xie-lab/Sequencing_Data/2023/fastq/20230830_smartseq_test2/20230829_run/Undetermined_S0_R1_001.fastq.gz
    base_definition:
      - cDNA(23-75)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG
  file2:
    name: /gne/data/lab-shares/xie-lab/Sequencing_Data/2023/fastq/20230830_smartseq_test2/20230829_run/Undetermined_S0_I1_001.fastq.gz
    base_definition:
      - BC(1-8)
  file3:
    name: /gne/data/lab-shares/xie-lab/Sequencing_Data/2023/fastq/20230830_smartseq_test2/20230829_run/Undetermined_S0_I2_001.fastq.gz
    base_definition:
      - BC(1-8)
reference:
  STAR_index: /gne/data/lab-shares/xie-lab/RefGenome/star_2.7.9/
  GTF_file: /gne/data/lab-shares/xie-lab/RefGenome/raw_files/hg38/Homo_sapiens.GRCh38.93.gtf
  additional_STAR_params: '--clip3pAdapterSeq CTGTCTCTTATACACATCT --clip3pAdapterMMp 0.1'

out_dir: /gne/data/lab-shares/xie-lab/Sequencing_Data/2023/mapping/20230830_smartseq3_run2/mapping_test
num_threads: 20
mem_limit: 50
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 2
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: /gne/data/lab-shares/xie-lab/Sequencing_Data/2023/mapping/20230830_smartseq3_run2/expected_barcodes.txt
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: no
counting_opts:
  introns: yes
  downsampling: '10000'
  strand: 0
  Ham_Dist: 1
  write_ham: no
  velocyto: no
  primaryHit: yes
  twoPass: yes
make_stats: yes
which_Stage: Counting
samtools_exec: samtools
pigz_exec: pigz
STAR_exec: STAR
Rscript_exec: Rscript
read_layout: SE