sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Question: How does zUMIs detect barcodes, and why is the sum of the second column in BCstats.txt not equal to the number of input reads? #389

Open bioinfotec opened 5 months ago

bioinfotec commented 5 months ago

When I used awk to sum the second column of .BCstats.txt, the result was 8347237. However, Read1.fq contains 11844959 reads, which does not match the barcode sum. So I would like to know how zUMIs detects barcodes. By the way, my data is similar to Smart-seq3 and has the pattern "ATTGCGCAATG" at the start of the reads in Read1.fq. This is a snippet of my YAML file:

sequence_files:
  file1:
    name: /home/data/231110_1/20231110SCS-1_L2_1.fq.gz #path to first file
    base_definition:
      - BC(12-17,33-40,56-63)
      - UMI(64-69)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/data/231110_1/20231110SCS-1_L2_2.fq.gz  #path to second file
    base_definition:
      - cDNA(1-150)
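
For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the two counts described above. It is not part of zUMIs and makes some assumptions: the BCstats file has two whitespace-separated columns (barcode, read count) with no header, the FASTQ uses four lines per record, and the helper names (`open_text`, `sum_bcstats`, `count_reads`) are hypothetical. It also counts how many reads start with the `find_pattern` sequence, which may help narrow down where the difference comes from.

```python
import gzip
from pathlib import Path

# find_pattern value taken from the YAML snippet above
PATTERN = "ATTGCGCAATG"


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def sum_bcstats(path):
    """Sum the second column of a BCstats-style file.

    Assumption: each line is "<barcode> <count>" with no header line.
    """
    total = 0
    with open_text(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                total += int(fields[1])
    return total


def count_reads(path, pattern=PATTERN):
    """Return (total reads, reads whose sequence starts with pattern).

    Assumption: a standard FASTQ with exactly four lines per record.
    """
    total = 0
    with_pattern = 0
    with open_text(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                total += 1
                if line.startswith(pattern):
                    with_pattern += 1
    return total, with_pattern
```

Calling `sum_bcstats` on the .BCstats.txt file and `count_reads` on the Read1 FASTQ gives the barcode sum, the total read count, and the number of reads that begin with the pattern, so the three numbers can be compared side by side.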