sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Barcodes present in BCstats.txt, but not in final result. #305

Closed diyang1354 closed 2 years ago

diyang1354 commented 2 years ago

Hi, when I use a barcode whitelist plus automatic intersection to keep the wanted barcodes, one barcode with over 20,000 reads somehow gets filtered out. My library structure is 8 bp barcode + 8 bp UMI, and the missing barcode is 'CAGCGTTA'. Reads containing this barcode are present in the fastq file: (image)

Here are my barcode and yaml files: 2349-plate1.BCstats.txt, 2349-plate1kept_barcodes.txt, barcode_tang.txt, 2349-plate1.yaml.txt

I would be grateful if you could help me out.

Best regards

cziegenhain commented 2 years ago

Hi,

Sorry for the slow reply - I missed this post. Depending on the total number of barcodes, their read counts, and the amount of 'noisy' reads in the mix, the automatic cutoff can miss such a barcode.

I recommend setting a manual threshold for the minimum number of reads a BC must have to be retained, together with the allow list, and turning automatic detection off. For example, to keep all BCs on the list that have at least 10,000 reads:

barcodes:
  barcode_num: null
  barcode_file: Barcodes.txt
  barcode_sharing: null
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 10000
  demultiplex: no

Another side note: depending on the sequencing quality of your reads, your quality cutoffs for BC and UMI filtering may be relatively strict, and you could retain more reads by making them more inclusive.
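To get a feel for how much a quality cutoff costs, here is a minimal sketch of one common form of such a filter. It assumes, and this is not confirmed in this thread, that a read is dropped once at least `num_bases` bases of its barcode fall below a Phred score of `phred`, with standard Phred+33 encoding. The function and parameter names are illustrative only.

```python
def passes_quality(qual, num_bases=1, phred=20, offset=33):
    """Return True if fewer than num_bases bases have quality below phred.

    qual is the FASTQ quality string for the barcode (or UMI) bases.
    The rule itself is an assumption about how such a cutoff behaves,
    not a statement of what zUMIs does internally.
    """
    low = sum(1 for ch in qual if ord(ch) - offset < phred)
    return low < num_bases

clean = "IIIIIIII"    # eight bases at Q40
one_bad = "IIII#III"  # one base at Q2

strict_clean = passes_quality(clean)
strict_bad = passes_quality(one_bad)                # fails: 1 low base
relaxed_bad = passes_quality(one_bad, num_bases=2)  # tolerated when relaxed
print(strict_clean, strict_bad, relaxed_bad)
```

Running a check like this over the barcode portion of the reads for the missing barcode would show whether a looser cutoff is likely to recover a meaningful number of them.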

Best, Christoph