sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Unable to recognize a combination of multiple barcode sequences #360

Closed: shangyf-stu closed this issue 1 year ago

shangyf-stu commented 1 year ago

Hello, I am processing data from libraries constructed with BD Rhapsody and inDrop. Both protocols use cell barcodes made up of several separate barcode segments. No barcodes are found in the *.BCstats.txt file: the barcode column is empty, yet there are counts. I don't know what caused this error. Additionally, in the inDrop data, the W1 sequence (GAGTGATTGCTTTGTGACGCCTT) in the FASTQ file may contain 1-2 mismatches. How can I allow for this in zUMIs?
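One way to gauge how many inDrop reads carry a W1 sequence with 1-2 mismatches is to scan read 1 for the best ungapped match of W1 at every offset and tally the mismatch counts. The sketch below is a standalone diagnostic, not part of zUMIs. It assumes W1 can start at any position in read 1 (its position shifts, which is why a frameshift pattern is needed at all), and it ignores insertions and deletions. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator

# inDrop W1 sequence as quoted in the issue.
W1 = "GAGTGATTGCTTTGTGACGCCTT"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_w1_hit(seq: str, pattern: str = W1) -> tuple[int, int]:
    """Return (start, mismatches) of the best ungapped match of pattern in seq.

    Every offset is tried because the W1 position can shift between reads.
    Returns (-1, len(pattern)) if the read is shorter than the pattern or
    if every window mismatches at all positions.
    """
    best = (-1, len(pattern))
    for start in range(len(seq) - len(pattern) + 1):
        d = hamming(seq[start:start + len(pattern)], pattern)
        if d < best[1]:
            best = (start, d)
            if d == 0:  # an exact hit cannot be beaten
                break
    return best


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of a gzipped FASTQ file."""
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


def tally_mismatches(seqs: Iterable[str], max_mm: int = 2) -> dict[int, int]:
    """Count reads by mismatches in their best W1 hit.

    Keys 0..max_mm hold reads whose best hit has that many mismatches;
    key -1 holds reads with no hit within max_mm mismatches.
    """
    counts = {mm: 0 for mm in range(-1, max_mm + 1)}
    for seq in seqs:
        _, mm = best_w1_hit(seq)
        counts[mm if mm <= max_mm else -1] += 1
    return counts
```

For example, `tally_mismatches(read_sequences("/dataset467/fastq1/dataset467_1.fastq.gz"))` would report how many reads match W1 exactly versus with one or two mismatches. This is pure Python and slow on a full run, so testing it on a subsample of reads is more practical.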

To Reproduce
YAML file:
project: dataset467
sequence_files:
  file1:
    name: /dataset467/fastq1/dataset467_1.fastq.gz
    base_definition:

Screenshots
Error message:
Error in uik(bccount$cellindex, bccount$cs/1000) :
  Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_unknown -> .FindBCcut -> uik
Execution halted
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) :
  File '/dataset467/zUMIs/zUMIs_output/dataset467kept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/dataset467/zUMIs'
Execution halted
Loading required package: yaml
Loading required package: Matrix
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/dataset467/zUMIs/zUMIs_output/expression/dataset467.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, :
  File '/dataset467/zUMIs/zUMIs_output/dataset467kept_barcodes.txt' does not exist or is non-readable. getwd()=='/dataset467/zUMIs'
Execution halted

I hope you can help me. Thank you!
Best wishes,
Shang

shangyf-stu commented 1 year ago

The YAML file above is for the inDrop data; the YAML file below is for BD Rhapsody:
project: dataset466
sequence_files:
  file1:
    name: /dataset466/filter_fastp/dataset466_clean_1.fastq.gz
    base_definition:

cziegenhain commented 1 year ago

Hi,

To your questions:

  1. zUMIs does not support mismatches in the frameshift correction pattern.
  2. You do not get useful output because your YAML files are incorrect. You must specify the barcode base ranges with the label BC, but you have written barcode or BD. See https://github.com/sdparekh/zUMIs/wiki/Protocol-specific-setup
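A quick way to catch this kind of label mistake before launching a run is to check every base_definition entry against the labels zUMIs expects. The sketch below is a hypothetical helper, not part of zUMIs. It assumes entries follow the LABEL(start-end[,start-end...]) form described on the protocol setup page, with cDNA, BC and UMI as the only valid labels. The positions used in the examples are placeholders, not the actual inDrop or BD Rhapsody read layouts.

```python
import re

# Hypothetical helper, not part of zUMIs. Assumed valid labels:
ALLOWED_LABELS = {"cDNA", "BC", "UMI"}
# LABEL(start-end) or LABEL(start-end,start-end,...), no spaces.
ENTRY = re.compile(r"^(\w+)\((\d+-\d+(?:,\d+-\d+)*)\)$")


def parse_entry(entry: str) -> tuple[str, list[tuple[int, int]]]:
    """Split e.g. 'BC(1-9,22-30)' into ('BC', [(1, 9), (22, 30)]).

    Raises ValueError for malformed entries, unknown labels such as
    'barcode' or 'BD', and ranges where start > end or start < 1.
    """
    m = ENTRY.match(entry.strip())
    if m is None:
        raise ValueError(f"malformed base_definition entry: {entry!r}")
    label, spec = m.groups()
    if label not in ALLOWED_LABELS:
        raise ValueError(
            f"unknown label {label!r} in {entry!r}; "
            f"expected one of {sorted(ALLOWED_LABELS)}"
        )
    ranges = []
    for part in spec.split(","):
        start, end = (int(x) for x in part.split("-"))
        if not 1 <= start <= end:
            raise ValueError(f"invalid range {part!r} in {entry!r}")
        ranges.append((start, end))
    return label, ranges


def check_definitions(entries: list[str]) -> list[str]:
    """Return one error message per invalid entry (empty list means all OK)."""
    errors = []
    for entry in entries:
        try:
            parse_entry(entry)
        except ValueError as exc:
            errors.append(str(exc))
    return errors
```

For example, `check_definitions(["BD(1-9)", "UMI(53-60)"])` reports one error for the BD entry, while the same ranges written as `BC(1-9)` pass.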

Please check all documentation carefully before opening issues. Thank you.

shangyf-stu commented 1 year ago

Thank you for your answer. All samples now run successfully. I'm sorry for the trouble caused by my carelessness!