sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

ERROR! Fastq files are not in the same order. Make sure to provide reads in the same order #310

Closed parichitran closed 2 years ago

parichitran commented 2 years ago

Hi Christoph, I ran 10 cells of my own data as well. It is single-cell RNA-seq data (CEL-seq with UMIs, 8 bp BC + 5 bp UMI) that has already been demultiplexed, so I used the following command, as recommended for this kind of reads in zUMIs:

Rscript misc/merge_demultiplexed_fastq.R --dir /media/ValidationofDedup/merge --pigz /media/software/zUMIs-main/zUMIs-env/bin/pigz --threads 10

It generated all 5 files. Now, if I include the index read file in my YAML file, I get the following error:

"ERROR! Fastq files are not in the same order. Make sure to provide reads in the same order." I even tried sorting my merged reads with fastq_pair, but I got the same error.

If I do not include the index file in my YAML, zUMIs runs. Is that run correct or not? Can you help me? I am attaching my log file and YAML file: mydata.yaml.txt logofdata.txt

I also compared the results with UMI-tools to confirm that my UMI deduplication worked, but I could not reach a conclusion because of variation in the gene percentage. You can check the details of the difference here: Details of comparison

cziegenhain commented 2 years ago

Hi,

There must be some corrupted file then. You can confirm manually, for example, that all files have the same number of rows and that the read IDs match.
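A minimal sketch of such a manual check in Python, for anyone who wants to script it. The function names are hypothetical, and the ID normalisation (dropping the description after the first whitespace and a trailing `/1` or `/2`) is an assumption about how mate IDs should be compared, not a description of what zUMIs does internally.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each 4-line FASTQ record (plain or gzipped file)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0:
                continue  # only the first line of each record is a header
            fields = line[1:].split()  # drop '@' and any description text
            read_id = fields[0] if fields else ""
            # Assumption: a trailing /1 or /2 only marks the mate, so ignore it.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def first_mismatch(path_a, path_b):
    """Return (record_index, id_a, id_b) at the first disagreement, else None.

    If one file has fewer records, the missing ID is reported as None.
    """
    pairs = zip_longest(read_ids(path_a), read_ids(path_b))
    for index, (id_a, id_b) in enumerate(pairs):
        if id_a != id_b:
            return index, id_a, id_b
    return None
```

Calling `first_mismatch` on each pair of files listed in the YAML (read 1 vs. read 2, read 1 vs. index read) reports the first record where the order or the record count diverges.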

As for your last comment on comparisons, I am not reviewing this because it is outside the scope of what we support here. You can of course post here if there are any specific issues with zUMIs.

parichitran commented 2 years ago

Thanks for your timely reply. Details regarding the mate pairs:

cell1: Read1: 14248000, Read2: 14248000 (both have the same number of rows and the same IDs)
cell2: Read1: 9683396, Read2: 9683396 (both have the same number of rows and the same IDs)

The same holds for all 10 cell samples. I even tried sorting with fastq_pair, but it did not help. I do not think the files are corrupted, because zUMIs ran fine without the index FASTQ: I get all the results, such as the exon count table, with data for all 10 cells. My question is: is it fine to run zUMIs without the index FASTQ?

Thanks in advance

cziegenhain commented 2 years ago

I'm confused by your description of what kind of data you have. If there is a unique cell barcode within any of your read files anyway, you do not need merge_demultiplexed_fastq.R. You can simply concatenate your FASTQ files in that case.

parichitran commented 2 years ago

Description of my reads: The people who published the data did so in such a way that each FASTQ pair contains the reads for one particular cell. They had already demultiplexed the index reads from the sequencing data. For instance, for the cell 1 FASTQ pair:

Read 1: 8 bp cell-specific barcode + 5 bp UMI

@SRR4246.1 H8HCPADXX:1:1101:10000:35164/1
AGGCGCCTAAGAANNNNNNNNNNNN
[8 bp BC][5 bp UMI]

Read 2: 47 bp cDNA reads

Each FASTQ pair is specific to one particular cell.

My exon UMI count data also has the same 8 bp unique cell barcode (AGGCGCCT) as the name of column 1.
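To make the read 1 layout concrete, here is a small Python sketch that splits the example sequence above into its barcode and UMI. The function name and its default lengths are only illustrative; they follow the 8 bp BC + 5 bp UMI layout described in this comment.

```python
def split_read1(seq, bc_len=8, umi_len=5):
    """Split a read 1 sequence into (cell barcode, UMI).

    Assumes the barcode comes first and the UMI follows immediately,
    as described above; anything after the UMI is ignored.
    """
    if len(seq) < bc_len + umi_len:
        raise ValueError("read is shorter than barcode + UMI")
    barcode = seq[:bc_len]
    umi = seq[bc_len:bc_len + umi_len]
    return barcode, umi


# Example read 1 sequence taken from the comment above.
barcode, umi = split_read1("AGGCGCCTAAGAANNNNNNNNNNNN")
print(barcode, umi)  # AGGCGCCT AAGAA
```

The barcode it returns, AGGCGCCT, is the same string that shows up as the column name in the count table.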

cziegenhain commented 2 years ago

Alright, in this case you can just concatenate the read files over all cells you want to process; you won't need to use the merging script.
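For readers who would rather script the concatenation than use `cat`, a minimal Python sketch follows. The function name is hypothetical. It copies bytes unchanged, which works for plain FASTQ files that end in a newline and also for gzipped files, because concatenated gzip streams form a valid gzip file.

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append the raw bytes of every input file, in the given order, to output_path."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Run it once for the read 1 files and once for the read 2 files, listing the cells in the same order both times, so that the records in the two merged files stay paired.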

Best, C

parichitran commented 2 years ago

Thanks a lot, Christoph.

With warm regards, Parichitran A

GGboy-Zzz commented 9 months ago

Sorry to disturb you again. Thank you for this issue. I get the same error as @parichitran, but I do not have the index_fastq files. Can I just cat the read1 or read2 fastq files and leave the index_fastq files out of the YAML file? @cziegenhain