YAML config for merged demultiplex files

roxyisat-rex commented 3 years ago

Hello
I am starting from demultiplexed paired-end FASTQ files. After merging them, I have three fastq.gz files in total: one R1, one R2, and an index.fastq.gz. I understand that index.fastq.gz holds the generated barcode reads to be used in zUMIs. For the YAML config, should I enter 3 in the section for the number of input FASTQ files, since it says to include index reads? And would the BC, UMI and cDNA definitions be the same for index.fastq as for R1 and R2? I am asking because the annotated preset YAML didn't include the index, so I wanted to make sure. Sorry if this is a bit naive, I have never done any preprocessing before. Thank you in advance!

cziegenhain commented 3 years ago

Hi,

Yes, you need to include the generated index.fastq.gz file. It encodes the sample identity from your demultiplexing. The base definition for the index.fastq.gz file should be BC(1-8).
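
For readers following along, a minimal sketch of how the three files might be listed in the `sequence_files` section of the zUMIs YAML. Only the `BC(1-8)` definition for the index file comes from the answer above. The file paths and the base definitions for R1 and R2 are hypothetical placeholders and depend on the library prep, so they should be checked against the annotated preset for your protocol.

```yaml
# Sketch only: paths and the R1/R2 base definitions are assumptions,
# not values taken from this thread.
sequence_files:
  file1:
    name: /path/to/merged_R1.fastq.gz    # hypothetical path
    base_definition:
      - cDNA(1-50)                       # placeholder, set per protocol
  file2:
    name: /path/to/merged_R2.fastq.gz    # hypothetical path
    base_definition:
      - cDNA(1-50)                       # placeholder, set per protocol
  file3:
    name: /path/to/index.fastq.gz        # hypothetical path
    base_definition:
      - BC(1-8)                          # as given in the answer above
```

Each file carries its own `base_definition`, so the index file does not reuse the definitions of R1 and R2.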

Best, Christoph

roxyisat-rex commented 3 years ago

Hi Christoph

Thanks for the prompt response. A follow-up question: from my understanding, we need to run STAR ourselves first to generate the index, right? And then provide that index in the YAML config? I just want to confirm. Thank you!

cziegenhain commented 3 years ago

Yes, that's exactly right. Here is an example command: https://github.com/sdparekh/zUMIs/wiki/Usage#preparing-star-index-for-mapping
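
Once the index exists, it is referenced from the YAML. A minimal sketch, assuming the `reference` section uses the key names `STAR_index` and `GTF_file` as in the zUMIs example YAML. Both paths are hypothetical, and the key names should be verified against the template shipped with your zUMIs version.

```yaml
# Sketch only: key names assumed from the zUMIs example YAML,
# paths are hypothetical.
reference:
  STAR_index: /path/to/STAR_index_dir    # directory produced by STAR genomeGenerate
  GTF_file: /path/to/annotation.gtf      # gene annotation matching the genome
```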

roxyisat-rex commented 3 years ago

wow you are fast, haha. Thanks very much!

cziegenhain commented 3 years ago

Closing here assuming your issue was solved.

sdparekh / zUMIs, issue #277