Closed hsriniva11 closed 3 years ago
I typically trim Tn5 adapters during STAR mapping by passing '--clip3pAdapterSeq CTGTCTCTTATACACATCT' in the STAR parameters section of the yaml file.
That is in STAR. I want to trim the 5' end so it can find the 8 bp UMI sequence properly even when ATTGCGCAATG is not at the beginning of the sequence.
Sorry, this sounds a bit cryptic to me. You will need to describe properly what you mean and what you are planning to do. In the case of Smart-seq3, the UMI will always be preceded by the occurrence of the pattern recognition sequence of the TSO oligo, and zUMIs will take care of it appropriately. You can also set the number of mismatches you are willing to tolerate (see changelog of v2.9.5 https://github.com/sdparekh/zUMIs#changelog). Best, Christoph
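To make the idea of mismatch-tolerant pattern recognition concrete, here is a minimal Python sketch. It is not zUMIs code: the names `hamming` and `extract_umi` are hypothetical, and it assumes the 11-base pattern ATTGCGCAATG quoted in this thread sits at the very start of the read and is followed directly by the 8 bp UMI.

```python
PATTERN = "ATTGCGCAATG"  # pattern sequence quoted in this thread
UMI_LEN = 8              # UMI length mentioned in this thread


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def extract_umi(read, max_mismatches=1):
    """Return the UMI if the read starts with PATTERN (within
    max_mismatches substitutions), otherwise None.

    Hypothetical helper for illustration only; assumes the pattern
    is at position 0 and the UMI follows it immediately.
    """
    n = len(PATTERN)
    if len(read) < n + UMI_LEN:
        return None  # read too short to hold pattern + UMI
    if hamming(read[:n], PATTERN) > max_mismatches:
        return None  # pattern not recognised at the read start
    return read[n:n + UMI_LEN]
```

A read that begins with some other sequence (for example the mosaic end AGATGTGTATAAGAGACAG) fails the pattern check in this sketch and yields `None`, so it would not be treated as a UMI-containing read.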
The issue isn't the trimming here. I'm not able to input fastq files of uneven length to run Smart-Seq3 with zUMIs. Is there any way I can do that?
I'd be able to help better if you were a bit more clear here. If only the cDNA portion is of variable length, that is supported in zUMIs and you just set the cDNA range to the full read length (e.g. cDNA(23-150)), but you need to make sure that all reads have at least 24 bases after your trimming; the length of the cDNA portion cannot become 0.
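One way to check the minimum-length condition before running zUMIs is to scan the trimmed FASTQ and count reads that fall below the threshold. The following is a rough sketch, not part of zUMIs: `count_short_reads` is a hypothetical helper, the default of 24 bases is taken from the comment above, and it assumes a plain four-line-per-record FASTQ.

```python
from itertools import islice


def count_short_reads(lines, min_len=24):
    """Return (total_reads, reads_shorter_than_min_len).

    `lines` is any iterable of FASTQ lines, e.g. an open file handle.
    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    lines = iter(lines)  # lets islice consume the input progressively
    total = short = 0
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        seq = record[1].strip()
        total += 1
        if len(seq) < min_len:
            short += 1
    return total, short
```

For a gzipped file, one could pass `gzip.open(path, "rt")` as `lines`. If the second number is not zero, those reads would need to be removed, together with their mates in the other read files, before the trimmed FASTQ is used.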
Here's a screenshot of the library structure
These are the results from an adapter-trimming program for my FASTQ files:
---------------------
First read: Adapter 1
---------------------
Sequence             Type        Length  Trimmed (x)
-------------------  ----------  ------  -----------
AGATGTGTATAAGAGACAG  regular 5'  19      91,616
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1
As you can see, for ~90k reads, the fastq starts with AGATGTGTATAAGAGACAG rather than ATTGCGCAATG or cDNA sequence. To handle this, I use a trimmer. Does this help?
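To put the ~90k figure in context, it helps to know what fraction of all reads start with that sequence. Below is a small Python sketch under stated assumptions: `starts_with_adapter` and `adapter_fraction` are hypothetical helpers, and mismatches are counted as substitutions only, which is a simplification of the trimmer's error model shown above (at most 1 error for a 19-base match).

```python
ADAPTER = "AGATGTGTATAAGAGACAG"  # sequence reported by the trimmer above


def starts_with_adapter(seq, adapter=ADAPTER, max_mismatches=1):
    """True if seq begins with adapter, allowing up to max_mismatches
    substitutions (no indels; a simplification for illustration)."""
    if len(seq) < len(adapter):
        return False
    mismatches = sum(a != b for a, b in zip(seq, adapter))
    return mismatches <= max_mismatches


def adapter_fraction(seqs, adapter=ADAPTER, max_mismatches=1):
    """Fraction of sequences that start with the adapter."""
    total = hits = 0
    for seq in seqs:
        total += 1
        if starts_with_adapter(seq, adapter, max_mismatches):
            hits += 1
    return hits / total if total else 0.0
```

Feeding the read 1 sequences through `adapter_fraction` gives a single number that shows whether these reads are a small minority or a substantial share of the library.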
As you can imagine, I'm slightly familiar with the library structure for Smart-seq3 ;) Is this 90k reads out of many millions of reads?
Since this part of the sequence should be part of the Illumina sequencing primer for read 1, it's not expected that reads start with this sequence. It would point to an issue in the library or sequencing (e.g. concatemers). Thus I wouldn't recommend including such reads in the UMI counting, as they might be artefacts.
Thanks for your timely help!
I'm currently running Smart-Seq3 samples and was wondering if there is any way to input trimmed fastq files (after trimming the mosaic end sequence).