sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Customize find_pattern #256

Closed vincenthahaut closed 3 years ago

vincenthahaut commented 3 years ago

Hi!

We are currently testing different UMI-TSOs in the context of the Smart-seq3 (SS3) protocol. Would it be possible to have zUMIs recognise a TSO pattern (the find_pattern: option in the YAML file) other than the SS3 one?

I had a quick look at the zUMIs code, and it seems that the SS3 pattern is hardcoded in at least fqfilter.pl and the zUMIs-dge2.R functions. Would it be possible to replace it with a custom pattern provided in the YAML file? I tried to modify these files directly, but I probably missed one or two occurrences, as zUMIs failed to recognize the pattern afterwards…

If this is not possible, I think it would be a good idea to add a check that stops the run if the provided pattern is not the SS3 one. If I am not mistaken, when a pattern is provided but not recognized as SS3, the default behavior is to treat all reads as UMI-containing, without any warning or error message. This means that someone who misspells the SS3 pattern will end up with highly overinflated counts.
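To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind, written in Python for illustration only. The function name check_find_pattern and its arguments are hypothetical and not part of zUMIs, and the sequence in the usage note is only an example value.

```python
def check_find_pattern(configured, hardcoded):
    """Return the normalized pattern, or raise if it differs from the
    pattern that the downstream code is hard-coded to expect.

    Both arguments are plain strings. This is a hypothetical helper,
    not a zUMIs function.
    """
    normalized = configured.strip().upper()
    if normalized != hardcoded.strip().upper():
        # Fail loudly instead of silently continuing with a pattern
        # that the rest of the pipeline would not recognize.
        raise ValueError(
            f"find_pattern '{normalized}' does not match the hard-coded "
            f"pattern '{hardcoded}'; refusing to continue."
        )
    return normalized
```

Calling check_find_pattern("ATTGCGCAATC", "ATTGCGCAATG") would raise an error at start-up, so a one-letter typo in the YAML file could not go unnoticed.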

cziegenhain commented 3 years ago

Hi,

So the original implementation of find_pattern in zUMIs discards any reads that do not contain the pattern. That means that if you misspell the pattern or use a different TSO sequence, you will simply keep only the reads that match it, so there is no danger of overinflated counts. If you change the pattern in the context of Smart-seq3, you will lose the internal reads. As you say, the Smart-seq3 pattern is a hard-coded special case in zUMIs, so you would need to replace it yourself if another sequence is needed.

Best, Christoph
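The two behaviors described in this reply can be illustrated with a short Python sketch. It is not the zUMIs implementation (which lives in Perl and R): the function split_reads, the keep_internal flag, and the example sequences are hypothetical, and matching the pattern at the start of the read is an assumption made to keep the example simple.

```python
def split_reads(reads, pattern, keep_internal=False):
    """Split reads into pattern-containing (UMI) reads and internal reads.

    keep_internal=False mimics the generic behavior described above:
    reads without the pattern are discarded.
    keep_internal=True mimics the Smart-seq3 special case: reads without
    the pattern are kept separately as internal reads.
    """
    umi_reads, internal_reads = [], []
    for read in reads:
        if read.startswith(pattern):  # assumption: pattern sits at the read start
            umi_reads.append(read)
        elif keep_internal:
            internal_reads.append(read)
        # otherwise the read is dropped
    return umi_reads, internal_reads


reads = [
    "ATTGCGCAATGAAAACCCCGGG",  # starts with the example pattern
    "ATTGCGCAATCAAAACCCCGGG",  # one mismatch in the pattern
    "TTTTTTTT",                # no pattern at all
]
generic = split_reads(reads, "ATTGCGCAATG")
special = split_reads(reads, "ATTGCGCAATG", keep_internal=True)
```

In the generic case only the first read survives, so a misspelled pattern reduces the number of reads rather than inflating counts. In the special case the two non-matching reads are retained as internal reads, which is the part that would be lost with a custom pattern.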