shendurelab / MPRAflow

A portable, flexible, parallelized tool for complete processing of massively parallel reporter assay data
Apache License 2.0

Count SE BC + UMI data #76

Open zkliesmete opened 1 year ago

zkliesmete commented 1 year ago

Hi, I was wondering whether the counting step can take an SE BC FASTQ (read 1) together with a separate UMI FASTQ (read 2)? In addition, a few things about the workflow are unclear to me:

1) Is UMI collapsing done per BC-UMI pair, or per UMI across BCs? If the same UMI is associated with multiple BCs, how is this handled?
2) Does counting require a perfect match between the BC sequences in the association file and those observed in the read 1 FASTQ?
3) Related to 2), are base qualities taken into account in any way?
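To make question 1 concrete, here is a minimal sketch of the two interpretations I have in mind. It is only an illustration: the reads are made up, the function names are hypothetical, and none of this is taken from MPRAflow's code. The "first barcode wins" rule in the second function is just one arbitrary way a shared UMI could be resolved.

```python
from collections import defaultdict

# Made-up (barcode, UMI) pairs, one per read; not real data.
reads = [
    ("AAAC", "GGT"),
    ("AAAC", "GGT"),  # duplicate of the first read (same BC, same UMI)
    ("AAAC", "CCA"),
    ("TTTG", "GGT"),  # same UMI as above, but a different barcode
]


def count_per_bc_umi(pairs):
    """Interpretation A: count distinct UMIs separately within each barcode."""
    umis_by_bc = defaultdict(set)
    for bc, umi in pairs:
        umis_by_bc[bc].add(umi)
    return {bc: len(umis) for bc, umis in umis_by_bc.items()}


def count_per_umi_across_bcs(pairs):
    """Interpretation B: each UMI counts once overall.

    Assumption for illustration only: a UMI seen with several barcodes
    is credited to the first barcode it appeared with.
    """
    first_bc_for_umi = {}
    for bc, umi in pairs:
        first_bc_for_umi.setdefault(umi, bc)
    counts = defaultdict(int)
    for bc in first_bc_for_umi.values():
        counts[bc] += 1
    return dict(counts)


print(count_per_bc_umi(reads))          # {'AAAC': 2, 'TTTG': 1}
print(count_per_umi_across_bcs(reads))  # {'AAAC': 2}
```

Under interpretation A the barcode `TTTG` keeps its count even though its UMI also occurs with `AAAC`; under interpretation B it drops out entirely, which is why I would like to know which behaviour applies.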

Thanks a lot!