jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

Support UMIs from a third input file #66

Closed by chapmanb 4 years ago

chapmanb commented 6 years ago

Douglas and John, thanks for the recent work on adding UMI extraction support to atropos (#61). I'm looking forward to using this along with trimming. Would it be possible to support reading UMIs from a third file? We often have tagging strategies where bcl2fastq produces R1/R2/R3 with the UMI barcode in R2. In this strategy we'd extract the barcode from the R2 file and write R1 and R3 as the first and second read, respectively, with the UMI added to the read name.
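To make the request concrete, here is a minimal Python sketch of the R1/R2/R3 strategy described above. It is not Atropos code or its API: the function names (`read_fastq`, `tag_with_umi`, `tag_pairs`) are hypothetical, and appending the UMI to the read ID with a `:` separator is an assumption about the desired naming, not something settled in this thread.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# (header without the leading '@', sequence, qualities)
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from an uncompressed FASTQ with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops iteration
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def tag_with_umi(header: str, umi: str, sep: str = ":") -> str:
    """Append the UMI to the read ID, keeping any comment after the first space.

    The separator is an assumption; downstream tools differ in what they expect.
    """
    read_id, _, comment = header.partition(" ")
    tagged = f"{read_id}{sep}{umi}"
    return f"{tagged} {comment}" if comment else tagged


def tag_pairs(
    r1: Iterable[Record],
    umi_reads: Iterable[Record],
    r3: Iterable[Record],
    sep: str = ":",
) -> Iterator[Tuple[Record, Record]]:
    """Pair R1 with R3 and put the R2 sequence (the UMI) in both read names.

    Note: zip() stops at the shortest input, so files with mismatched
    record counts are not detected in this sketch.
    """
    for (h1, s1, q1), (_, umi, _), (h3, s3, q3) in zip(r1, umi_reads, r3):
        yield (tag_with_umi(h1, umi, sep), s1, q1), (tag_with_umi(h3, umi, sep), s3, q3)


if __name__ == "__main__":
    # Tiny in-memory example standing in for the three bcl2fastq outputs.
    r1 = io.StringIO("@read1 1:N:0\nACGT\n+\nIIII\n")
    r2 = io.StringIO("@read1 2:N:0\nGGCCAA\n+\nIIIIII\n")
    r3 = io.StringIO("@read1 3:N:0\nTTTT\n+\nIIII\n")
    for first, second in tag_pairs(read_fastq(r1), read_fastq(r2), read_fastq(r3)):
        print(first[0], second[0])
```

The tagged pairs could then be passed to paired-end trimming as ordinary first and second reads, which is the single-run workflow we're hoping for.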

Alternatively, we've worked around the lack of support for this with fastp, using separate tagging runs for R1/R2 and R3/R2, but then we lose the ability to do paired-end trimming in the same run. Ideally we'd like to be able to quality trim, polyG trim, and UMI tag in a single run.

Thanks much for considering this.

jdidion commented 6 years ago

Thanks Brad. One question: the intention from v1.2 onward is to require Python 3.6, because I'm looking to make some other improvements as well (use of type annotations, use of xphyle for file management). UMI features are slated for 1.2, but if the Python 3.6 requirement makes this untenable for you, we can look at backporting those features to the 1.1.x branch. We definitely want to support use of Atropos in bcbio.

chapmanb commented 6 years ago

John, thanks for considering this. Moving to Python 3.6 only isn't a problem at all. bcbio still isn't Python 3 compatible (I know, boo), but we install atropos in a separate anaconda environment with Python 3, and that has 3.6.2 now, so it would work without any issues. Thanks again.

jdidion commented 4 years ago

Now implemented in the develop branch.