Closed jayhesselberth closed 1 year ago
Agreed. Rsamtools::sortBam() is missing a few options that samtools provides, including a threads argument. We could write a function based on the Rsamtools one and request to merge it upstream at a later date.
https://github.com/Bioconductor/Rsamtools/blob/f0fe8ba66b59fe983affedd4a6d1e1a66f83b3fe/src/io_sam.c#L739
To make it work we'd need to call bam_sort_core_ext rather than the older bam_sort_core.
BAM sorting by tag is now implemented in Rsamtools v2.15.1.
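For reference, a minimal sketch of what the call might look like. It assumes Rsamtools >= 2.15.1 and that the new arguments are named byTag and nThreads (worth confirming with ?sortBam). The input is the small example BAM shipped with Rsamtools, which has no CB tags; it is only there so the call has something to run on. A real 10x BAM would go in its place.

```r
library(Rsamtools)

# Example BAM bundled with Rsamtools; stands in for a real 10x BAM here.
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# sortBam() appends ".bam" to the destination prefix.
dest <- tempfile()

# Assumed arguments (Rsamtools >= 2.15.1): byTag names the aux tag to
# sort on, nThreads sets the number of sorting threads.
sorted <- sortBam(bam, dest, byTag = "CB", nThreads = 2L)

file.exists(sorted)
```

This would replace the samtools command line step in the vignette, assuming the arguments behave like `samtools sort -t CB -@ 2`.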
I don't recommend filtering for xf:25 anymore, as this will only keep 1 read per UMI. 10x scRNA-seq libraries produce cDNA fragments from different regions of the same original RNA, so keeping 1 read per UMI will greatly decrease coverage over many valid regions.
I've implemented UMI handling to control duplicates per position when processing single libraries using sc_editing(umi_tag = "UB"), which is activated by default.
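To illustrate the difference, here is a toy sketch in base R. It is not how sc_editing() is implemented internally; the data frame and its column names (cb, ub, pos) are made up for the example, and each row stands for one read covering one position.

```r
# Toy reads: cell barcode (cb), UMI (ub), and a covered position (pos).
# All values and column names are hypothetical.
reads <- data.frame(
  cb  = c("AAAC", "AAAC", "AAAC", "AAAC", "TTTG"),
  ub  = c("UMI1", "UMI1", "UMI1", "UMI2", "UMI1"),
  pos = c(100L, 100L, 250L, 100L, 100L),
  stringsAsFactors = FALSE
)

# Keep 1 read per UMI per cell: the read at position 250 is dropped
# because another read with the same cell/UMI was already kept.
one_per_umi <- reads[!duplicated(reads[, c("cb", "ub")]), ]

# Keep 1 read per UMI per cell at each position: the true duplicate at
# position 100 is removed, but coverage at position 250 is retained.
one_per_umi_pos <- reads[!duplicated(reads[, c("cb", "ub", "pos")]), ]

nrow(one_per_umi)      # 3
nrow(one_per_umi_pos)  # 4
```

The second approach still collapses PCR duplicates at a given site, but does not throw away fragments from other regions of the same molecule.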
It would be nice if a function were provided to do this internally using Rsamtools::sortBam() instead of needing to use the command line tool.
https://github.com/rnabioco/raer/blob/97ec7f9901e0f95de934e0c698b1bc7df10ffe11/vignettes/single-cell.Rmd#L44-L48