rnabioco / raer

Characterize A-to-I RNA editing in bulk and single-cell RNA sequencing experiments
https://rnabioco.github.io/raer/

internal function for specialized bam sort #56

Closed jayhesselberth closed 1 year ago

jayhesselberth commented 1 year ago

It would be nice if a function were provided to do this internally using Rsamtools::sortBam() instead of needing to use the command line tool.

https://github.com/rnabioco/raer/blob/97ec7f9901e0f95de934e0c698b1bc7df10ffe11/vignettes/single-cell.Rmd#L44-L48

kriemo commented 1 year ago

Agreed. Rsamtools::sortBam() is missing a few options that samtools provides, including a threads argument. We could write a function based on the Rsamtools one and request that it be merged upstream at a later date. https://github.com/Bioconductor/Rsamtools/blob/f0fe8ba66b59fe983affedd4a6d1e1a66f83b3fe/src/io_sam.c#L739

To make it work we'd need to call bam_sort_core_ext rather than the older bam_sort_core.

jayhesselberth commented 1 year ago

Related to https://github.com/Bioconductor/Rsamtools/issues/46

kriemo commented 1 year ago

BAM sorting by tag is now implemented in Rsamtools v2.15.1.

I don't recommend filtering for xf:25 anymore, as this will only keep 1 read per UMI. 10x scRNA-seq libraries produce cDNA fragments from different regions of the same original RNA, so keeping 1 read per UMI will greatly decrease coverage over many valid regions.

I've implemented UMI handling to control duplicates per position when processing single libraries using sc_editing(umi_tag = "UB"), which is enabled by default.
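
To illustrate the difference between the two strategies, here is a minimal Python sketch on toy data. It is not raer's implementation; the read tuples, barcodes, UMIs, and both function names are hypothetical and chosen only for illustration. It compares counting distinct UMIs per position against keeping a single read per UMI.

```python
from collections import defaultdict

# Hypothetical toy reads: (cell barcode, UMI, covered positions).
reads = [
    ("AAAC", "UMI1", range(100, 105)),
    ("AAAC", "UMI1", range(102, 107)),  # same UMI, overlapping fragment
    ("AAAC", "UMI1", range(200, 205)),  # same UMI, distant region of the same RNA
    ("AAAC", "UMI2", range(100, 105)),
]


def depth_per_position_umi(reads):
    """Count distinct UMIs per (cell, position); duplicates collapse per position."""
    seen = defaultdict(set)
    for cb, umi, positions in reads:
        for pos in positions:
            seen[(cb, pos)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


def depth_one_read_per_umi(reads):
    """Keep only the first read seen for each (cell, UMI), then count depth."""
    kept = {}
    for cb, umi, positions in reads:
        kept.setdefault((cb, umi), positions)
    depth = defaultdict(int)
    for (cb, _umi), positions in kept.items():
        for pos in positions:
            depth[(cb, pos)] += 1
    return dict(depth)


per_pos = depth_per_position_umi(reads)
one_read = depth_one_read_per_umi(reads)

# Position 200 is covered only by a second fragment of UMI1.
print(per_pos.get(("AAAC", 200), 0))   # 1: retained with per-position UMI handling
print(one_read.get(("AAAC", 200), 0))  # 0: lost when keeping one read per UMI
```

In this toy case both approaches report a depth of 2 at position 100, but only the per-position approach retains coverage at position 200, which is the coverage loss described above.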