sstadick / perbase

Per-base per-nucleotide depth analysis
MIT License

Add Forward/Reverse mapping counts #54

Open 25shmeckles opened 2 years ago

25shmeckles commented 2 years ago

samtools mpileup reports forward and reverse mappings separately, using uppercase letters (ATCGN) for forward (F) mappings and lowercase letters (atcgn) for reverse (R) mappings. Indels also have separate symbols that indicate whether the event was found on an F or R mapping.

Right now, perbase outputs only one column per base, so it is not possible to determine the mapping orientation. I suggest adding an optional parameter that produces a more complete report, with mapping counts for both strands. This is extremely useful for dealing with strand bias in sequencing data.

Example current output (16 T mapped, no orientation info):

REF     POS     REF_BASE    DEPTH   A       C       G       T       N       INS     DEL
chr1    709636  T           16      0       0       0       16      0       0       0

Example suggested output (10 F mapped (T) and 6 R mapped (t)):

REF     POS     REF_BASE    DEPTH   A       a       C       c       G       g       T       t       N       n       INS     DEL     ins     del
chr1    709636  T           16      0       0       0       0       0       0       10      6       0       0       0       0       0       0

sstadick commented 2 years ago

@25shmeckles - I'm sorry I never replied to this; I only just saw it! This is a very cool idea that I would be on board with. I'm not sure when I'll get a chance to take a crack at it, but if you're still interested and want to make a PR, I'd be happy to point you to a good starting point in the code.

25shmeckles commented 2 years ago

@sstadick Hi. Sure, I would love to play with it. In theory, it should be enough to look at the bitwise FLAG to determine the alignment orientation. I am still a noob with Rust, but I should be able to come up with something.
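
For illustration, here is a minimal sketch of the FLAG-based idea in plain Rust. It is not perbase code: `StrandCounts`, `is_reverse`, and `base_index` are hypothetical names invented for this example, and it deliberately avoids any BAM library so it stays self-contained. The only fact it relies on is from the SAM specification: bit 0x10 of FLAG marks a read aligned to the reverse strand. Indel columns are left out to keep the sketch short.

```rust
/// SAM FLAG bit 0x10: the read is aligned to the reverse strand.
const FLAG_REVERSE: u16 = 0x10;

/// Hypothetical per-position tally (not a perbase type), with one counter
/// per base and strand. Index order within each array is A, C, G, T, N.
#[derive(Debug, Default, PartialEq)]
struct StrandCounts {
    forward: [u32; 5],
    reverse: [u32; 5],
}

/// Returns true when the FLAG marks a reverse-strand alignment.
fn is_reverse(flag: u16) -> bool {
    (flag & FLAG_REVERSE) != 0
}

/// Maps a base to its column index; anything other than A/C/G/T counts as N.
fn base_index(base: u8) -> usize {
    match base.to_ascii_uppercase() {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        _ => 4,
    }
}

impl StrandCounts {
    /// Adds one observed base. SAM stores SEQ in reference orientation,
    /// so the base is not complemented here; only the strand column changes.
    fn add(&mut self, base: u8, flag: u16) {
        let idx = base_index(base);
        if is_reverse(flag) {
            self.reverse[idx] += 1;
        } else {
            self.forward[idx] += 1;
        }
    }

    /// Total depth across both strands.
    fn depth(&self) -> u32 {
        self.forward.iter().sum::<u32>() + self.reverse.iter().sum::<u32>()
    }
}

fn main() {
    let mut counts = StrandCounts::default();
    // Reproduce the example from the issue: 10 forward and 6 reverse T's.
    for _ in 0..10 {
        counts.add(b'T', 0); // FLAG 0: forward strand
    }
    for _ in 0..6 {
        counts.add(b'T', 16); // FLAG 16 (0x10): reverse strand
    }
    // prints "depth=16 T=10 t=6"
    println!(
        "depth={} T={} t={}",
        counts.depth(),
        counts.forward[3],
        counts.reverse[3]
    );
}
```

In a real implementation the flag and base would come from whatever BAM record type perbase already iterates over, and the two arrays would map onto the uppercase and lowercase columns of the suggested output.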