naobservatory / mgs-workflow

MIT License
Implement fragment-length analysis #57

Open willbradshaw opened 1 week ago

willbradshaw commented 1 week ago

One of two new capabilities I'd like to add to our core pipeline is analysis of fragment lengths (the other being duplication analysis). As with duplication levels, there are several ways of doing this we could try, and it's possible we'll want to do it multiple ways. Will probably want input from @mikemc on the best way to approach this.

mikemc commented 2 days ago

I've done this using two different methods, both using intermediate pipeline outputs to get stats and add them to the HV hits table.

  1. Use the bbmerge output to determine whether each read pair merged and, if so, the length of the merged read.
  2. Use the 9th column of the SAM output (TLEN, the observed template length) to get the fragment length implied by the alignment.
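
As a rough illustration of the two methods above, here is a minimal Python sketch. It is not the code from the Quarto doc. The function names `merged_lengths` and `template_lengths` are hypothetical, and it assumes the merged reads are available as standard 4-line FASTQ records and that the alignments are plain-text SAM lines. TLEN is signed and is 0 when unavailable, so the sketch takes the absolute value and skips zeros.

```python
def merged_lengths(fastq_lines):
    """Map read ID -> merged read length from 4-line FASTQ records.

    Assumption: the input contains only reads that merged successfully.
    """
    lines = [line.rstrip("\n") for line in fastq_lines]
    lengths = {}
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        lengths[read_id] = len(seq)
    return lengths


def template_lengths(sam_lines):
    """Map read name -> absolute TLEN (SAM column 9)."""
    lengths = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM has 11 mandatory fields
            continue
        qname, tlen = fields[0], int(fields[8])
        if tlen != 0:  # 0 means the template length is unavailable
            lengths[qname] = abs(tlen)
    return lengths
```

Both mates of a properly aligned pair carry the same TLEN magnitude with opposite signs, so keying on the read name keeps one value per pair.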

My code is a Quarto doc using Bioconductor and tidyverse (it's private to the NAO because it uses non-public data; @harmonbhasin please message me for a link). I imagine what we'd want to do is edit the corresponding (Python?) scripts for the HV workflow to use similar logic to add these three fields (merge status, merged length, and SAM template length) to the HV table.
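
A minimal sketch of what adding the fields might look like in Python, assuming per-read lookups (read ID to merged length, read ID to template length) have already been computed and that the HV table is handled as a list of dicts. The function name, the `seq_id` key, and the three column names are all hypothetical and not taken from the existing workflow scripts.

```python
def add_fragment_fields(hv_rows, merged_len_by_id, tlen_by_id, id_key="seq_id"):
    """Return copies of the HV rows with three fragment-length fields added.

    hv_rows: list of dicts, one per HV hit (hypothetical layout).
    merged_len_by_id: read ID -> merged length, only for pairs that merged.
    tlen_by_id: read ID -> absolute SAM template length.
    """
    annotated = []
    for row in hv_rows:
        read_id = row[id_key]
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row["bbmerge_merged"] = read_id in merged_len_by_id
        new_row["bbmerge_length"] = merged_len_by_id.get(read_id)  # None if unmerged
        new_row["sam_tlen"] = tlen_by_id.get(read_id)  # None if no usable TLEN
        annotated.append(new_row)
    return annotated
```

Missing values are left as `None` here. A real implementation would need to pick whatever missing-value convention the HV table already uses.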