Closed EricR86 closed 3 years ago
I believe we discussed merging without review until 24 Nov. Let me know if you need me to review anyway.
This is more for record keeping of changes and how/why they were made. It's possible that the long description should have gone into the commit message and the commits put on a separate develop branch (for regression tests) instead.
This PR should fix the `--resolution` option to work with posterior output. Notably, the `posterior.code` output did work before since it relied on the `bed_write` function.
To establish a test and ground truth for this change, the `data` testcase had to be changed to a resolution >1 bp, and it was changed to 10 in my particular case. Other testcases didn't have sufficiently different data to detect this problem otherwise. The posterior output probability list (in `probs`) was written out to file for every GMTK frame (each a 10 bp long section) as a BED file at 10 bp intervals for every datapoint. To preserve the previous run length encoding, this established reference was merged with bedTools with the command `groupBy -i new_baseline.0.bed -g 1,4 -c 2,3 -o min,max | awk -v OFS="\t" '{print $1,$3,$4,$2}' > posterior0.0.merged.bed`.
The code changes were then compared against this established baseline. The start and end coordinates were verified against the viterbi output, and the output was checked to ensure no probabilities/datapoints were missing.
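For the record, the merge step above can be sketched in Python. This is only an illustration of what the `groupBy`/`awk` pipeline does as I understand it (collapse consecutive rows that share chromosome and value, taking the minimum start and maximum end), not the code in this PR. The function names `merge_runs` and `covered_bp` and the sample rows are hypothetical.

```python
from itertools import groupby


def merge_runs(rows):
    """Collapse consecutive BED rows sharing (chrom, value) into one interval.

    rows: iterable of (chrom, start, end, value) tuples in file order.
    Like the groupBy command, only *consecutive* rows are grouped, and
    adjacency of the coordinates is assumed rather than checked.
    """
    merged = []
    for (chrom, value), group in groupby(rows, key=lambda r: (r[0], r[3])):
        group = list(group)
        start = min(r[1] for r in group)  # -c 2 -o min
        end = max(r[2] for r in group)    # -c 3 -o max
        merged.append((chrom, start, end, value))
    return merged


def covered_bp(rows):
    """Total number of base pairs covered, for a 'nothing missing' check."""
    return sum(end - start for _, start, end, _ in rows)


# Hypothetical per-frame output at 10 bp resolution (values kept as strings).
rows = [
    ("chr1", 0, 10, "0.25"),
    ("chr1", 10, 20, "0.25"),
    ("chr1", 20, 30, "0.75"),
    ("chr1", 30, 40, "0.25"),
]
merged = merge_runs(rows)
```

Comparing `covered_bp(rows)` with `covered_bp(merged)` gives a quick check that merging dropped no datapoints, in the same spirit as the verification described above.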
One consideration is to change the `data` testcase to 10 bp resolution, since it already runs posterior.