hbc / li_hiv

HIV integration project
6 stars 8 forks source link

Bug in read collapsing? #1

Closed tsibley closed 8 years ago

tsibley commented 8 years ago

I believe there's a small bug in part of the read collapsing code:

group_by(read_name) %>% mutate(read_count=n()) %>% filter(mapq == max(mapq)) %>%

The maximum mapq may be shared by more than one line, and so in some cases this permits more than one alignment per read through. I believe there are instances of this in the Sunshine, et al. data.

roryk commented 8 years ago

Hi Thomas,

Awesome catch, and you're totally right. I just reran everything on one of the sets of data and took the average of the number of multiple hits for each read for each site. It looks like for the vast majority of sites, especially the single alignment of evidence sites, it has no effect, but for some sites it can overestimate the evidence a bit.

tsibley commented 8 years ago

Yep, I didn't expect it to have much effect, but figured it was worth pointing out. Thanks for responding so quickly!

roryk commented 8 years ago

Thanks a lot for digging into it and pinging me about it.