Open corneliusroemer opened 2 years ago
Yeah, I absolutely get what you mean. I know this is not ideal, and having two thresholds would already be a big improvement. Maybe I will change it that way soon.
On the other hand, I have a (still very vague) concept of probability computations in my head that would be even more powerful and need no hard thresholds at all. It would also affect the way breakpoints (and intermissions, if they still exist) are handled and the way the output is displayed. Maybe that's more like version 2.0 of this tool, nothing for the near future.
I'll keep thinking about it!
PS: That probability stuff might be a lot of hard work, but after working on similar probability computations in my previous project, Dystonse, that doesn't scare me any more.
I think I know what you mean, something like max likelihood and/or naive Bayes could be applicable here.
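To make the naive Bayes idea a bit more concrete, here is a minimal sketch in Python. It is not how the tool works today. The clade names, mutation names, and frequencies are all made up for illustration, and `log_likelihood` and `rank_clades` are hypothetical helpers. It assumes we know, for each clade, the fraction of sequences carrying each mutation, and it treats sites as independent.

```python
import math

# Hypothetical per-clade mutation frequencies (fraction of sequences in the
# clade that carry each mutation). All names and numbers are invented.
CLADE_FREQS = {
    "clade_A": {"m1": 0.99, "m2": 0.98, "m3": 0.02},
    "clade_B": {"m1": 0.97, "m2": 0.05, "m3": 0.95},
}

EPS = 1e-3  # keeps probabilities away from 0 and 1 so log() stays finite


def log_likelihood(observed, freqs, all_mutations):
    """Naive Bayes log-likelihood: every site is treated as independent."""
    total = 0.0
    for mutation in all_mutations:
        # A mutation missing from a clade's table is assumed to have frequency 0.
        p = min(max(freqs.get(mutation, 0.0), EPS), 1 - EPS)
        total += math.log(p) if mutation in observed else math.log(1 - p)
    return total


def rank_clades(observed, clade_freqs=CLADE_FREQS):
    """Return (clade, log-likelihood) pairs, most likely clade first."""
    all_mutations = set()
    for freqs in clade_freqs.values():
        all_mutations.update(freqs)
    scores = {
        clade: log_likelihood(observed, freqs, all_mutations)
        for clade, freqs in clade_freqs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for clade, score in rank_clades({"m1", "m2"}):
        print(f"{clade}: {score:.2f}")
```

Because both present and absent mutations contribute to the score in proportion to their frequency, no hard threshold is needed. For recombinants, the same score could in principle be computed per genome segment rather than for the whole sequence.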
Hi both - I can walk you through, over a Zoom call, what the Delta_(m,n,2) statistic gets you and how it's constructed. It's non-parametric, so you won't need to set thresholds. Also, a table of the p-values is pre-built (this is the computationally expensive part), so you just look them up as you need them.
If I understand your script correctly, you treat all mutations that are above the user-specified threshold identically.
There's room for improvement there.
It would make sense to use two kinds of mutation types for each clade:
Do you know what I mean? One threshold does not suffice for both concepts.
I'll think a bit more about recombinant detection myself - maybe there are further improvements possible. This is an amazing tool already, though!