Closed roblanf closed 3 years ago
OK, watching another 5 minutes of output scroll past, I can confirm that the only time the filter seems to kick in is when there's more than one most parsimonious placement. So I guess the check for the parsimony cutoff is sitting inside a conditional for >1 parsimony placement.
Would it be better to move it out? I.e. be able to exclude sequences with parsimony >E, regardless of how many placements there are? I'd argue yes, if we're right that there could be issue with long-placement attraction.
Thanks for pointing this out. It seems like it was doing the right thing, i.e. not placing any sample with parsimony >20, but it printed the WARNING message only when the sample also had >1 parsimonious placements. I have fixed this in the latest commit, which would appear in the next release.
Ah cool. Thanks for double checking!
I'm running UShER with -E 20, which should mean that samples with parsimony >20 are not placed.
In some cases (like really awful sequences) it works as expected, and reports that the threshold was exceeded and the sample wasn't placed:
but in others that are closer to the cutoff, there's no such output. So it seems like these are still being placed (it hasn't finished running yet so I can't confirm if they were actually placed. I'll update this issue in a bit when I have that info).
then again, some with modest parsimony scores are excluded:
but it doesn't look like a simple issue with a threshold:
The closest guess I have (backed up by only the data I've managed to swipe here as the output goes past) is that the filter only seems to kick in when there's >1 optimal placement.
My understanding (and I think probably the best behaviour for this setting) was that it shouldn't place any sample with a score > the threshold.
I could just be missing something though, of course.