lutteropp / hakmer-ng-redesign

Something is wrong with the supermatrix built for the w252 dataset #60

Closed lutteropp closed 5 years ago

lutteropp commented 5 years ago

Found the reason when trying to open the msa with old RAxML: ERROR: Bad base ($) at site 203901 of sequence 1
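A quick way to locate such characters without running RAxML is to scan the alignment directly. The following is only a minimal sketch: the function name `find_bad_bases` and the allowed alphabet (IUPAC nucleotide codes plus gap and missing-data symbols) are assumptions for illustration, not RAxML's exact validation rules. Sites are reported 1-based, like in the error message above.

```python
# Assumed alphabet: IUPAC nucleotide codes plus common gap/missing symbols.
# This is an illustrative choice, not the exact set RAxML accepts.
ALLOWED = set("ACGTURYKMSWBDHVN-?")


def find_bad_bases(sequences, allowed=ALLOWED):
    """Return (name, 1-based site, character) for each character outside `allowed`."""
    bad = []
    for name, seq in sequences.items():
        for site, ch in enumerate(seq, start=1):
            if ch.upper() not in allowed:
                bad.append((name, site, ch))
    return bad


# Toy alignment with a stray '$' in the first sequence.
msa = {"taxon1": "ACGT$ACGT", "taxon2": "ACGTNACG-"}
print(find_bad_bases(msa))  # [('taxon1', 5, '$')]
```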

lutteropp commented 5 years ago

Already checked the added approximate seed matches; those are fine.

lutteropp commented 5 years ago

Narrowed it down to the exact seeds themselves. Why do they contain the dollar sign? Shouldn't all dollar-sign positions be set to false in the presence checker?!
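For reference, the invariant being questioned here can be written down as a small check. This is a hypothetical sketch, not the project's actual presence checker: it assumes the input sequences are concatenated into one text with `$` as separator, and `seed_is_clean` is an illustrative name. A seed is acceptable only if its whole span lies inside the text and covers no separator.

```python
SENTINEL = "$"  # assumed separator between concatenated sequences


def seed_is_clean(text, start, length):
    """True if text[start:start+length] is in bounds and contains no sentinel."""
    if start < 0 or length <= 0 or start + length > len(text):
        return False
    return SENTINEL not in text[start:start + length]


text = "ACGT$TTGA"
print(seed_is_clean(text, 0, 4))  # True: "ACGT"
print(seed_is_clean(text, 2, 4))  # False: "GT$T" crosses the separator
print(seed_is_clean(text, 5, 4))  # True: "TTGA"
```

Running a check like this on every seed right before it is written to the supermatrix would flag the offending seed regardless of which step introduced the `$`.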

lutteropp commented 5 years ago

Checked the entire LCP array, the dollar signs are definitely not parts of any LCP. This is getting weirder and weirder.

lutteropp commented 5 years ago

This is getting interesting, I like this bug :-) Figured out the culprit is the trivialExtensionSimple function which looked so innocent!

lutteropp commented 5 years ago

datasets that seem to be affected by this bug:

lutteropp commented 5 years ago

The bug seems to occur only when doing trivial extension to the left; trivial extension to the right seems to be fine.

lutteropp commented 5 years ago

nope, apparently it has to do with getAverageSeedSize, not with the simple trivial extension

lutteropp commented 5 years ago

This was a highly interesting bug indeed. The problem was the following (see attached file): leftExtension.pdf