Don't use the hypergeometric model for low-complexity segments, where a segment is "low-complexity" if its number of distinct kmers is less than 75% of its total number of kmers.
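To make the 75% rule concrete, here is a minimal sketch of the complexity check. It is not the code from this change: the function names `kmer_complexity` and `is_low_complexity` are hypothetical, and it assumes every overlapping kmer of the raw segment string is counted, whereas the actual implementation may count kmers differently (for example canonical or sketched kmers).

```python
def kmer_complexity(segment: str, k: int) -> float:
    """Return the fraction of kmers in `segment` that are distinct.

    Assumption for illustration: all overlapping kmers of the raw
    string are counted, with no canonicalization or sketching.
    """
    total = len(segment) - k + 1
    if total <= 0:
        raise ValueError("segment is shorter than k")
    distinct = len({segment[i:i + k] for i in range(total)})
    return distinct / total


def is_low_complexity(segment: str, k: int, threshold: float = 0.75) -> bool:
    """A segment is low-complexity if distinct/total is below the threshold."""
    return kmer_complexity(segment, k) < threshold
```

For example, with k = 3 the repeat `ACACACACAC` has 8 kmers of which only 2 are distinct (complexity 0.25), so it would be treated as low-complexity, while `ACGTTGCA` has 6 kmers that are all distinct and would not.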
For segments with low kmer complexity, we now examine all candidate windows from the L1 stage, even if their maximum predicted ANI cannot exceed the threshold. This is because ANI predictions for low-complexity segments are noisier.
Since only a small fraction of segments are low-complexity, the mapping stage now takes only about 10% more CPU time. Alternatively, low-complexity mappings can be skipped entirely with --kmer-complexity F.
I'm closing this for now, until I have a good example of it helping and can show that it doesn't introduce additional overlapping alignments.