Closed by lutteropp 5 years ago
I don't think we need it for choosing minK anymore, but it sounds very helpful for determining how many mismatches we should allow in a seed.
We can use the FSWM tool from the Göttingen people for that: http://fswm.gobics.de/
Or we can locally compute the average substitution rate in a block from the exactly matching taxa and use it to augment the block with further approximate matches.
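A minimal sketch of the local estimate, assuming a block is available as a list of equal-length, gap-free aligned sequences (for example, the extended regions around the exact matches). The function names `mismatch_fraction` and `average_substitution_rate` are hypothetical and not taken from the project; the estimate is the plain mean pairwise mismatch fraction, with no correction for multiple substitutions at the same site.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_substitution_rate(block):
    """Mean pairwise mismatch fraction over all sequences in a block.

    Assumption: `block` is a list of aligned, gap-free strings.
    Returns 0.0 if the block has fewer than two sequences.
    """
    pairs = list(combinations(block, 2))
    if not pairs:
        return 0.0
    return sum(mismatch_fraction(a, b) for a, b in pairs) / len(pairs)


# Toy block of three taxa, four positions each.
block = ["ACGT", "ACGA", "ACGT"]
rate = average_substitution_rate(block)
```

The resulting rate could then serve as the tolerance when deciding whether a candidate approximate match is close enough to be added to the block.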
done
alexey [3:29 PM] "Next, we used a set of 32 Roseobacter genomes of 132 mb with a reference tree published by Newton et al. [35]; here the distance between sequence pairs was 0.233 substitutions per position on average"
sounds like a lot, maybe here you should really allow mismatches in the seed, or decrease k-mer size...
sarah [3:30 PM] yeah, so far I used a k-mer size of 30 and just 1 allowed mismatch, that was definitely not a fitting combination
mhmmm... average number of substitutions... sounds like a good way to automatically infer parameters :slightly_smiling_face:
alexey [3:32 PM] yes that's what I thought as well, i guess you can compute those quickly e.g. using methods from Goettingen people))
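To make the numbers in the chat concrete, here is a rough sketch of how the mismatch budget could be derived from an average substitution rate. It assumes, as a simplification, that the rate can be read as an independent per-position mismatch probability; the function names are hypothetical and this is not necessarily how the tool infers its parameters.

```python
import math


def expected_seed_mismatches(k, rate):
    """Expected number of mismatching positions in a seed of length k."""
    return k * rate


def exact_match_probability(k, rate):
    """Probability that all k positions of a seed match exactly."""
    return (1.0 - rate) ** k


def suggest_max_mismatches(k, rate):
    """Hypothetical heuristic: allow the expected mismatch count, rounded up."""
    return math.ceil(k * rate)


# Values from the discussion: k = 30 and 0.233 substitutions per position.
k, rate = 30, 0.233
expected = expected_seed_mismatches(k, rate)   # about 7 mismatches per seed
p_exact = exact_match_probability(k, rate)     # well below 0.1 %
allowed = suggest_max_mismatches(k, rate)      # 7
```

With roughly 7 expected mismatches in a 30-mer, allowing only 1 mismatch misses almost all homologous seeds, which is consistent with the observation that this combination did not fit.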