lutteropp / hakmer-ng-redesign


Use average number of substitutions #46

Closed lutteropp closed 5 years ago

lutteropp commented 5 years ago

alexey [3:29 PM] "Next, we used a set of 32 Roseobacter genomes of 132 mb with a reference tree published by Newton et al. [35]; here the distance between sequence pairs was 0.233 substitutions per position on average"

Sounds like a lot; maybe here you should really allow mismatches in the seed, or decrease the k-mer size...

sarah [3:30 PM] Yeah, so far I used a k-mer size of 30 and just 1 allowed mismatch; that was definitely not a fitting combination. Hmm... the average number of substitutions sounds like a good way to automatically infer parameters :slightly_smiling_face:

alexey [3:32 PM] Yes, that's what I thought as well. I guess you can compute those quickly, e.g. using methods from the Goettingen people :)

lutteropp commented 5 years ago

I don't think we need it for choosing minK anymore, but it sounds very helpful for determining how many mismatches we should allow in a seed.
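
To make the link between the average substitution rate and the seed mismatch budget concrete, here is a rough sketch. It is not code from this repository: the function names, the Jukes-Cantor conversion, and the "mean plus one standard deviation" rule are all assumptions chosen for illustration. It treats each seed position as an independent mismatch with probability `p`, so the number of mismatches in a seed of length `k` is roughly binomial.

```python
import math


def jc_distance_to_mismatch_fraction(d):
    """Expected fraction of differing sites for a Jukes-Cantor distance d.

    Assumption: the reported 'substitutions per position' is a
    Jukes-Cantor-style corrected distance. If it is already a raw
    mismatch fraction, skip this conversion.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def expected_seed_mismatches(k, mismatch_fraction):
    """Mean number of mismatching positions in a seed of length k."""
    return k * mismatch_fraction


def suggest_allowed_mismatches(k, mismatch_fraction, slack_sd=1.0):
    """Hypothetical rule: allow mean + slack_sd standard deviations.

    Models the mismatch count as Binomial(k, mismatch_fraction) and
    caps the result at k.
    """
    mean = k * mismatch_fraction
    sd = math.sqrt(k * mismatch_fraction * (1.0 - mismatch_fraction))
    return min(k, math.ceil(mean + slack_sd * sd))


if __name__ == "__main__":
    k = 30
    p = jc_distance_to_mismatch_fraction(0.233)
    print(f"mismatch fraction: {p:.3f}")
    print(f"expected mismatches in a {k}-mer: {expected_seed_mismatches(k, p):.1f}")
    print(f"suggested allowed mismatches: {suggest_allowed_mismatches(k, p)}")
```

Under these assumptions, a 30-mer at 0.233 substitutions per position is expected to carry about 6 mismatches between a typical pair of sequences, which is consistent with 1 allowed mismatch being far too strict.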

lutteropp commented 5 years ago

We can use the FSWM tool from the Göttingen people for that: http://fswm.gobics.de/

lutteropp commented 5 years ago

Or, we can locally compute the average substitution rate in a block from the exactly matching taxa and use this for augmenting the block with further approximate matches.
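
A minimal sketch of that local variant, assuming the block is available as a list of equal-length, gap-free sequences (one per taxon). The function names, the `tolerance` factor, and the `min_rate` floor are hypothetical and only illustrate the idea; they are not the implementation in this repository.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_block_substitution_rate(block_seqs):
    """Mean pairwise mismatch fraction over all taxa already in the block."""
    pairs = list(combinations(block_seqs, 2))
    if not pairs:
        return 0.0
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def accepts_candidate(block_seqs, candidate, tolerance=1.5, min_rate=0.05):
    """Accept a candidate if it is not much more divergent than the block.

    Assumption: the candidate's mean mismatch fraction against the block
    members may be at most `tolerance` times the block's own average rate,
    with `min_rate` as a floor so that a block of identical sequences can
    still take in slightly divergent matches.
    """
    if not block_seqs:
        raise ValueError("block must contain at least one sequence")
    block_rate = average_block_substitution_rate(block_seqs)
    cand_rate = sum(hamming_fraction(candidate, s) for s in block_seqs) / len(block_seqs)
    return cand_rate <= max(tolerance * block_rate, min_rate)


if __name__ == "__main__":
    block = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
    print(average_block_substitution_rate(block))
    print(accepts_candidate(block, "ACGTACGT"))
    print(accepts_candidate(block, "TTTTTTTT"))
```

The advantage over a single genome-wide estimate is that the threshold adapts per block: conserved regions stay strict, while fast-evolving regions can still pick up additional taxa.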

lutteropp commented 5 years ago

done