lutteropp / hakmer-ng-redesign


Improve dealing with paralogs #48

Open lutteropp opened 5 years ago

lutteropp commented 5 years ago

Maybe just delete the taxon which has a paralogy issue from the block?
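A minimal sketch of what "delete the taxon from the block" could look like, assuming a block is just a list of (taxon, position) k-mer occurrences and that a taxon counts as paralogous when it occurs more than once in the block. The layout and the name `drop_paralogous_taxa` are hypothetical, not the actual hakmer-ng data structures.

```python
from collections import Counter

def drop_paralogous_taxa(block):
    """Remove every taxon that occurs more than once in the block.

    `block` is a list of (taxon, position) tuples (hypothetical layout).
    """
    counts = Counter(taxon for taxon, _ in block)
    # Keep only occurrences whose taxon is unique within this block.
    return [(taxon, pos) for taxon, pos in block if counts[taxon] == 1]

# Taxon "B" occurs twice, so both of its occurrences are dropped.
block = [("A", 10), ("B", 42), ("B", 97), ("C", 7)]
print(drop_paralogous_taxa(block))  # [('A', 10), ('C', 7)]
```

Dropping all occurrences of the paralogous taxon, rather than picking one, avoids guessing which copy is the ortholog.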

lutteropp commented 5 years ago

Done, but this slightly worsened the RF-distances for some reason. Probably because we now choose smaller kmer sizes?

lutteropp commented 5 years ago

Normalized RF-distances before the change:
g1 V8: 0.5925926
g2 V8: 0.5185185
g3 V8: 0.5555556
m1 V8: 0.0
m2 V8: 0.05882353
m3 V8: 0.05882353
brucella V8: 0.1
roseobacter V8: 0.4482759
w252 V8: 0.125
w2016 V8: 0.0625
bacteria V8: 0.9285714
faba V8: 0.2666667
59 V8: 0.1964286

Normalized RF-distances after the change:
g1Mat V9: 0.5925926
g2Mat V9: 0.5555556
g3Mat V9: 0.5925926
m1Mat V9: 0.0
m2Mat V9: 0.1176471
m3Mat V9: 0.0
brucellaMat V9: 0.1
ecoliMat V9: 0.3461538
roseobacterMat V9: 0.5517241
w252Mat V9: 0.125
w2016Mat V9: 0.0
bacteriaMat V9: 1.0
fabaMat V9: 0.8
59Mat V9: 0.5357143
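To see where the change helped or hurt, the two lists can be paired by dataset. The sketch below assumes that a "Mat V9" entry refers to the same dataset as the "V8" entry with the same base name (for example `g1Mat V9` and `g1 V8`). The ecoli dataset appears only in the second list and is therefore left out of the comparison.

```python
# Values copied from the two lists in this thread; keys are the base dataset names.
before = {
    "g1": 0.5925926, "g2": 0.5185185, "g3": 0.5555556,
    "m1": 0.0, "m2": 0.05882353, "m3": 0.05882353,
    "brucella": 0.1, "roseobacter": 0.4482759,
    "w252": 0.125, "w2016": 0.0625,
    "bacteria": 0.9285714, "faba": 0.2666667, "59": 0.1964286,
}
after = {
    "g1": 0.5925926, "g2": 0.5555556, "g3": 0.5925926,
    "m1": 0.0, "m2": 0.1176471, "m3": 0.0,
    "brucella": 0.1, "ecoli": 0.3461538, "roseobacter": 0.5517241,
    "w252": 0.125, "w2016": 0.0,
    "bacteria": 1.0, "faba": 0.8, "59": 0.5357143,
}

# Only datasets present in both runs can be compared.
shared = [name for name in before if name in after]
worse = [n for n in shared if after[n] > before[n]]
better = [n for n in shared if after[n] < before[n]]
same = [n for n in shared if after[n] == before[n]]

for name in shared:
    print(f"{name:12s} {before[name]:.4f} -> {after[name]:.4f} ({after[name] - before[name]:+.4f})")
print("worse:", worse)
print("better:", better)
print("unchanged:", same)
```

Under that pairing, 7 of the 13 shared datasets get worse (g2, g3, m2, roseobacter, bacteria, faba, 59), 2 improve (m3, w2016), and 4 are unchanged. The largest jumps are on faba and 59.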

lutteropp commented 5 years ago

Maybe we shouldn't allow a kmer size that leads to paralogs in some of the taxa? But then we would be back to the behavior before the change.

lutteropp commented 5 years ago

Maybe we need a rule like: "The kmer size for the kept non-paralog taxa must be at least X larger than the kmer size for the paralog taxa that were discarded."
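One possible reading of this rule as a check, as a rough sketch only. The function name `keep_block`, its parameters, and the choice to compare against the largest kmer size among the discarded taxa are all assumptions; how the two kmer sizes would be obtained is left open here.

```python
def keep_block(k_kept, k_discarded, min_gap):
    """Return True if the block passes the proposed margin rule.

    k_kept      -- kmer size for the kept non-paralog taxa (assumed given)
    k_discarded -- kmer sizes for the discarded paralog taxa (assumed given)
    min_gap     -- the threshold X from the comment above
    """
    if not k_discarded:
        # Nothing was discarded, so the rule does not apply.
        return True
    # Assumption: the strictest comparison is against the largest discarded size.
    return k_kept - max(k_discarded) >= min_gap

print(keep_block(31, [21, 25], 5))  # True: 31 - 25 = 6 >= 5
print(keep_block(27, [25], 5))      # False: 27 - 25 = 2 < 5
```

A good value for X would have to be tuned against the RF-distances.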