legumeinfo / gcv

Federating genomes with love (and synteny derived from functional annotations)
https://gcv.legumeinfo.org/
Apache License 2.0

sometimes repeat algorithm inverts segments when it seems the non-inversion score must be as good #87

Closed: adf-ncgr closed this issue 6 years ago

adf-ncgr commented 7 years ago

e.g. http://laasi.ncgr.org/mt_hapmap/gcv/#/search/mt_hapmap/medtr.HM129.v1.0.g1216?regexp=&neighbors=10&sources=mt_hapmap&alpha=0.5&kappa=10&minsup=2&minsize=5&matched=4&intermediate=5&algorithm=repeat&match=10&mismatch=-1&gap=-1&score=30&threshold=25&order=chromosome

Unless I'm missing something subtle (see image below), the repeat algorithm inverts a segment here even though the non-inverted alignment should score at least as well. It can be forced into intuitive compliance by increasing the threshold param or by flipping into Smith-Waterman mode, but no user wants to learn how to do that! This is going straight to the icebox; I just wanted to get it noted before it lapsed into oblivion. I do find the repeat algorithm over-aggressive on occasion, but this case seems like it is just arbitrarily thumbing its nose at William of Occam (which brings to mind a scene from the film Chinatown).

image
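
To make the expectation concrete, here is a minimal sketch (not GCV's actual code) of the tie-breaking rule the comment above implies: score a track against the query in both orientations and report an inversion only when the reversed orientation scores strictly higher. It uses plain Smith-Waterman local alignment over gene-family labels rather than the repeat algorithm, with the match, mismatch and gap values taken from the URL above. The function names `smith_waterman` and `choose_orientation` and the toy family labels are hypothetical.

```python
def smith_waterman(query, track, match=10, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences
    of gene-family labels (linear gap penalty)."""
    prev = [0] * (len(track) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(track, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def choose_orientation(query, track, **scoring):
    """Score both orientations; prefer the non-inverted one on ties."""
    forward = smith_waterman(query, track, **scoring)
    reverse = smith_waterman(query, track[::-1], **scoring)
    # Assumption for illustration: an inversion must be strictly better
    # than the forward alignment before it is reported.
    if reverse > forward:
        return "inverted", reverse
    return "forward", forward


if __name__ == "__main__":
    query = ["fam1", "fam2", "fam3", "fam4"]
    print(choose_orientation(query, ["fam1", "fam2", "fam3", "fam4"]))
    print(choose_orientation(query, ["fam4", "fam3", "fam2", "fam1"]))
    # A palindromic track scores the same both ways, so it stays forward.
    print(choose_orientation(["fam1", "fam2", "fam1"], ["fam1", "fam2", "fam1"]))
```

With a strict `>` comparison, a segment whose forward and reverse scores are equal is never drawn as an inversion, which is the Occam-friendly behaviour asked for here.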

alancleary commented 6 years ago

"Fixed" in commit ffc8c4a5332ba667528365cee1b7f1a0b356df48. I use quotes because it's still possible for spurious inversions to occur, just now they're very unlikely.