legumeinfo / gcv

Federating genomes with love (and synteny derived from functional annotations)
https://gcv.legumeinfo.org/
Apache License 2.0
41 stars 10 forks source link

repeat algorithm struggling to handle large blocks of tandemly repeated genes #162

Closed adf-ncgr closed 4 years ago

adf-ncgr commented 6 years ago

This works as desired: https://legumeinfo.org/lis_context_viewer/#/search/lis/vigun.Vigun11g190400?bintermediate=10&bmask=10&bmatched=20&intermediate=5&matched=40&neighbors=150&sources=lis&algorithm=smith-waterman&gap=-1&match=10&mismatch=-1&score=30&threshold=25

but changing from smith-waterman to repeat for alignment algorithm makes things unhappy (I'm just guessing it is the large block that is causing the issue)

alancleary commented 6 years ago

Although there are optimizations we can apply to the Repeat algorithm, the asymptotic time complexity of the dynamic program will remain the same. Perhaps it's time we move the alignment algorithms to WebAssembly, starting with the Repeat algorithm.

adf-ncgr commented 6 years ago

FWIW, a collaborator ran into this issue yesterday (perhaps for the second time, since it seems to be approximately the same region as mentioned above); I'll note that it isn't the size of the region that is at issue, and intuitively I think it might be less an optimization than perhaps a heuristic to avoid spending cpu on a region that doesn't deserve such careful exploration of the search space. Try this (works fine with smith-waterman): https://legumeinfo.org/lis_context_viewer/#/search/lis/vigun.Vigun11g186300?bintermediate=10&bmask=10&bmatched=20&intermediate=5&matched=10&neighbors=20&sources=lis&algorithm=smith-waterman&gap=-1&match=10&mismatch=-1&score=30&threshold=25

then try switching to repeat algorithm (and watch your browser become angry with you).

not meaning this to distract you from your current work, just filing additional info for later consideration...

alancleary commented 4 years ago

This seems to have been resolved when the repeat algorithm was refactored in commit 8176052b4b63ee2fc6ddc53c3d14cb62f1cf81f1. Closing.