stephbuon / hansard-shiny

Code for the "Hansard Viewer" web app (a prototype app for applying to future support).
https://shinyviz.smu.edu/shiny/public/hansard-shiny/
MIT License

Improve speed of KWIC results return time #60

Open stephbuon opened 2 years ago

stephbuon commented 2 years ago

Go to Language/Word Context, select "similarity" from the measure drop-down, and search for a word in the corpus. The app returns a scatter plot of the words most associated with the search word (according to word2vec and cosine similarity; see line 102). If you click one of the scatter plot points and wait ~9 seconds, a data frame pops up showing that word's keyword in context (KWIC).
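
For anyone new to the similarity measure, here is a minimal sketch of ranking words by cosine similarity. It is written in Python for illustration only and is not the app's R code. The two-dimensional vectors and the `most_similar` helper are made up for the example; real word2vec vectors have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def most_similar(query, vectors, top_n=3):
    """Rank every other word by cosine similarity to the query word."""
    q = vectors[query]
    scored = [
        (word, cosine_similarity(q, vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical toy embeddings, not taken from the Hansard model.
toy_vectors = {
    "corn": [1.0, 0.0],
    "wheat": [0.9, 0.1],
    "tax": [0.0, 1.0],
}
```

Calling `most_similar("corn", toy_vectors)` ranks "wheat" ahead of "tax", because the "wheat" vector points in nearly the same direction as "corn".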

Obviously, it's a problem that it takes ~9 seconds for results to return. Can we optimize KWIC so it returns results in a reasonable amount of time?

Here's the KWIC code: https://github.com/stephbuon/hansard-shiny/tree/main/app/modules/kwic

It's called by: https://github.com/stephbuon/hansard-shiny/blob/main/app/modules/word-context/word_context.R

Caching the results (kwick_cache.R) lets us return results in real time, but I don't know whether we would generate too much cache.
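
One way to keep the cache from growing without limit is to cap the number of entries and evict the least recently used one. The sketch below shows the idea in Python with `functools.lru_cache`; it is not what kwick_cache.R does. The toy corpus, `compute_kwic`, and `kwic_cached` are hypothetical names for the example, and `maxsize=2` is deliberately tiny so the eviction is easy to see.

```python
from functools import lru_cache

# Hypothetical toy corpus standing in for the real debate text.
CORPUS = (
    "the corn laws were repealed",
    "repeal of the corn laws",
)

def compute_kwic(keyword, window):
    """Slow path: scan every document for the keyword."""
    rows = []
    for doc in CORPUS:
        toks = doc.split()
        for i, tok in enumerate(toks):
            if tok == keyword:
                left = " ".join(toks[max(0, i - window):i])
                right = " ".join(toks[i + 1:i + 1 + window])
                rows.append((left, tok, right))
    return tuple(rows)  # return an immutable result so cached values are safe to share

@lru_cache(maxsize=2)  # at most 2 keywords stay cached; older ones are evicted
def kwic_cached(keyword, window=2):
    return compute_kwic(keyword, window)
```

Repeated calls with the same keyword are served from memory, and `kwic_cached.cache_info()` reports hits, misses, and the current size, which would help in judging how large a cap is worth keeping.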

You'll see that I am borrowing a function from Quanteda (this one: https://quanteda.io/reference/kwic.html)
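
If profiling shows that most of the ~9 seconds goes to tokenizing or scanning the corpus on every click (an assumption, I have not measured it), one option is to tokenize once at startup and build an index of token positions, so each lookup only touches the matching positions. Below is a minimal sketch in Python. It is not quanteda's implementation, and `build_index` and `kwic` are hypothetical helpers with naive whitespace tokenization.

```python
from collections import defaultdict

def build_index(docs):
    """Tokenize each document once and record where every token occurs."""
    tokens = [doc.lower().split() for doc in docs]
    index = defaultdict(list)
    for doc_id, toks in enumerate(tokens):
        for pos, tok in enumerate(toks):
            index[tok].append((doc_id, pos))
    return tokens, index

def kwic(keyword, tokens, index, window=2):
    """Return (left context, keyword, right context) rows via the index."""
    rows = []
    # .get avoids inserting empty entries for unseen keywords
    for doc_id, pos in index.get(keyword.lower(), []):
        toks = tokens[doc_id]
        left = " ".join(toks[max(0, pos - window):pos])
        right = " ".join(toks[pos + 1:pos + 1 + window])
        rows.append((left, toks[pos], right))
    return rows

# Hypothetical toy documents, built once and reused for every search.
docs = ["The corn laws were repealed", "Repeal of the corn laws"]
tokens, index = build_index(docs)
```

With the index built up front, `kwic("corn", tokens, index)` does work proportional to the number of matches rather than to the size of the corpus. The trade-off is the memory needed to hold the tokens and the index.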

stephbuon commented 2 years ago

@EliasLMann here is another first problem you can work on if you do not want to work on Log Likelihood.