stephbuon / hansard-shiny

Code for the "Hansard Viewer" web app (a prototype app for applying for future support).
https://shinyviz.smu.edu/shiny/public/hansard-shiny/
MIT License

Implement Log Likelihood #18

Open: stephbuon opened this issue 2 years ago

stephbuon commented 2 years ago

I probably need a lower-level C++ implementation of log likelihood.

I found this C++ package (maybe there are better options): http://www.cahillsoftware.com/media/af21843f135214fffff850fffffe417.html

Should I make an R wrapper for it using Rcpp?

Here's the thing, though: I'm not sure how to apply the above package to a data frame, nor do I really understand how to compute log likelihood correctly, so I'm afraid I'll mess something up.
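For reference, below is one common formulation of log likelihood (Dunning's G²) for comparing a word's frequency across two corpora. It is the two-cell version used by Lancaster University's UCREL log-likelihood calculator. The symbols are introduced here for illustration: a and b are the word's observed counts in corpus 1 and corpus 2, and c and d are the total token counts of those corpora. In this app the two "corpora" would presumably be whatever is being compared (for example, two speakers), but that mapping is an assumption.

```latex
E_1 = \frac{c\,(a+b)}{c+d}, \qquad E_2 = \frac{d\,(a+b)}{c+d}

G^2 = 2\left( a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2} \right), \qquad \text{with } 0 \ln 0 := 0

% Worked example: a = 10, b = 2, c = d = 1000, so E_1 = E_2 = 6
G^2 = 2\left( 10 \ln\frac{10}{6} + 2 \ln\frac{2}{6} \right) \approx 5.82
```

A larger G² means a stronger difference between the two corpora; 3.84 is the chi-squared critical value for p < 0.05 at one degree of freedom. G² is never negative, so the direction of the difference comes from comparing a/c with b/d.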

Ideally I would find the right person for this job.

stephbuon commented 2 years ago

@EliasLMann would you be interested in attempting this or nah?

I showed you the TF-IDF measures we use throughout the app. We would like more "transparent" measures, like log likelihood, implemented as well.

Ideally I would be able to compute log likelihood in real time, like the TF-IDF function does.

EliasLMann commented 2 years ago

Yes, definitely. I'll give it a shot starting Sunday night.

stephbuon commented 2 years ago

Thanks, @EliasLMann!

Here's some background: I want to implement this the same way I use TF-IDF for collocates or speaker comparison (so the user will be able to select Log Likelihood from the drop-down menu).

I think it should take a data frame as input, and return another data frame, but with log likelihood scores.

I guess a key thing to check before wrapping this, however, is whether this implementation of LL is actually fast enough to run in real time on our data.

R has a few log likelihood implementations, but they're all exceptionally slow, which makes them unrealistic for a web app.