Open jeroen opened 7 years ago
@jeroen I have several on my list.
Here's a parser and tagger written in C++ that could be wrapped in an R package: http://www.cs.cmu.edu/~ark/TurboParser/
I'd be keen to see Dynamic Topic Models (https://github.com/blei-lab/dtm) available in R. It's a major library by David Blei for analysing how topics change over time, an extension of LDA.
👍 to @benmarwick's suggestion of Dynamic Topic Models.
Added bigartm, a non-Bayesian framework for topic modeling. Online, parallel, asynchronous, very flexible. Actively developed.
For those still following this thread: I have wrapped up Compact Language Detector 2 into an R package. Give it a go and let me know if it works: https://github.com/ropensci/cld2#readme
Thanks @jeroen , will do.
Awesome! I'm running some tests now.
OK cld2 is on CRAN now, will do a v1.1 next week. Let's see what else we got here :)
I had a look at dtm but unfortunately the code is too broken to wrap in R. It has all kinds of compiler warnings and doesn't build on Windows at all. It also no longer seems to be actively maintained.
The cld3 package is now on CRAN as well. Would be fun to see someone who is into text compare cld2 and cld3 on real data.
How about unRTF? https://www.gnu.org/software/unrtf/
OK here is a wrapper for unrtf: https://github.com/ropensci/unrtf
If people know of any useful C/C++ libs that would be nice to wrap into an R package, I am happy to assist with that!