Open matthewjdenny opened 7 years ago
As far as I know, there is no portable and reliable POS tagger that doesn't rely on Java or Python. It would be a really big deal if we could join forces and write a framework with which we can train POS taggers for different languages. IMHO it doesn't even need to be state-of-the-art. 94-96% accuracy on standard benchmarks would be enough. I even have a ticket for that (with a reference to the algorithm used by spaCy), but haven't had time to investigate.
Funny, I have wanted to implement a POS tagger in R/Rcpp for a while now (for example: https://nlp.stanford.edu/pubs/tagging.pdf or the spaCy one). I can get funding to do something like this over the summer from my IGERT fellowship, but I need somebody outside of academia to sponsor me. Bat signal to any folks with non-academic affiliations :)
i'd be glad to support a native POS tagger in the koRpus package. the whole package actually started as a wrapper script for TreeTagger, and then just grew a bit.
Hi All,
I am hoping we can convene the folks working on NLP packages in R for a discussion of the current state of the art, and support for various underlying POS tagging/parsing libraries (spaCy, OpenNLP, CoreNLP, etc.). I tend to rely on Stanford's CoreNLP and other less portable options in my research, and would really like to know more about the current frontier in tagging/parsing in R.
Best, Matt Denny