ropensci / textworkshop17

Text Workshop at the London School of Economics, April 2017

Parsing/POS Tagging in R #1

Open matthewjdenny opened 7 years ago

matthewjdenny commented 7 years ago

Hi All,

I am hoping we can convene the folks working on NLP packages in R for a discussion of the current state of the art, and support for various underlying POS tagging/parsing libraries (spaCy, OpenNLP, CoreNLP, etc.). I tend to rely on Stanford's CoreNLP and other less portable options in my research, and would really like to know more about the current frontier in tagging/parsing in R.

Best, Matt Denny

dselivanov commented 7 years ago

As far as I know, there is no portable and reliable POS tagger that doesn't rely on Java or Python. It would be a really big deal if we could join forces and write a framework for training POS taggers for different languages. IMHO it doesn't even need to be state-of-the-art; 94-96% accuracy on standard benchmarks would be enough. I even have a ticket for that (with a reference to the algorithm used by spaCy), but haven't had time to investigate.

matthewjdenny commented 7 years ago

Funny, I have wanted to implement a POS tagger in R/Rcpp for a while now (for example, the one described in https://nlp.stanford.edu/pubs/tagging.pdf, or the spaCy one). I can get funding from my IGERT fellowship to do something like this over the summer, but I need somebody outside of academia to sponsor me. Bat signal to any folks with non-academic affiliations :)

unDocUMeantIt commented 7 years ago

i'd be glad to support a native POS tagger in the koRpus package. the whole package actually started as a wrapper script for TreeTagger, and then just grew a bit.