trinker / tagger

Part of speech (POS) tagger
Include option for other languages #6

Open stefan-mueller opened 7 years ago

stefan-mueller commented 7 years ago

Thanks for developing this package – it's really helpful. One question: openNLP offers POS taggers for several other languages (http://opennlp.sourceforge.net/models-1.5/).

They can be loaded easily by installing the respective package, for instance for the Dutch model:

install.packages("openNLPmodels.nl", repos = "http://datacube.wu.ac.at/", type = "source")

If I want to use these languages for my POS tagging, in openNLP I simply specify the language for the annotators (see e.g. ?Maxent_POS_Tag_Annotator).

Thus, could we implement this option for tidypos as well? I assume we need to change lines 82 and 83 in the tag_pos.R file to:

PTA <- openNLP::Maxent_POS_Tag_Annotator(language = "en")
WTA <- openNLP::Maxent_Word_Token_Annotator(language = "en")

and add a language argument to the tag_pos function. English should stay the default, but it should be possible to change the language.
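A minimal sketch of the idea, under stated assumptions: make_annotators is a hypothetical helper name that does not exist in the package, and the sketch only covers the two openNLP annotators mentioned above, not any other part of tag_pos that may depend on English.

```r
# Hypothetical helper (not part of the tagger package): builds the two
# openNLP annotators for a given language code instead of hard-coding "en".
make_annotators <- function(language = "en") {
    # Accept exactly one non-missing language code, e.g. "en" or "nl".
    stopifnot(is.character(language), length(language) == 1L, !is.na(language))

    list(
        PTA = openNLP::Maxent_POS_Tag_Annotator(language = language),
        WTA = openNLP::Maxent_Word_Token_Annotator(language = language)
    )
}

# The default keeps the current English behaviour. A call such as
# make_annotators("nl") assumes the openNLPmodels.nl package is installed.
```

Keeping the default at "en" means existing calls would not need to change; whether the rest of the function works for other languages is a separate question.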

I hope that these changes would do the job, but I am not absolutely sure whether the language option needs to be included in other parts of the function. If you let me know whether more changes are needed or not (if yes, which ones?), I am happy to make a pull request.

trinker commented 7 years ago

Stefan,

Thanks for your interest in the tagger package. I agree this would be a nice feature.

This is a bigger lift in that it needs to work the same across coreNLP as well. Additionally, some of the other functions rely on English and would need to be upgraded as well. I don't currently have the dev time for this task. If you or others were willing to address these aspects and do a pull request this would be much appreciated.

Tyler

stefan-mueller commented 7 years ago

Hi Tyler,

Thank you for developing this package and making it so much easier to connect the POS tags to the words. Ok, it makes sense that more functions need to be changed. At the moment I am using spacyr for German and English POS tagging. In the summer or autumn I will need to tag additional languages. I am happy to edit the code and make a PR when I start working on other languages.

I will keep you posted.

Stefan

trinker commented 7 years ago

I have been looking to push this forward, but at the moment I'm unable to get the add-on language extensions to work from the command line: https://stanfordnlp.github.io/CoreNLP/human-languages.html

The documentation for the add-ons isn't clear about where they should be installed.