machinalis / iepy

Information Extraction in Python
BSD 3-Clause "New" or "Revised" License

Document installation of additional languages #100

Closed sweh closed 8 years ago

sweh commented 8 years ago

The documentation notes that one should "check Stanford Core NLP documentation and files to download more language packages".

I need to get German language support into iepy, so I downloaded http://nlp.stanford.edu/software/stanford-german-2016-01-19-models.jar, but it's unclear to me what I need to do with that jar file. Language files are stored under ~/nltk_data/ afaik, but in a different format.

Can you explain here and/or in the documentation what needs to be done to support languages other than English or Spanish?

francolq commented 8 years ago

Language files for Stanford CoreNLP are saved in ~/.config/iepy (not in nltk_data, because NLTK has nothing to do with this).

You will also have to edit corenlp.py and write German-specific code like this code for Spanish: https://github.com/machinalis/iepy/blob/develop/iepy/preprocess/corenlp.py#L93
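
As a rough illustration of the two steps above (placing the jar, then adding a per-language branch), here is a minimal sketch. Everything in it is an assumption for illustration and not iepy's actual code: the names `IEPY_CONFIG_DIR`, `LANGUAGE_ARGS`, `install_models_jar` and `corenlp_args_for` are hypothetical, the exact subfolder iepy expects under ~/.config/iepy is not stated in this thread, and the properties file names should be checked against the contents of the jar you downloaded.

```python
import shutil
from pathlib import Path

# Assumption: the directory mentioned in this thread. The exact subfolder
# that iepy expects is not stated here, so adjust as needed.
IEPY_CONFIG_DIR = Path.home() / ".config" / "iepy"

# Hypothetical mapping from language code to extra CoreNLP arguments.
# The properties file names are assumptions; inspect the jar to confirm them.
LANGUAGE_ARGS = {
    "en": [],
    "es": ["-props", "StanfordCoreNLP-spanish.properties"],
    "de": ["-props", "StanfordCoreNLP-german.properties"],
}


def install_models_jar(jar_path, target_dir=IEPY_CONFIG_DIR):
    """Copy a downloaded models jar into target_dir and return the new path."""
    jar_path = Path(jar_path)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / jar_path.name
    shutil.copy2(jar_path, destination)
    return destination


def corenlp_args_for(lang):
    """Return a copy of the extra arguments for lang; fail on unknown languages."""
    if lang not in LANGUAGE_ARGS:
        raise ValueError("No CoreNLP configuration for language %r" % lang)
    return list(LANGUAGE_ARGS[lang])
```

Keeping the per-language settings in one table means adding a language is a one-line change, but the real corenlp.py may be organised differently, so treat the linked Spanish branch as the reference.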

Hope this helps.

sweh commented 8 years ago

Thanks, I will look into this and send a pull request if I'm successful.

j0hn commented 8 years ago

It might be a good idea to add that info to the documentation, but for the time being I'll close this, since we received and accepted the pull request.

Thanks for the support!

Btw, congrats on getting ticket #100. If you ever come to Argentina, contact me and I'll buy you a beer for it.