own-pt / sensetion.el

Emacs word-sense annotation interface
GNU General Public License v3.0

corpus processing #107

Open odanoburu opened 5 years ago

odanoburu commented 5 years ago

How should the input for this tool normally be processed? We need it to be at least tokenized and lemmatized; identifying multiword expressions (MWEs) would also be of interest.
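To make the requirement concrete, here is a minimal sketch of what such preprocessing might produce: one record per token, carrying its surface form, its lemma, and an MWE group id. The lemma table, the MWE list, and the function names (`tokenize`, `lemmatize`, `preprocess`) are all toy assumptions for illustration; a real pipeline would take lemmas and MWEs from an NLP toolkit, and the record layout is not sensetion's actual input format.

```python
import re

# Toy resources, assumed for illustration only; a real pipeline would get
# lemmas and MWEs from an NLP toolkit rather than hand-written tables.
LEMMAS = {"kicked": "kick", "dogs": "dog", "was": "be"}
MWES = {("kick", "the", "bucket"), ("new", "york")}
MAX_MWE_LEN = max(len(m) for m in MWES)


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def lemmatize(token):
    """Look up a lemma, falling back to the lowercased form."""
    lower = token.lower()
    return LEMMAS.get(lower, lower)


def preprocess(text):
    """Return one dict per token with its form, lemma, and MWE id (or None)."""
    tokens = [{"form": t, "lemma": lemmatize(t), "mwe": None}
              for t in tokenize(text)]
    lemmas = [t["lemma"] for t in tokens]
    i, mwe_id = 0, 0
    while i < len(tokens):
        # Greedy longest match of a known contiguous MWE starting at i.
        for n in range(min(MAX_MWE_LEN, len(tokens) - i), 1, -1):
            if tuple(lemmas[i:i + n]) in MWES:
                for tok in tokens[i:i + n]:
                    tok["mwe"] = mwe_id
                mwe_id += 1
                i += n
                break
        else:
            # No MWE starts here; move on to the next token.
            i += 1
    return tokens


if __name__ == "__main__":
    for tok in preprocess("He kicked the bucket."):
        print(tok)
```

This sketch only matches contiguous MWEs on lemmas; discontinuous expressions would need a different strategy.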

arademaker commented 5 years ago

I don't know how well NLTK supports this, but since you mentioned it, it could be an option. Another one is FreeLing.

arademaker commented 5 years ago

We need to test this: take a corpus and produce some output to discuss further.

odanoburu commented 5 years ago

First attempt: use NLTK + PyDelphin.