Open AngledLuffa opened 11 months ago
Hi, do you know of a workaround for this, i.e. returning all the lemmas to which a word can belong?
Yes, actually, we put together a small classifier model for some of the most common cases in the treebanks. We just haven't released it yet; probably mid to late June.
Is this classifier model just a list/index of the most common cases in English?
I was wondering if it's possible to do multilingual lemmatization of single words, returning all possible lemmas, e.g. for "saw" you would get the noun "saw" and the verb "see".
Is something like that feasible at all or would it be better to use a dictionary/lookup approach?
There's not really a way to get back all possible expansions. The seq2seq model doesn't know which POS are possible for new words.
The dictionary does already take POS into account, so your particular "saw" example is already covered. The distinction we will soon fix is "I need to saw this lumber" vs. "I saw a pile of lumber".
example:

most common:
- `'s_VERB` can be either `is` or `has`

less likely, but still possible:
- `wound_VERB` can be `wound` or `wind` (`bound` and `found` as well)
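To illustrate the dictionary/lookup idea discussed above, here is a minimal sketch of a POS-aware lookup that returns every candidate lemma for an ambiguous (word, POS) pair. The dictionary contents and function name are illustrative assumptions for this thread, not Stanza's actual lemma data or API.

```python
# Hypothetical POS-keyed lemma dictionary; entries are illustrative only.
AMBIGUOUS_LEMMAS = {
    ("'s", "VERB"): ["is", "has"],
    ("saw", "VERB"): ["see", "saw"],
    ("saw", "NOUN"): ["saw"],
    ("wound", "VERB"): ["wound", "wind"],
    ("bound", "VERB"): ["bound", "bind"],
    ("found", "VERB"): ["found", "find"],
}

def candidate_lemmas(word, pos):
    """Return all plausible lemmas for a (word, POS) pair.

    Falls back to the surface form when the pair is not listed,
    mirroring a dictionary-first lemmatizer design where unknown
    words would instead go to the seq2seq model.
    """
    return AMBIGUOUS_LEMMAS.get((word.lower(), pos), [word.lower()])

print(candidate_lemmas("saw", "VERB"))   # ['see', 'saw']
print(candidate_lemmas("'s", "VERB"))    # ['is', 'has']
```

A lookup like this only covers words seen in the table; anything else still needs a model, which is why the seq2seq lemmatizer cannot enumerate all possible expansions for new words.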