mcallaghan / text-as-data


Lemmatization in German #9

Open OptimisticSnail opened 1 year ago

OptimisticSnail commented 1 year ago

Hello!

I am trying to lemmatize my German-language tokens. Any hints on how I could do so, e.g. which packages to use (ideally in combination with quanteda)?

I'd greatly appreciate any help!

Sonja

mcallaghan commented 1 year ago

Have you looked through this answer? https://stackoverflow.com/questions/65664123/lemmatization-of-german-words-capital-letters-and-lower-case-letters

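The example given there uses tokens_wordstem from quanteda, which looks promising. Note that it applies Snowball stemming rather than true lemmatization: it truncates words to stems instead of mapping them to dictionary forms.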

library(quanteda)
tokens_wordstem(tokens(texts), language = "de")  # texts: your character vector or corpus

I have not tested this, so I would appreciate any feedback on whether it works.
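If stems are not enough and dictionary forms are needed, one option within quanteda is to replace inflected forms with lemmas from a lookup table using tokens_replace. The following is only a minimal sketch: the texts and the lemma_table data frame are hand-written, hypothetical examples, and a real lookup table would have to come from a German lemma list or a tagger, which is not shown here.

```r
library(quanteda)

# Toy German texts (illustrative only).
texts <- c(d1 = "Die Kinder spielten in den Parks",
           d2 = "Das Kind spielt im Park")

# Hypothetical lemma lookup table; in practice this would be loaded
# from a German lemma list rather than typed by hand.
lemma_table <- data.frame(
  form  = c("Kinder", "spielten", "spielt", "Parks"),
  lemma = c("Kind", "spielen", "spielen", "Park"),
  stringsAsFactors = FALSE
)

toks <- tokens(texts)

# Replace each inflected form with its lemma. Exact, case-sensitive
# matching keeps the capitalisation of German nouns intact.
toks_lemma <- tokens_replace(
  toks,
  pattern = lemma_table$form,
  replacement = lemma_table$lemma,
  valuetype = "fixed",
  case_insensitive = FALSE
)

# For comparison: Snowball stemming of the same tokens.
toks_stem <- tokens_wordstem(toks, language = "de")

print(toks_lemma)
print(toks_stem)
```

Tokens that are not in the lookup table pass through unchanged, so the quality of the result depends entirely on the coverage of the lemma list.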