hermitdave / FrequencyWords

Repository for Frequency Word List Generator and processed files
MIT License
1.18k stars, 556 forks

German words #14

Open DiplEng opened 5 years ago

DiplEng commented 5 years ago

For German words it would be really beneficial if they were written properly, i.e. with nouns capitalized. So not "freund" but "Freund".

This would allow this list to be used for spellchecking.

amadeomano commented 5 years ago

True, and also in German the pronouns "sie" and "Sie" have different meanings, so differentiating them in statistics could lead to much better results.

felix-schneider commented 5 years ago

In addition, a large number of words that should be spelled with an umlaut ("ä", "ö", "ü") also occur in the list with the umlaut replaced by "ae", "oe" or "ue".

This is an acceptable spelling only if the Umlaut is not available for some reason. These days we have Unicode and these spellings should not be considered correct under any circumstances.
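To illustrate the point, here is a minimal Python sketch of how the transliterated entries could be folded back into their umlaut forms. It is an assumption about how one might do it, not the repository's code: `umlaut_variants` and `merge_transliterations` are hypothetical helper names, and the word list is assumed to be a plain mapping from word to count. A blind replace of "ue" with "ü" would corrupt words such as "neue", so the sketch only merges an entry when the umlaut spelling already exists in the list.

```python
import re
from collections import Counter
from itertools import product

DIGRAPH = re.compile(r"ae|oe|ue")
UMLAUT = {"ae": "ä", "oe": "ö", "ue": "ü"}


def umlaut_variants(word):
    """Return every spelling of `word` with at least one digraph turned into an umlaut."""
    parts = DIGRAPH.split(word)    # text between the digraphs
    hits = DIGRAPH.findall(word)   # the digraphs themselves, in order
    variants = []
    for choice in product((False, True), repeat=len(hits)):
        if not any(choice):
            continue  # skip the unchanged spelling
        out = parts[0]
        for hit, use_umlaut, tail in zip(hits, choice, parts[1:]):
            out += (UMLAUT[hit] if use_umlaut else hit) + tail
        variants.append(out)
    return variants


def merge_transliterations(counts):
    """Fold 'ae'/'oe'/'ue' spellings into an umlaut spelling that is already listed.

    Single pass only: chains of merges are not resolved.
    """
    merged = Counter(counts)
    for word in list(counts):
        known = [v for v in umlaut_variants(word) if v in counts]
        if known:
            target = max(known, key=lambda v: counts[v])
            merged[target] += counts[word]
            del merged[word]
    return merged


example = {"für": 100, "fuer": 5, "neue": 40, "mädchen": 30, "maedchen": 2}
print(merge_transliterations(example))
```

Here "fuer" and "maedchen" are merged into "für" and "mädchen", while "neue" is left alone because "neü" is not in the list. Words whose umlaut form never occurs in the data would stay untouched and would need a dictionary lookup instead.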

BrendanMartin commented 4 years ago

I'm also hoping this can be fixed.

hermitdave commented 4 years ago

Let me see if I can hack this bit in, @felix-schneider; it might need a find and replace. The issue with nouns, @DiplEng and @amadeomano, is knowing how to identify them programmatically. The data is just a sentence, most likely manually created.
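One possible heuristic for the capitalization problem, sketched in Python under stated assumptions: the input is assumed to be an iterable of sentences with their original casing intact, and `cased_frequencies` is a hypothetical helper, not part of the generator. The idea is that a word capitalized in the middle of a sentence is probably a noun (or the formal "Sie"), so mid-sentence tokens are counted exactly as written, and sentence-initial tokens, whose capital letter carries no information, are credited to the most frequent mid-sentence spelling of the same word.

```python
import re
from collections import Counter, defaultdict

# Runs of letters only (Unicode-aware, so ä, ö, ü and ß are included).
TOKEN = re.compile(r"[^\W\d_]+")


def cased_frequencies(sentences):
    """Count words with their casing, ignoring the capital at sentence start."""
    mid = Counter()      # surface forms seen after the first token
    initial = Counter()  # lowercased sentence-initial tokens
    for sentence in sentences:
        tokens = TOKEN.findall(sentence)
        if not tokens:
            continue
        initial[tokens[0].lower()] += 1
        mid.update(tokens[1:])

    # Group the mid-sentence forms by their lowercase key.
    by_key = defaultdict(Counter)
    for form, n in mid.items():
        by_key[form.lower()][form] = n

    result = Counter(mid)
    for key, n in initial.items():
        seen = by_key.get(key)
        # Fall back to lowercase when the word never occurs mid-sentence.
        best = seen.most_common(1)[0][0] if seen else key
        result[best] += n
    return result


sentences = [
    "Der Freund kommt.",
    "Mein Freund und sie gehen.",
    "Können Sie kommen?",
    "Freund oder Feind?",
]
print(cased_frequencies(sentences))
```

With this input, "Freund" is counted three times (including the sentence-initial occurrence), and "sie" and "Sie" remain separate entries. The heuristic will still mislabel some words, for example capitalized names or words that only ever appear at the start of a sentence, so it is a starting point rather than a replacement for part-of-speech tagging.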