Problem with letters ü, ö and ä in lesson generator/analysis tab

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Use sources containing letters like ü, ö and ä (e.g. German texts)
2. Analyse the session / generate a lesson

What is the expected output? What do you see instead?
Words like "Zweckmäßigkeit" look like "Zweckm"

What version of the product are you using? On what operating system?
0.16

Please provide any additional information below.
If you let the program show only single keys, ü, ö and ä do show up.

Original issue reported on code.google.com by quietdeath@gmail.com on 6 Jan 2009 at 10:39

GoogleCodeExporter commented 9 years ago

Will try to fix this w/ the next release. :) Funny, since my alphabet also has
non-ASCII letters (æøå), but I never tried the analysis/generator with them.

Original comment by tristesse on 10 Jan 2009 at 3:27

Changed state: Started

GoogleCodeExporter commented 9 years ago

I have the same problem.

Actually, in this case the program generates two statistics entries: one entry
containing the part of the word before the umlaut and/or the sharp s
ligature (eszett) “ß” or “ẞ”, and another entry containing the part
after it. So, for Zweckmäßigkeit you get “Zweckm” and
“igkeit”. This is really annoying in autogenerated lessons, as neither
entry is an actual German word.

I’ll try words like “Überfälle”, “tränenüberströmt”,
“Gehäusegröße” and “Ölüberschussländer” to check how many parts
are produced.
Going by the pattern above, the fragments should be:
berf, lle
tr, nen, berstr, mt
Geh, usegr, e
and
l, berschussl, nder
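
The fragments predicted above can be reproduced with a small sketch, assuming the word analysis simply collects runs of ASCII letters. This is only a guess at the cause: `ASCII_WORD` and `split_ascii` are hypothetical names, not taken from the Amphetype source.

```python
import re

# Hypothetical ASCII-only word matcher (an assumption about the bug,
# not Amphetype's actual code): any non-ASCII letter ends the match.
ASCII_WORD = re.compile(r"[A-Za-z]+")

def split_ascii(word):
    """Return the runs of ASCII letters found in `word`."""
    return ASCII_WORD.findall(word)

for w in ["Zweckmäßigkeit", "Überfälle", "tränenüberströmt", "Ölüberschussländer"]:
    print(w, "->", split_ascii(w))
```

If the real analysis behaves like this, every umlaut or eszett acts as a word separator, which would explain the two statistics entries per word.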

Original comment by albedosh...@gmail.com on 12 Aug 2011 at 9:24

GoogleCodeExporter commented 9 years ago

Now, this is odd.
The program doesn’t have any problem recognizing the special characters in 
the trigram analysis.

My test file contains these words, including some non-German words, because I 
had the suspicion that the lesson generator doesn’t see non-ASCII characters 
at all:

Überfälle tränenüberströmt Gehäusegröße Ölüberschussländer 
Ölüberschußländer Geräteüberhöhung Gefäßüberdehnung Löß süß 
FAẞBIER Øresund Ælfwine Cœr C&A

So, this is what it looks like in the typer (see image “01-Typer.png”).

I mistyped every word in the lesson, to be sure all words are used in the 
lesson generator. But the lesson generator produces this lesson (see image 
“02-Generated lesson.png”).

Now check the word analysis (see image “03-Analysis.png”).

The funny part is that the trigram analysis works perfectly fine (see image 
“04-Trigrams.png”).

Suspicion confirmed ;)

So the word analysis seems to ignore any non-ASCII character in the text, which
obviously leads to erroneous auto-generated lessons and faulty word analyses,
whereas the letter and trigram analyses work as intended.
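
For comparison, here is a rough sketch of word extraction that keeps non-ASCII letters, next to an ASCII-restricted variant that shows the faulty behaviour. It is written for Python 3, and the names `UNICODE_WORD`, `ASCII_ONLY_WORD` and `words` are made up for illustration. Whether the program's word analysis uses a regular expression at all is an assumption; in Python 2, for instance, `\w` only matches non-ASCII letters when `re.UNICODE` is set.

```python
import re

# Unicode letters only: \w minus digits and underscore.
UNICODE_WORD = re.compile(r"[^\W\d_]+")
# re.ASCII restricts \w to [A-Za-z0-9_], mimicking an ASCII-only matcher.
ASCII_ONLY_WORD = re.compile(r"[^\W\d_]+", re.ASCII)

def words(text, pattern=UNICODE_WORD):
    """Return all letter runs in `text` according to `pattern`."""
    return pattern.findall(text)

sample = "Überfälle Gefäßüberdehnung süß FAẞBIER Øresund Ælfwine C&A"
print(words(sample))                   # whole words; "C&A" still splits at "&"
print(words(sample, ASCII_ONLY_WORD))  # fragments, as in the word analysis
```

With the Unicode-aware pattern the test words stay intact, so a fix along these lines may be enough if the cause is what it appears to be.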

Original comment by albedosh...@gmail.com on 12 Aug 2011 at 11:05

Attachments:

01-Typer.png
[02-Generated lesson.png](https://storage.googleapis.com/google-code-attachments/amphetype/issue-11/comment-3/02-Generated lesson.png)
03-Analysis.png
04-Trigrams.png

kgashok / amphetype
