Closed timoei closed 8 years ago
Fixed in commit 35e0e77463e8e0199281a7b741d99dbf2f4028a9
This problem was solved by stripping all non-ASCII characters from the test string before performing any checks. This is now an additional step of the string normalization process, which runs before the actual tag-detection checks.
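A minimal sketch of what such a normalization step could look like, not the code from the commit. The class name `TagNormalizer`, the method `stripNonAscii`, and the `<tag>` marker are hypothetical and used only for illustration:

```java
class TagNormalizer {
    // Hypothetical helper: drops every character outside the 7-bit ASCII range,
    // which includes a leading byte order mark (U+FEFF).
    static String stripNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Assumed first line of a submission saved as UTF-8 with BOM.
        String firstLine = "\uFEFF<tag>";

        // The raw line does not start with the tag because of the BOM.
        System.out.println(firstLine.startsWith("<tag>"));
        // After stripping, the prefix check can succeed.
        System.out.println(stripNonAscii(firstLine).startsWith("<tag>"));
    }
}
```

Note that a filter like this also removes umlauts and any other non-ASCII text, so it is only suitable for the copy of the string used for tag detection.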
This could also have been solved with an indexOf(PREFIX | SUFFIX) >= 0 check. While the startsWith / endsWith checks sometimes failed, the indexOf check still worked.
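The difference between the two kinds of check can be sketched as follows. This is an illustration only: `TagSearch`, `containsMarker`, and the `<tag>` / `</tag>` markers are hypothetical names, not taken from the project:

```java
class TagSearch {
    // Hypothetical check: true if either marker occurs anywhere in the line,
    // regardless of what precedes or follows it.
    static boolean containsMarker(String line, String prefix, String suffix) {
        return line.indexOf(prefix) >= 0 || line.indexOf(suffix) >= 0;
    }

    public static void main(String[] args) {
        // Assumed line with a byte order mark in front of the opening marker.
        String line = "\uFEFF<tag>content";

        // startsWith compares from position 0, where the BOM sits.
        System.out.println(line.startsWith("<tag>"));
        // indexOf finds the marker at position 1.
        System.out.println(containsMarker(line, "<tag>", "</tag>"));
    }
}
```

The trade-off is that an indexOf check is looser: it also accepts markers that appear in the middle of a line.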
The file reader isn't able to correctly read submissions that are encoded in UTF-8. A byte order mark (BOM) is added at the start of the first line of the text file, so the first tag can't be recognized. See the attached example submission (utf8Bug). The bug doesn't occur if the submission is encoded as 'UTF-8 without BOM', but in that case none of the German umlauts ('Umlaute') are recognized either.

Possible solution: convert all submissions to ANSI before reading them.

utf8Bug.txt
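As an alternative to converting files to ANSI, a reader could decode the bytes as UTF-8 and drop a leading BOM itself. This is a different technique from the one proposed above, shown only as a sketch; `BomAwareDecoder` and `decodeUtf8` are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

class BomAwareDecoder {
    // Hypothetical helper: decodes UTF-8 bytes and removes a leading
    // byte order mark (U+FEFF) if one is present.
    static String decodeUtf8(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Bytes EF BB BF are the UTF-8 BOM; C3 BC encodes the umlaut 'ü'.
        byte[] raw = {
            (byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
            '<', 't', 'a', 'g', '>', ' ',
            (byte) 0xC3, (byte) 0xBC
        };
        String text = decodeUtf8(raw);
        System.out.println(text.startsWith("<tag>"));
        System.out.println(text.length());
    }
}
```

Decoding with an explicit charset keeps the umlauts intact, which a conversion to a single-byte encoding may not do for every character.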