What steps will reproduce the problem?
1. Input documents in a non-English script, e.g. Greek.
2. Run TMT
What is the expected output? What do you see instead?
Mallet doesn't recognise where a token starts or stops in non-Latin text, so the
output is just gibberish. I expect the words to be recognised as they are.
What version of the product are you using? On what operating system?
TMT 1.0 on Mac OS 10.9
Please provide any additional information below.
This could easily be fixed by adding a token-regex input field under "Advanced options"
whose value is passed through to Mallet.
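
To illustrate why the token regex matters, here is a minimal Python sketch. It is not TMT or Mallet code, and both patterns are assumptions chosen for illustration, not the ones TMT actually uses. It compares an ASCII-only token pattern with a Unicode-aware one on a Greek sample. Mallet's command-line import tools accept a `--token-regex` option, so a field in TMT would presumably only need to forward a pattern like the second one.

```python
import re
import unicodedata

# Hypothetical patterns for illustration only; not taken from TMT or Mallet.
ASCII_TOKEN = re.compile(r"[A-Za-z]+")    # matches runs of ASCII letters only
UNICODE_TOKEN = re.compile(r"[^\W\d_]+")  # word characters minus digits and "_"


def tokenize(text, pattern):
    """Return lowercased tokens matched by `pattern` in NFC-normalised text."""
    # NFC composes base letters with their accents so that accented Greek
    # letters are single code points the pattern can match.
    normalized = unicodedata.normalize("NFC", text)
    return [m.group(0).lower() for m in pattern.finditer(normalized)]


sample = "Καλημέρα κόσμε, hello world"
ascii_tokens = tokenize(sample, ASCII_TOKEN)
unicode_tokens = tokenize(sample, UNICODE_TOKEN)

print(ascii_tokens)    # the Greek words are dropped entirely
print(unicode_tokens)  # the Greek words are kept as whole tokens
```

With the ASCII-only pattern the Greek words never reach the model, which is one plausible way to end up with unusable topics. The Unicode-aware pattern keeps them intact.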
Original issue reported on code.google.com by philipp....@googlemail.com on 10 Dec 2013 at 11:10