DavidNemeskey opened this issue 7 years ago (status: Open)
The full example. I first ran it through quntoken (`quntoken qterror.txt`) and parsed the non-whitespace tokens from the output. The resulting file is qterror.tokens.txt. Then I ran `hfst-lookup` on it, as described above, and got no errors. I then tried it with GATE and got the aforementioned problems. I also printed all tokens sent to HFST-Wrapper, and they are exactly the same as those in qterror.tokens.txt. So the error must be somewhere in the wrapper.
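For anyone who wants to repeat the token comparison, here is a minimal sketch of how the two token lists could be checked against each other. It assumes both are available as files with one token per line; `read_tokens` and `first_mismatch` are hypothetical helper names and not part of quntoken, GATE, or the wrapper.

```python
from itertools import zip_longest


def read_tokens(path):
    """Read one token per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def first_mismatch(expected, actual):
    """Return (index, expected_token, actual_token) for the first difference.

    Returns None if both lists are identical. A missing token on either
    side is reported as None.
    """
    for i, (e, a) in enumerate(zip_longest(expected, actual)):
        if e != a:
            return i, e, a
    return None
```

A result of `None` from `first_mismatch(read_tokens("qterror.tokens.txt"), read_tokens(dump_path))`, where `dump_path` is whatever file the wrapper's tokens were printed to, would confirm that the input is identical and the difference lies in how it is fed to the process.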
I get `IOException`s (more often `IO Exception`; I guess it depends on where the error occurs, i.e. whether enough words have been written to the `stdin` of the dead process) for some inputs to the `HFSTAnalyzer` module.
Example output:
Example input from the Hungarian Webcorpus: ioexception.input.txt
The culprit is the very long token Pécs-Nagykanizsa-Graz-Aussee-Ischl-Salzburg-Zürich-Luzern-Rigire-Zürich-München-Linz-Bécs-Győr-Mohács-Pécs, but presumably other inputs could trigger the error as well. What is strange is that if I run `hfst-lookup` with the same parameters GATE runs it with, the input is processed without a hitch.
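To illustrate why the exception can surface at a different point than where the process actually died, here is a small sketch of the general mechanism. It is in Python rather than Java, and it assumes a POSIX system. The child is a hypothetical stand-in that reads one line and exits; it is not `hfst-lookup` and says nothing about why the real process dies.

```python
import subprocess
import sys


def write_until_failure(lines):
    """Return the index of the first line whose write fails, or None."""
    # Hypothetical stand-in for the analyzer: reads a single line, then exits.
    child = [sys.executable, "-c", "import sys; sys.stdin.readline()"]
    proc = subprocess.Popen(child, stdin=subprocess.PIPE)
    failed_at = None
    for i, line in enumerate(lines):
        try:
            proc.stdin.write((line + "\n").encode("utf-8"))
            proc.stdin.flush()  # the error only shows up once data hits the pipe
        except OSError:  # BrokenPipeError on POSIX: nobody is reading anymore
            failed_at = i
            break
        if i == 0:
            proc.wait()  # make sure the child is gone before the next write
    try:
        proc.stdin.close()
    except OSError:
        pass  # closing re-flushes the buffer and may fail again
    proc.wait()
    return failed_at
```

The first write succeeds because the child is still alive; the failure is reported on the next flushed write. Without the explicit `flush()`, small writes would sit in the parent's buffer and the error would appear even later, which is consistent with the guess that the exception depends on how much has been written to the dead process.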