Closed GoogleCodeExporter closed 8 years ago
Hi,
The same phenomenon occurred in my environment.
- lucene-gosen: 1.2(ipadic), Java: 1.7.0-1.
- Intel Core i7-2640M @ 2.80GHz, RAM:8.0GB.
I investigated and found that:
1) the problem is associated with Viterbi.java;
2) it can be reproduced only when the value of
lNode.rcAttr2 > 0;
3) the first for-loop of the method "calculateConnectionCosts" runs
over a million times just before the program runs out of memory.
I guess the problem occurs when applying the "trigram" rules of
morphological analysis written in "connection.csv",
especially when the input string has a form that forces the program
to check trigram rules consecutively, term by term.
For "くよくよ...", Gosen checks whether the input string matches the
pattern "よく/形容詞,連用テ接続 + term2/pos2 + term3/pos3"
(e.g. "よく/は/無い").
"よく/形容詞,連用テ接続" is used as the first term of the rule.
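As a rough illustration of why this input shape is costly (a sketch, not Gosen's code): the snippet below only counts the positions where the surface "よく" starts inside a repeated "くよ" string. Each such position is a candidate first term for the trigram rule. The class and method names are hypothetical, and nothing here reproduces the lattice or the cost calculation in Viterbi.java.

```java
// Hypothetical helper, not part of lucene-gosen.
final class TrigramTriggerCount {

    /** Counts (possibly overlapping) occurrences of surface in text. */
    static int countOccurrences(String text, String surface) {
        if (surface.isEmpty()) {
            throw new IllegalArgumentException("surface must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(surface, from)) >= 0) {
            count++;
            from++; // step by one char so overlapping matches are counted
        }
        return count;
    }

    /** Builds a string made of unit repeated the given number of times. */
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder(unit.length() * Math.max(times, 0));
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }
}
```

For "くよ" repeated n times, countOccurrences(repeat("くよ", n), "よく") is n - 1, so the number of candidate positions grows with the input length.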
regards,
Mitsuharu Makita
Original comment by makita.m...@gmail.com
on 8 Dec 2011 at 10:37
Sorry for the slow reply.
Thanks for the report and the investigation.
I think the following are possible provisional workarounds:
- Check the input string on the client side and split it with spaces.
- Provide some kind of internal limiter in lucene-gosen.
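A minimal sketch of the first workaround, assuming the client only wants to cap the length of any whitespace-free run before passing text to the analyzer. InputGuard, split, and the maxLen parameter are hypothetical names, not part of lucene-gosen, and a suitable limit is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper, not part of lucene-gosen.
final class InputGuard {

    /**
     * Splits input on whitespace (including the ideographic space U+3000),
     * then cuts every remaining piece into chunks of at most maxLen code points.
     */
    static List<String> split(String input, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> pieces = new ArrayList<>();
        for (String token : input.split("[\\s\\u3000]+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace or empty input yields an empty token
            }
            int start = 0;
            while (start < token.length()) {
                int end = start;
                int count = 0;
                // Advance by whole code points so surrogate pairs are never cut.
                while (end < token.length() && count < maxLen) {
                    end += Character.charCount(token.codePointAt(end));
                    count++;
                }
                pieces.add(token.substring(start, end));
                start = end;
            }
        }
        return pieces;
    }
}
```

Each returned piece could then be analyzed separately. Cutting at a fixed length can split a word at the chunk boundary, so tokens near a cut may differ from those of the unsplit input.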
Original comment by johtani
on 12 Dec 2011 at 2:55
This patch is still incomplete.
Although the problem in question no longer occurs, some of the existing tests do not
pass.
They fail because the scores differ from the old analysis results.
It also needs to be tested against much more data.
Other bugs may lurk.
Original comment by johtani
on 12 Dec 2011 at 7:20
Attachments:
Original comment by johtani
on 14 Dec 2011 at 6:51
Sorry, the patch in comment #3 contains a bug.
In line 15, the first and second conditions are reversed.
I am analyzing this bug now.
It seems to have been introduced when the loop was moved.
Original comment by johtani
on 16 Dec 2011 at 8:05
Committed as r158 in trunk and branch 1.2.
All test cases pass.
What remains is testing with extensive data; if the results are satisfactory, it will be released.
Original comment by johtani
on 19 Dec 2011 at 7:37
Original comment by johtani
on 20 Dec 2011 at 9:14
Released in 1.2.1.
Original comment by johtani
on 20 Dec 2011 at 9:15
Original issue reported on code.google.com by
haruyama...@gmail.com
on 7 Dec 2011 at 5:20