Closed. GoogleCodeExporter closed this issue 9 years ago.
After several tests, it seems that the error also occurs with the LV profile.
The Wiki abstract for LV is attached.
Original comment by zygolech...@gmail.com
on 29 Aug 2011 at 2:15
DetectorFactory.loadProfile is meant to load all profiles in a single call (I had
intended to write a check for that, but there is no such code yet...).
So if your code calls loadProfile multiple times, try putting all profiles in one
directory and calling loadProfile only once.
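The missing check mentioned above could look roughly like the following. This is a minimal sketch, not code from the library: ProfileLoadGuard and loadOnce are hypothetical names, and the loader is passed in as a plain Runnable so the example stays self-contained. In real code the Runnable would wrap the DetectorFactory.loadProfile(...) call and deal with its LangDetectException.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical wrapper; ProfileLoadGuard is not part of the library. */
final class ProfileLoadGuard {
    private final AtomicBoolean loaded = new AtomicBoolean(false);

    /**
     * Runs the loader only on the first call.
     * Returns true if the loader ran, false if loading was already done.
     */
    boolean loadOnce(Runnable loader) {
        // compareAndSet lets exactly one caller through, even across threads.
        if (!loaded.compareAndSet(false, true)) {
            return false;
        }
        try {
            loader.run();
            return true;
        } catch (RuntimeException e) {
            // Loading failed, so allow a later retry.
            loaded.set(false);
            throw e;
        }
    }
}
```

With such a guard, a second call is skipped instead of ending in the "duplicate the same language profile" exception.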
Thanks.
Original comment by nakatani.shuyo
on 30 Aug 2011 at 3:20
That is already what I do. DetectorFactory.loadProfile(...) is
called only once at the beginning of the processing (if I don't do this, I get
the LangDetectException("duplicate the same language profile")).
I think there is a problem with the LT or LV profiles, because when I remove
these profiles from the profiles directory, there is no problem at all.
After decompiling the JAR you provided (to find the line mentioned
by the exception), it seems that the error occurs on this line:
double prob = ((Integer)profile.freq.get(word)).doubleValue() / profile.n_words[(word.length() - 1)];
(presumably because word.length() is 0, which makes the array index -1).
After investigating, it seems that the entry ["":23] (an empty word with
frequency 23) was the problem. I removed it from the LV profile and it now seems
to work. Do you know how it got there? Perhaps it would be a good idea to check
the length of the word?
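A length check of the kind suggested here might look like the sketch below. It is only an illustration under stated assumptions: ProfileSketch is a hypothetical stand-in with a frequency map and a per-length count array, mirroring the freq and n_words fields seen in the decompiled line, and it is not the library's own profile class. The method names probability and removeInvalidEntries are also made up for this example.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical stand-in for a language profile; not the library's class. */
final class ProfileSketch {
    final Map<String, Integer> freq = new HashMap<>();
    final int[] nWords = new int[3]; // totals for 1-, 2- and 3-character words

    /** Relative frequency of word, or 0.0 when it cannot be indexed safely. */
    double probability(String word) {
        if (word == null) {
            return 0.0;
        }
        int len = word.length();
        // An empty word would give index -1; a longer one has no slot either.
        if (len < 1 || len > nWords.length) {
            return 0.0;
        }
        Integer count = freq.get(word);
        if (count == null || nWords[len - 1] == 0) {
            return 0.0;
        }
        return count.doubleValue() / nWords[len - 1];
    }

    /** Drops entries with an empty or over-long key; returns how many were removed. */
    int removeInvalidEntries() {
        int removed = 0;
        Iterator<String> it = freq.keySet().iterator();
        while (it.hasNext()) {
            String w = it.next();
            if (w.isEmpty() || w.length() > nWords.length) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The first method skips a bad entry at lookup time, while the second does in code what was done by hand for the LV profile: it removes the ["":23]-style entry before the profile is used.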
Now every EU language has a working profile. Do you want me to send
the MT, SL, ET, LT and LV profiles so that you can include them in your next
release?
Regards,
Emmanuel
Original comment by zygolech...@gmail.com
on 30 Aug 2011 at 7:51
I see...
It is probably a bug in genprofile that generates features like ["":23].
I'll fix it in the next update.
Thank you very much!
> Do you want me to send the MT, SL, ET, LT and LV profiles in order to include
> these in your next release?
My policy is to provide only profiles that have been verified with test data, and
I don't have test data for those languages, so I couldn't provide them...
But I am preparing test data for some languages (LT and so on).
Original comment by nakatani.shuyo
on 30 Aug 2011 at 9:45
Your policy is the right one :-)
Regards,
Emmanuel
Original comment by zygolech...@gmail.com
on 30 Aug 2011 at 10:00
This issue was closed by revision r100.
Original comment by nakatani.shuyo
on 8 Sep 2011 at 10:27
This issue was closed by revision 28880cd7672f.
Original comment by nakatani.shuyo
on 12 Jan 2012 at 9:47
Original issue reported on code.google.com by
zygolech...@gmail.com
on 29 Aug 2011 at 9:13