Hi,
Interesting. When I run it on that file, there is an exception from a bug (which I
have fixed), but it is not that exception. That stack trace looks an awful lot
like the caching inside Java's built-in Long class is doing funny things --
might it have something to do with your ExecJavaMojo calling things through
reflection?
In any case, I have fixed the bug and am running some tests before I release a
fix. 1.1.1 should be out by tomorrow.
Original comment by adpa...@gmail.com
on 9 Aug 2012 at 5:31
Hi,
Thanks for looking into the issue so quickly.
Interesting that you don't see the same exception. I assume that since
berkeleylm is written in Java, it should support input encoded in UTF-8. Is
that a fair assumption?
I have tried calling the program through Maven (I imported all the source)
and also without using Maven at all, and I see the same exception in both
cases, which is a bit odd if it is caused by reflection.
Original comment by hhohw...@shutterstock.com
on 9 Aug 2012 at 5:43
UTF-8 should be fine. Hopefully the fix I've committed will resolve your issue
in any case.
Original comment by adpa...@gmail.com
on 9 Aug 2012 at 7:33
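For reference, a sketch of the distinction behind the UTF-8 question: Java strings are Unicode internally, but bytes read from a file are only decoded as UTF-8 if the reader is given that charset; an InputStreamReader or FileReader created without one falls back to the platform default charset. The helper below (Utf8LineReader and readLinesUtf8 are hypothetical names) is not how berkeleylm itself reads its input; it only shows explicit UTF-8 decoding.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of berkeleylm.
class Utf8LineReader {
    // Reads every line of a text file, decoding the bytes explicitly as UTF-8
    // instead of relying on the platform default charset.
    static List<String> readLinesUtf8(String path) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), Charset.forName("UTF-8")));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            // Close the file even if decoding or reading fails.
            reader.close();
        }
        return lines;
    }
}
```

The sketch avoids try-with-resources so that it also compiles on the Java 6 JVMs discussed in this thread.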
Apologies, I fell asleep on this fix. Version 1.1.1 has been uploaded. Let me
know if this doesn't fix your issue.
Original comment by adpa...@gmail.com
on 13 Aug 2012 at 2:02
I unzipped the new 1.1.1 code but unfortunately am still seeing the same
ArrayIndexOutOfBoundsException. I have tried on a different input data set in
case that was the problem (en-test.txt, attached below) but I see the same
problem on that input.
Here are the steps I took to reproduce the error:
1. Unzip the code
2. cd to the top level directory, berkeleylm-1.1.1
3. Run ant from the top level directory
4. From the top level directory, run:
   java -cp jar/berkeleylm.jar edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText 5 test-en.model en-test.txt
5. Output is:
Reading text files [en-test.txt] and writing to file test-en.model {
  Reading in ngrams from raw text {
    On line 0
  } [2s]
  Writing Kneser-Ney probabilities {
    Counting counts for order 0 {
    } [0s]
    Counting counts for order 1 {
    } [0s]
    Counting counts for order 2 {
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 256
    at java.lang.Long.valueOf(Long.java:548)
    at edu.berkeley.nlp.lm.map.ExplicitWordHashMap$KeyIterator.next(ExplicitWordHashMap.java:140)
    at edu.berkeley.nlp.lm.map.ExplicitWordHashMap$KeyIterator.next(ExplicitWordHashMap.java:121)
    at edu.berkeley.nlp.lm.collections.Iterators$Transform.next(Iterators.java:107)
    at edu.berkeley.nlp.lm.io.KneserNeyLmReaderCallback.parse(KneserNeyLmReaderCallback.java:284)
    at edu.berkeley.nlp.lm.io.LmReaders.createKneserNeyLmFromTextFiles(LmReaders.java:299)
    at edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText.main(MakeKneserNeyArpaFromText.java:57)
Original comment by hhohw...@shutterstock.com
on 15 Aug 2012 at 11:34
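The top two frames of that trace show where boxing enters the picture: ExplicitWordHashMap$KeyIterator.next calls Long.valueOf. A minimal sketch of how that can happen, assuming (hypothetically) keys stored in a primitive long[] and exposed through an Iterator<Long>. The class name LongKeySet is made up, and this is not the berkeleylm implementation.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration, not berkeleylm's ExplicitWordHashMap: a set of
// primitive long keys whose Iterator<Long> must box each key it returns.
class LongKeySet implements Iterable<Long> {
    private final long[] keys;

    LongKeySet(long... keys) {
        this.keys = keys.clone();
    }

    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int pos = 0;

            public boolean hasNext() {
                return pos < keys.length;
            }

            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Returning a primitive long from a method declared to return
                // Long is a boxing conversion; javac compiles it to a call to
                // Long.valueOf(long), the frame at the top of the trace above.
                return keys[pos++];
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

Nothing in such an iterator indexes into Long's cache directly, so an out-of-bounds index inside Long.valueOf cannot come from the caller's own arrays.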
Followed your steps and did not encounter any exceptions. I'm guessing this is
a bug in your JVM -- the exception is occurring while boxing a long! You can
try using a different JVM, or even try using -server (which you should do
anyway, for speed).
Original comment by adpa...@gmail.com
on 15 Aug 2012 at 5:10
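Why the trace points at the JVM: as far as I know, in the OpenJDK 6 sources Long.valueOf only indexes its 256-entry cache after checking that the value lies in -128..127, and it uses the value plus 128 as the index. An index of 256 therefore corresponds to the value 128, which the range check should have rejected. The simplified sketch below (LongCacheSketch is a made-up name; this is not the verbatim JDK source) reproduces that index arithmetic.

```java
// Simplified sketch of a small-value cache in the style of java.lang.Long's
// internal LongCache; not the actual JDK code.
class LongCacheSketch {
    private static final int LOW = -128;
    private static final int HIGH = 127;
    // 256 slots: index 0 holds -128, index 255 holds 127.
    private static final Long[] CACHE = new Long[HIGH - LOW + 1];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new Long(LOW + i);
        }
    }

    // Index a value would occupy in the cache (value + 128).
    static int cacheIndexFor(long value) {
        return (int) value - LOW;
    }

    static Long box(long value) {
        if (value >= LOW && value <= HIGH) {
            // The range check guarantees the index is between 0 and 255.
            return CACHE[cacheIndexFor(value)];
        }
        // Out-of-range values get a fresh object and never touch the array.
        return new Long(value);
    }
}
```

Here cacheIndexFor(128L) is 256, exactly the index in the reported exception. With correct execution box(128L) takes the second branch, so seeing index 256 from the real Long.valueOf suggests the range check was effectively skipped by the running JVM (plausibly its JIT compiler), which fits the problem disappearing after the JVM update reported later in this thread.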
Thanks again for testing this out. It is quite odd that the error comes from
boxing a long. I ran both with and without -server but saw the exception in
both cases. I'm going to try a different JVM. Would you mind posting the output
you get from running "java -version" so that I can start with that
implementation? I'm using 64-bit HotSpot:
$ java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
Thanks for the help.
Original comment by hhohw...@shutterstock.com
on 15 Aug 2012 at 5:28
$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-10M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
Original comment by adpa...@gmail.com
on 15 Aug 2012 at 5:56
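Comparing the two `java -version` outputs by hand works, but a program can also report its own JVM through the standard java.version, java.vm.name and java.vm.version system properties, which is handy when a bug reproduces on only one machine. A small sketch; JvmInfo and updateNumber are hypothetical helpers, and updateNumber only understands the old 1.x.y_NN version format seen in this thread.

```java
// Hypothetical helpers for logging which JVM a program is running on.
class JvmInfo {
    // One-line summary from standard system properties: roughly the
    // information `java -version` prints, but available inside the program.
    static String describe() {
        return System.getProperty("java.version")
                + " (" + System.getProperty("java.vm.name")
                + ", build " + System.getProperty("java.vm.version") + ")";
    }

    // Extracts the update number from a version like "1.6.0_10" (returns 10).
    // Returns -1 when there is no "_NN" suffix, e.g. for "17.0.2".
    static int updateNumber(String version) {
        int underscore = version.indexOf('_');
        if (underscore < 0) {
            return -1;
        }
        int end = underscore + 1;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == underscore + 1) {
            return -1;
        }
        return Integer.parseInt(version.substring(underscore + 1, end));
    }

    public static void main(String[] args) {
        System.out.println("Running on Java " + describe());
    }
}
```

For the two JVMs above, updateNumber gives 10 and 33; the sketch deliberately does not encode a minimum "safe" update, since this thread does not establish one.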
I updated my java-6-sun JVM to 1.6.0_34 (I had been using a version from 2008),
and I no longer see the exception. Looks like Oracle has been hard at work
fixing autoboxing issues in the last few years. :)
Original comment by hhohw...@shutterstock.com
on 15 Aug 2012 at 8:58
Original issue reported on code.google.com by
hhohw...@shutterstock.com
on 9 Aug 2012 at 4:48