jflanigan / jamr

JAMR Parser and Generator
BSD 2-Clause "Simplified" License

java.lang.ArrayIndexOutOfBoundsException #17

Open Ramesh-X opened 7 years ago

Ramesh-X commented 7 years ago

I'm using JAMR to parse a given text with the following command.

scripts/PARSE.sh < ../text.in > ../text.out 2> output_file.err

The model that I was trying to use was LDC2014T12. But I get the following error.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.cmu.lti.nlp.amr.AMRParser$$anonfun$main$3.apply(AMRParser.scala:307)
    at edu.cmu.lti.nlp.amr.AMRParser$$anonfun$main$3.apply(AMRParser.scala:192)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at edu.cmu.lti.nlp.amr.AMRParser$.main(AMRParser.scala:192)
    at edu.cmu.lti.nlp.amr.AMRParser.main(AMRParser.scala)

I tried the other provided models, but the same error occurred. I also tried scripts/EVAL.sh, which gave the same error. Any help?

Thanks.

bheinzerling commented 7 years ago

The problem is that AMRParser tries to read a tokenization file that doesn't exist. It seems that instead of raising an exception this results in an empty array. This happens in line 169 of AMRParser.scala:

val tokenized = fromFile(options('tokenized).asInstanceOf[String]).getLines/*.map(x => x)*/.toArray

Trying to access an element of this empty array in line 197 causes an exception which gets handled, but during handling there is another attempted access in line 307, which causes the ArrayIndexOutOfBoundsException.

As a simple workaround in case your input text is already whitespace tokenized, you can replace line 169 with this line, run ./compile again, and everything should work:

val tokenized = input

Alternatively, you could try to run the tokenize script manually and set the --tok environment variable in config.sh
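
The crash described above comes down to the parser reading a tokenization file that is missing or empty, so a small pre-flight check can make the problem visible before the JVM starts. The sketch below is only an illustration: check_tok_file is a hypothetical helper, not part of JAMR, and the assumption that the tokenized file has one line per input line is inferred from the `val tokenized = input` workaround, not from the JAMR documentation.

```shell
#!/bin/sh
# check_tok_file INPUT TOK: hypothetical helper, not part of JAMR.
# Succeeds only if TOK exists, is non-empty, and has as many lines as INPUT
# (assumption: one tokenized line per input line).
check_tok_file() {
    input=$1
    tok=$2
    if [ ! -s "$tok" ]; then
        echo "tokenized file '$tok' is missing or empty" >&2
        return 1
    fi
    # tr strips the padding that some wc implementations add
    in_lines=$(wc -l < "$input" | tr -d ' ')
    tok_lines=$(wc -l < "$tok" | tr -d ' ')
    if [ "$in_lines" -ne "$tok_lines" ]; then
        echo "line count mismatch: $in_lines input vs $tok_lines tokenized" >&2
        return 1
    fi
    return 0
}
```

Where the tokenized file ends up depends on your setup, so both paths are passed in explicitly, for example `check_tok_file ../text.in path/to/tokenized/file` before calling scripts/PARSE.sh.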

ritwikmishra commented 6 years ago

I followed what @bheinzerling suggested and it worked for parsing. But when I run scripts/ALIGN.sh < output_file2 > aligned_output_file (where output_file2 is the output of the parsing step), the same error occurs:

 ### Tokenizing ###
panic: swash_fetch got swatch of unexpected bit width, slen=1024, needents=64 at /home/ritwik/JAMR/jamr/tools/cdec/corpus/support/quote-norm.pl line 149, <STDIN> line 1.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:48)
    at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:43)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at edu.cmu.lti.nlp.amr.CorpusTool$.main(CorpusTool.scala:43)
    at edu.cmu.lti.nlp.amr.CorpusTool.main(CorpusTool.scala)

I went to line 40 of CorpusTool.scala and commented out the line, just as @bheinzerling suggested for AMRParser.scala. I added the following lines instead:

val input = stdin.getLines.toArray
val tokenized = input

I compiled it. Now the ALIGN.sh script runs without any exception and shows this:

### Tokenizing ###
panic: swash_fetch got swatch of unexpected bit width, slen=1024, needents=64 at /home/ritwik/JAMR/jamr/tools/cdec/corpus/support/quote-norm.pl line 149, <STDIN> line 1.
 ### Running aligner ###

but it produces no output: the file aligned_output_file is empty.

What can be done? (I am using only the pre-trained models-2016.09.18.tgz.) Thanks

ConstantineLignos commented 6 years ago

@ritwikmishra I was experiencing a similar problem and the solution in https://github.com/jflanigan/jamr/issues/16 solved it for me. Just comment out jamr/tools/cdec/corpus/support/quote-norm.pl line 149 to work around the crash, which appears to be a Perl bug, similar to https://rt.perl.org/Public/Bug/Display.html?id=124109.
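
For anyone who prefers not to edit the file by hand, the workaround above can be sketched as a small shell helper. This is a hedged illustration, not something shipped with JAMR or cdec: comment_out_line is a hypothetical name, and the path in the usage comment is the one quoted in this thread, so adjust it to your checkout.

```shell
#!/bin/sh
# comment_out_line FILE N: hypothetical helper that prefixes line N of FILE
# with '# ', keeping the original as FILE.bak. Writing through a redirect
# into the existing file keeps its permissions (e.g. the executable bit).
comment_out_line() {
    file=$1
    n=$2
    cp "$file" "$file.bak" || return 1
    sed "${n}s/^/# /" "$file.bak" > "$file"
}

# Assumed usage; print the line first to confirm it is the one you expect:
#   sed -n '149p' jamr/tools/cdec/corpus/support/quote-norm.pl
#   comment_out_line jamr/tools/cdec/corpus/support/quote-norm.pl 149
```

No recompilation should be needed for a Perl script. If the panic message still names line 149 after the edit, it is probably worth checking the path printed in the message, since a different copy of quote-norm.pl may be the one being run.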

ritwikmishra commented 6 years ago

@ConstantineLignos I tried what you suggested and compiled again. Now the output is:

### Tokenizing ###
panic: swash_fetch got swatch of unexpected bit width, slen=1024, needents=64 at /home/ritwik/ATS/jamr/tools/cdec/corpus/support/quote-norm.pl line 149, <STDIN> line 1.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:48)
    at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:43)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at edu.cmu.lti.nlp.amr.CorpusTool$.main(CorpusTool.scala:43)
    at edu.cmu.lti.nlp.amr.CorpusTool.main(CorpusTool.scala)

By the way, I am using the CAMR parser now; it works better for my needs.

calliwen commented 5 years ago

@ritwikmishra I made the same change as you in CorpusTool.scala, and I got the following message:

### Tokenizing ###
/Users/gaoyong/jamr/tools/cdec/corpus/support/utf8-normalize.sh: Cannot find ICU uconv (http://site.icu-project.org/) ... falling back to iconv. Quality may suffer.
iconv: conversion from utf8 unsupported
iconv: try 'iconv -l' to get the list of supported encodings
 ### Running aligner ###

Were you able to solve your problem?

ConstantineLignos commented 5 years ago

@calliwen I don't know how much this helps, but I am now seeing what others are seeing: commenting out the line of Perl I suggested above is not enough to fix it. We have two otherwise identical machines where one works and the other doesn't, and we haven't been able to sort out the difference.

However, in your case, I think this is the most important error:

panic: swash_fetch got swatch of unexpected bit width, slen=1024, needents=64 at /home/ritwik/JAMR/jamr/tools/cdec/corpus/support/quote-norm.pl line 149, <STDIN> line 1.

If you comment out line 149 of that file, does the problem go away?

calliwen commented 5 years ago

Thanks for the reply. On macOS I got the "ICU uconv" and "utf8 unsupported" errors, but on Linux I got the panic message above and solved it with your solution. Thanks a lot.

ConstantineLignos commented 5 years ago

@calliwen Glad it worked! You can probably get a working uconv from Homebrew for macOS. You may have to put the executables on your path manually; see https://apple.stackexchange.com/questions/201590/uconv-on-mac-os-x-anywhere
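
A quick way to check whether uconv is reachable is sketched below. It is a rough sketch under a few assumptions: find_uconv is a hypothetical helper, and it assumes that Homebrew's icu4c formula provides uconv in its own bin directory without linking it onto PATH, which may differ on your machine.

```shell
#!/bin/sh
# find_uconv: hypothetical helper that prints the path of a usable uconv.
# It checks PATH first, then the Homebrew icu4c prefix if brew is installed
# (assumption: icu4c is keg-only, so its bin directory is not on PATH).
find_uconv() {
    if command -v uconv >/dev/null 2>&1; then
        command -v uconv
        return 0
    fi
    if command -v brew >/dev/null 2>&1; then
        prefix=$(brew --prefix icu4c 2>/dev/null) || return 1
        if [ -x "$prefix/bin/uconv" ]; then
            echo "$prefix/bin/uconv"
            return 0
        fi
    fi
    return 1
}

# Assumed usage: put the directory containing uconv at the front of PATH
#   u=$(find_uconv) && PATH="$(dirname "$u"):$PATH" && export PATH
```

If the function prints nothing and returns non-zero, utf8-normalize.sh will most likely keep falling back to iconv as in the log above.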