Out of memory errors when parsing large files, and alignments for parsed files

Hi guys

I'm trying to parse a file of ~500k lines, and I always get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at edu.illinois.cs.cogcomp.LbjNer.LbjTagger.NEWord.addTokenToSentence(NEWord.java:156)
        at edu.illinois.cs.cogcomp.LbjNer.ParsingProcessingData.PlainTextReader.parseText(PlainTextReader.java:33)
        at edu.illinois.cs.cogcomp.LbjNer.ParsingProcessingData.PlainTextReader.parsePlainTextFile(PlainTextReader.java:24)
        at edu.illinois.cs.cogcomp.LbjNer.LbjTagger.NETagPlain.tagData(NETagPlain.java:38)
        at edu.illinois.cs.cogcomp.LbjNer.LbjTagger.NerTagger.main(NerTagger.java:21)

Is there a way to avoid the OOM issue without allocating more memory to the JVM?
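
One workaround that might avoid the problem without touching the heap size is to split the input into smaller files and parse each one in a separate run. The sketch below is only an illustration under the assumption that the input has one sentence per line and that each line is parsed independently; `split_lines`, the `chunk_NNNNN.txt` naming, and the default chunk size are hypothetical and not part of JAMR.

```python
from pathlib import Path


def split_lines(src, out_dir, lines_per_chunk=10000):
    """Split src into files of at most lines_per_chunk lines.

    Returns the chunk paths in input order. Hypothetical helper,
    not part of JAMR or the NER tagger.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    with open(src, encoding="utf-8") as f:
        # Stream line by line so the whole file is never held in memory.
        for i, line in enumerate(f):
            if i % lines_per_chunk == 0:
                # Start a new chunk file every lines_per_chunk lines.
                if out is not None:
                    out.close()
                path = out_dir / f"chunk_{len(chunks):05d}.txt"
                chunks.append(path)
                out = open(path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunks
```

Each chunk could then be passed to the parser on its own and the outputs concatenated in chunk order. Whether that keeps memory within the limit depends on how much the tagger holds per sentence, so the chunk size would need tuning.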

Also, is it possible to get alignments between a text file and the resulting parsed AMR file without running the align script? I ask mainly because the output of JAMR isn't in the format that the align script expects.

Thanks, Kris

jflanigan / jamr