jflanigan / jamr

JAMR Parser and Generator
BSD 2-Clause "Simplified" License
Out of memory errors when parsing large files, and alignments for parsed files #32

Open around1991 opened 6 years ago

around1991 commented 6 years ago

Hi guys

I'm trying to parse a file of ~500k lines, and I always get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at edu.illinois.cs.cogcomp.LbjNer.LbjTagger.NEWord.addTokenToSentence(NEWord.java:156)
        at edu.illinois.cs.cogcomp.LbjNer.ParsingProcessingData.PlainTextReader.parseText(PlainTextReader.java:33)
        at edu.illinois.cs.cogcomp.LbjNer.ParsingProcessingData.PlainTextReader.parsePlainTextFile(PlainTextReader.java:24)
        at edu.illinois.cs.cogcomp.LbjNer.LbjTagger.NETagPlain.tagData(NETagPlain.java:38)
        at edu.illinois.cs.cogcomp.LbjNer.LbjTagger.NerTagger.main(NerTagger.java:21)

Is there a way to avoid the OOM issue without allocating more memory to the JVM?

Also, is it possible to get alignments between the input text file and the resulting parsed AMR file without running the align script? The output of JAMR isn't in the format that the align script expects.

Thanks,
Kris

lidongxing commented 4 years ago

Have you solved it? Thanks.
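
One possible workaround for the memory question, offered as a sketch rather than anything confirmed in this thread: the stack trace suggests the NER tagger reads the whole plain-text file before tagging (`PlainTextReader.parsePlainTextFile`), so splitting the input into smaller files and parsing each one separately may keep the heap usage bounded without raising the JVM limit. The helper below is hypothetical and not part of JAMR. It assumes the input has one sentence per line, so cutting at line boundaries is safe, and the name `split_file` and the chunk size are illustrative only.

```python
import os


def split_file(path, out_dir, lines_per_chunk=10000):
    """Split `path` into consecutive chunk files inside `out_dir`.

    Hypothetical helper (not part of JAMR). Returns the chunk paths
    in their original order so outputs can be concatenated later.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    with open(path, encoding="utf-8") as src:
        for i, line in enumerate(src):
            # Start a new chunk file every `lines_per_chunk` lines.
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                chunk_path = os.path.join(
                    out_dir, "chunk_%05d.txt" % len(chunk_paths)
                )
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Each chunk would then be passed to the parser as its own run, and the per-chunk outputs concatenated in the order returned by `split_file`. Whether this actually avoids the `GC overhead limit exceeded` error depends on how much memory the tagger needs per sentence, so the chunk size may need tuning.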