nytud / hunlp-GATE

Lang_Hungarian - a GATE plugin containing Hungarian NLP tools as GATE processing resources
GNU General Public License v3.0

OutOfMemoryError #13

Status: Closed (DavidNemeskey closed this issue 7 years ago)

DavidNemeskey commented 7 years ago

I am trying to process the Webcorpus with hunlp-GATE (QT, HFST, PurePOS), and I regularly run into OutOfMemoryErrors. I am running GATE as a server and I send it the data in chunks of 20k characters.
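For readers who want to reproduce the setup, here is a minimal sketch of cutting a text into chunks of at most 20k characters. The issue does not say how the chunk boundaries were chosen, so cutting at the last whitespace inside each window is an assumption, and `TextChunker` and `chunk` are hypothetical names that are not part of hunlp-GATE.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits text into chunks of at most maxChars UTF-16 code units. */
final class TextChunker {
    private TextChunker() {}

    static List<String> chunk(String text, int maxChars) {
        if (maxChars < 2) {
            throw new IllegalArgumentException("maxChars must be at least 2");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                int ws = lastWhitespace(text, start, end);
                if (ws >= 0) {
                    // Assumption: prefer to cut after whitespace so that tokens stay whole.
                    end = ws + 1;
                } else if (Character.isHighSurrogate(text.charAt(end - 1))) {
                    // Hard cut: do not split a surrogate pair in half.
                    end--;
                }
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }

    /** Index of the last whitespace in (start, end), or -1 if there is none. */
    private static int lastWhitespace(String text, int start, int end) {
        for (int i = end - 1; i > start; i--) {
            if (Character.isWhitespace(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling `TextChunker.chunk(document, 20_000)` yields pieces that concatenate back to the original text, which makes it easy to log each request and replay a failing one later.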

I also log the input and the output XML, and when I resend the input that supposedly caused the error, it runs without a hitch. This suggests that the problem is most likely a memory leak somewhere in GATE or in one of the components I use.

I run the server with -Xmx20g.

The trace below seems to point to PurePOS as well, but that could be a coincidence.

Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
        at java.lang.StringBuilder.<init>(StringBuilder.java:89)
        at org.apache.commons.lang3.tuple.Pair.toString(Pair.java:162)
        at hu.ppke.itk.nlpg.purepos.common.lemma.AbstractLemmaTransformation.toString(AbstractLemmaTransformation.java:64)
        at hu.ppke.itk.nlpg.purepos.common.lemma.AbstractLemmaTransformation.hashCode(AbstractLemmaTransformation.java:79)
        at java.util.HashMap.hash(HashMap.java:338)
        at java.util.HashMap.get(HashMap.java:556)
        at hu.ppke.itk.nlpg.purepos.model.internal.HashSuffixGuesser.getTagProbabilities(HashSuffixGuesser.java:129)
        at hu.ppke.itk.nlpg.purepos.model.internal.HashSuffixGuesser.getTagLogProbabilities(HashSuffixGuesser.java:86)
        at hu.ppke.itk.nlpg.purepos.model.internal.HashSuffixGuesser.getTagLogProbabilities(HashSuffixGuesser.java:44)
        at hu.ppke.itk.nlpg.purepos.MorphTagger.findBestLemma(MorphTagger.java:116)
        at hu.ppke.itk.nlpg.purepos.MorphTagger.merge(MorphTagger.java:70)
        at hu.ppke.itk.nlpg.purepos.POSTagger.tagSentence(POSTagger.java:110)
        at hu.ppke.itk.nlpg.purepos.POSTagger.tagSentence(POSTagger.java:100)
        at hu.nytud.gate.postaggers.Magyarlanc3POSTaggerLemmatizer.execute(Magyarlanc3POSTaggerLemmatizer.java:105)
        at gate.server.RequestHandler.process(RequestHandler.java:128)
        at gate.server.RequestHandler.handle(RequestHandler.java:82)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
        at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

DavidNemeskey commented 7 years ago

It looks as if there is an ever-growing HashMap (maybe an unbounded cache) somewhere in the code.

(Attached images: more_than_a_thousand_words, more_than_a_thousand_words2)
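To illustrate what bounding such a cache could look like, here is a minimal sketch of an LRU map built on `LinkedHashMap`. It is not the fix that was applied in this thread, which does not say where the growing map lives. `BoundedCache` and `maxEntries` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently used entry once maxEntries is exceeded. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true drops the eldest entry.
        return size() > maxEntries;
    }
}
```

`LinkedHashMap` is not thread-safe, so in a server that handles requests from a thread pool the map would need to be wrapped, for example with `Collections.synchronizedMap(new BoundedCache<>(10_000))`.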

DavidNemeskey commented 7 years ago

It seems that the monotonic growth is present even if I only invoke QunToken, so it might well be something inside GATE.
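One way to check for this kind of growth from inside the JVM is to sample the used heap after a requested garbage collection and see whether the samples keep rising. The sketch below uses only the standard `java.lang.management` API. `HeapProbe`, `usedHeapAfterGc`, and `growsMonotonically` are hypothetical names, and the step threshold is an assumption to be tuned per workload.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper, not part of GATE or hunlp-GATE. */
final class HeapProbe {
    private HeapProbe() {}

    /** Requests a garbage collection, then returns the heap bytes still in use. */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // equivalent to System.gc(); the JVM may treat it as a hint
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** True if every sample exceeds the previous one by at least minStepBytes. */
    static boolean growsMonotonically(long[] samples, long minStepBytes) {
        if (samples.length < 2) {
            return false; // not enough data to talk about growth
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] - samples[i - 1] < minStepBytes) {
                return false;
            }
        }
        return true;
    }
}
```

Logging `HeapProbe.usedHeapAfterGc()` after every few hundred requests, once with the full pipeline and once with a single component, gives a rough picture of which configuration retains memory.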

sassbalint commented 7 years ago

@temprimus Could you please take a look at this issue? Do you have any ideas? Thank you.

temprimus commented 7 years ago

I did some profiling with a dummy module, probably found the issue (hopefully it was the only one), and pushed a fix for it.

@DavidNemeskey Could you test whether it's still "leaking"?

DavidNemeskey commented 7 years ago

@temprimus Thanks, that solved it.

sassbalint commented 7 years ago

Thank you, @temprimus. :)