stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

a memory leak case? #848

Open godlockin opened 5 years ago

godlockin commented 5 years ago

I was running an NLP server, mostly for ssplit and occasionally for sentiment analysis, on CentOS 7 with 64G of RAM and -Xmx/-Xms set to 60G.

As I need to support both Chinese and English analysis at the same time, the configs are as follows:

English:

```
annotators=tokenize,ssplit,pos,lemma,ner,parse,coref,sentiment
tokenize.language=en
```

Chinese:

```
annotators=tokenize, ssplit, pos, lemma, ner, parse, coref, sentiment
```

and the model dependencies:

```xml
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${stanfordNLP.version}</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${stanfordNLP.version}</version>
    <classifier>models</classifier>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${stanfordNLP.version}</version>
    <classifier>models-chinese</classifier>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${stanfordNLP.version}</version>
    <classifier>models-english</classifier>
</dependency>
```
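For reference, a minimal sketch of how two pipelines with these annotator lists are typically constructed. The issue doesn't include the actual server code, so the class and method names here are illustrative; each pipeline should be built once and reused across requests so the models are only loaded into memory a single time:

```java
import java.io.IOException;
import java.util.Properties;

import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class Pipelines {

    // English pipeline built from the annotator list above.
    static StanfordCoreNLP english() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,coref,sentiment");
        props.setProperty("tokenize.language", "en");
        return new StanfordCoreNLP(props);
    }

    // Chinese pipeline: start from the defaults shipped in the models-chinese jar,
    // then override the annotator list to match the config above.
    static StanfordCoreNLP chinese() throws IOException {
        Properties props = new Properties();
        props.load(Pipelines.class.getClassLoader()
                .getResourceAsStream("StanfordCoreNLP-chinese.properties"));
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, coref, sentiment");
        return new StanfordCoreNLP(props);
    }
}
```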

And when the server was running with several requests hitting it, the memory the NLP server took seemed to increase linearly and was never released until the process died.

When the request rate increased to about 10/s, a number of connection timeouts happened, and the memory this service took grew to about 62G, while the total memory all applications can use on this machine is about 62.5G (some minor tasks and log agents were running as well).

The following exceptions were looping at the same time:

```
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:333)
    at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:758)
    at org.apache.catalina.connector.OutputBuffer.append(OutputBuffer.java:663)
    at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:368)
    at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:346)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:96)
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2085)
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator.flush(UTF8JsonGenerator.java:1097)
    at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:915)
    ...
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
    at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:134)
    at org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:101)
    at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:157)
    at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite(NioEndpoint.java:1225)
    at org.apache.tomcat.util.net.SocketWrapperBase.doWrite(SocketWrapperBase.java:743)
    at org.apache.tomcat.util.net.SocketWrapperBase.writeBlocking(SocketWrapperBase.java:513)
    at org.apache.tomcat.util.net.SocketWrapperBase.write(SocketWrapperBase.java:451)
    at org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.doWrite(Http11OutputBuffer.java:530)
    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:110)
    at org.apache.coyote.http11.Http11OutputBuffer.doWrite(Http11OutputBuffer.java:189)
    at org.apache.coyote.Response.doWrite(Response.java:599)
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:328)
    ... 64 common frames omitted
```

godlockin commented 5 years ago

I've tried removing the sentiment-related model, and the memory usage looks better for now...

gangeli commented 5 years ago

Is this Java heap memory increasing, or system memory? Does the process get an OutOfMemoryError, or is it killed by Linux's OOM killer?

godlockin commented 5 years ago

Actually, it logs a number of broken pipe errors before the job would hit an OOM exception, and the clear annotation pool warnings as well, but it is a little weird that it never seems to hit an OOM exception before the whole service gets stuck.

The clear annotation pool message was printed endlessly before the whole node died, like:

10:19:10.009 [http-nio-8080-exec-596] WARN edu.stanford.nlp.pipeline.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production

but according to the monitoring tool htop, no (or not enough) memory was being released :(

And I'd like to amend my earlier comment: I disabled the sentiment annotation, which makes the memory grow much more slowly, but after hours of invocations (mostly ssplit and ner), the memory this service takes still leaks, even if the rate is much slower than with sentiment enabled.

E.g. loading all the models into memory takes about 13.6GB for my service, as I also need to handle some HTTP functions. With the same call volume (50 QPS), if I invoke the sentiment annotation, the memory grows from 13.6G to about 62G in roughly 20-30 minutes, while if I only call the ssplit and ner annotations, the same growth takes about 60-70 hours. But, unfortunately, neither the simple tokenization/NER jobs nor heavy tasks like sentiment ever release the memory the service took, until I kill the whole process (kill -9 ${pid}).

furkanozbay commented 4 years ago

I am having this problem too. My webservice only does sentence splitting and tokenizing (tokenize and ssplit) for both English and Chinese. Memory usage gets too high (after heavy usage) and it doesn't decrease.

AngledLuffa commented 4 years ago

Can you tell us more about the circumstances you are using for running the tokenization? I can't think of any reason why the CRF classifier used for Chinese segmentation would "leak", let alone the tokenizer or sentence splitter routines.

Is it possible you are keeping around the results in some way?

Is it possible the GC is not running for a long time, so the results are effectively kept around forever? For example, if you give the service a large chunk of RAM and then just let it run, it may never have a reason to GC and will just collect a ton of stale objects. What platform are you using, what JRE, what symptoms are you encountering from not having enough memory?
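One way to separate those cases is to compare the JVM's own heap numbers against what top/htop reports for the process. A small diagnostic sketch, using only standard JDK management beans (not CoreNLP code); the class name is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapProbe {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // "used" includes garbage that has not been collected yet; htop/top shows the RSS
        // of the whole process, which the JVM rarely hands back to the OS even after a GC.
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);

        // Compare before and after an explicit GC to see how much is actually live.
        System.gc();
        heap = memory.getHeapMemoryUsage();
        System.out.printf("after GC: heap used=%dMB%n", heap.getUsed() >> 20);
    }
}
```

If "used" drops sharply after the GC, the objects were collectible all along and the heap was simply too big to force collection; if it stays high, something really is holding on to them.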

godlockin commented 4 years ago

For my online usage, for instance, I need a Chinese segmentation service to do document analysis: doc -> sentences, sentence -> phrases.

I built my service with JDK 8 and Maven 3 in this project (see the GitHub link / POM file).

Then I start the job in my service with the command below: `java -Xms16G -Xmx16G -jar xx.jar`

When the concurrency of requests is below 10/s, the memory looks OK, but if it rises to 30+, or above 50 as in our production env, this service takes all the free memory on the machine until the system kills the process.

Even when the concurrency of requests decreases from 50+ to around 10, the memory this service took is not released.

AngledLuffa commented 4 years ago

Is it using more than the 16G you are allocating it? How much memory is available on the system?

furkanozbay commented 4 years ago

I created a Spring Boot webservice which runs in a Docker container. I know it is normal to allocate memory while I am calling the web service, but after I stop calling it, memory usage does not decrease and stays the same.

https://stackoverflow.com/questions/46390659/stanford-corenlp-doesnt-empty-memory-after-running-on-threads After seeing the above link, I created a method that calls both GC and the StanfordCoreNLP.clearAnnotator method every 30 min, because the web service is called multiple times within a couple of seconds (so I thought it might be a threading issue), but this didn't work out either.
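A sketch of what such a periodic cleanup might look like, assuming the static method meant here is `StanfordCoreNLP.clearAnnotatorPool()` (the call that logs the "Clearing CoreNLP annotation pool" warning quoted earlier in this thread); the class name and schedule are illustrative:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class PeriodicCleanup {
    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            // Drop the cached annotators so they become collectible
            // (assumed method name; logs a warning when invoked).
            StanfordCoreNLP.clearAnnotatorPool();
            // Ask the JVM to collect; System.gc() is only a hint.
            System.gc();
        }, 30, 30, TimeUnit.MINUTES);
    }
}
```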

After that, I came across this issue: https://github.com/stanfordnlp/CoreNLP/issues/287 and I am wondering whether Docker may be causing this problem.

AngledLuffa commented 4 years ago

Well, if you can provide any of the heap analysis described in that other linked issue, we can take a further look. Otherwise, I honestly don't know how to improve the situation. Running the segmenter by itself with increasing numbers of threads doesn't cause any appreciable increase in its memory footprint, as far as I can tell, and neither does running it on a particularly long file.

gangeli commented 4 years ago

@AngledLuffa I've definitely seen the memory leak on corenlp.run. My theory -- largely unsubstantiated -- is that my WeakReference shenanigans are broken and annotators aren't actually getting garbage collected as much as they should. I was never able to pin the issue down though :(

AngledLuffa commented 4 years ago

The AnnotatorPool is using a SoftReference instead of a WeakReference for the cached annotators. My understanding is that this means the system will keep them alive until it gets close to running out of memory. This leads me to a couple of questions:

Another possibility is some kind of cache or string interning which is getting filled up during normal usage, but my test didn't trigger it because I used the same data over and over.
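For context on the SoftReference point above, a minimal illustration (plain JDK, not CoreNLP code) of how the two reference types behave differently under GC:

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceDemo {
    public static void main(String[] args) {
        // Two separate large objects standing in for cached annotators.
        SoftReference<byte[]> soft = new SoftReference<>(new byte[64 << 20]);
        WeakReference<byte[]> weak = new WeakReference<>(new byte[64 << 20]);

        System.gc();
        // A weakly referenced object is eligible for collection at the next GC;
        // a softly referenced one is typically kept until the heap is nearly exhausted.
        System.out.println("weak cleared after GC? " + (weak.get() == null)); // usually true
        System.out.println("soft cleared after GC? " + (soft.get() == null)); // usually false
    }
}
```

So with a SoftReference-backed pool, annotators hanging around until the heap is nearly full is the expected behavior rather than a leak per se, which would match the "memory grows but the process doesn't OOM" reports above.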

AngledLuffa commented 2 years ago

heh, i wonder if my change to remove a weak reference map of all the sentences will help with this issue too:

https://nlp.stanford.edu/software/stanford-corenlp-4.5.0b.zip