csis0247 opened this issue 4 years ago
I removed the http component, and the leak is still occurring. There are now 3 clusters of processes per model.
Changes:

- Instead of pip install -U bert-serving-server[http]==1.9.6, I used pip install -U bert-serving-server==1.9.6.
- Dropped -http_port 8000 from the startup command (see the sketch below).
- Tried using the vanilla Docker image to start only one model; still a memory leak.
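For concreteness, the HTTP-free startup looks roughly like this; the model directory and worker count are placeholders, not the exact values I used:

```
# Sketch: start one server without -http_port; only the default
# ZeroMQ ports (5555 in, 5556 out) are opened.
bert-serving-start -model_dir /models/chinese_L-12_H-768_A-12 -num_worker=1
```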
However, once we removed the external IP from the server, i.e. no access from the public internet, the leak was gone.
My conclusion, therefore, was that someone was probing the server, probably at port 5555, and failing (CPU usage was quite low throughout). Yet the memory leak was real: despite never getting the server to encode anything, the probes drove memory usage up. Maybe the leak arises from handling failed requests? A rough way to test that is sketched below.
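If failed requests are the cause, the leak should be reproducible without any BERT client at all, just by throwing garbage at the ZeroMQ port while watching the server's memory from the other side. A minimal sketch, assuming the server is reachable at a placeholder SERVER_IP:

```
# Sketch: repeatedly open TCP connections to the ZMQ frontend port
# and send random bytes, mimicking an internet scanner, while
# watching the server's memory with top/ps on the server side.
while true; do
  head -c 64 /dev/urandom | nc -w 1 SERVER_IP 5555
  sleep 0.1
done
```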
Sorry I can't help with this, but I couldn't help noticing that your memory footprints are quite low compared to what they look like on my local dev machine. For English BERT-Large, it was about 16GB at startup, stabilizing at around 6-7GB. Is the Chinese model much smaller?
I am using the BERT-Base model, which has only about 1/3 of the parameters. So yes, the Chinese model is much smaller, because BERT-Large is not available for Chinese.
Prerequisites

- Are you running the latest bert-as-service? Issue observed on both 1.9.1 and 1.9.6.
- Did you follow the installation and the usage instructions in README.md? Yes and no. I used this command in Docker:
- Did you check the FAQ list in README.md?

System information
cat /etc/os-release
Description

I'm using this command to start the 3 BERT models on Kubernetes, with a Docker image based on this example:

and calling the server via:

http://[......]/status/server?_=[......]

^ Not that it mattered. NO ONE was calling the server.
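For reference, a three-model startup of this kind would look roughly like the following; the model directories, ports, and worker counts are assumptions for illustration, not the exact command from the image:

```
# Sketch: three bert-serving-start instances in one pod, each on
# its own ZeroMQ port pair (directories and ports are assumptions).
bert-serving-start -model_dir /models/model_a -num_worker=1 -port 5555 -port_out 5556 &
bert-serving-start -model_dir /models/model_b -num_worker=1 -port 5557 -port_out 5558 &
bert-serving-start -model_dir /models/model_c -num_worker=1 -port 5559 -port_out 5560 &
wait
```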
Then this issue shows up:
Video showing memory leak
I started 3 BERT models with bert-serving-start on Kubernetes in the same pod. Each server spawned 4 clusters of processes: 1x 1.2GB, 2x 100MB, and 1x leaking memory. No client was connected to the server. The 3 processes highlighted in the video were from the 3 models respectively, each leaking at roughly the same rate, eventually running the server out of memory.

What's odd is that this severe memory leak was observed on AliCloud but not on AWS, despite the same codebase and the same Docker image. The only differences were the hardware, and that AliCloud had 8GB of memory versus AWS's 16GB. However, the servers were already up and running and able to handle requests with no memory issues.
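One way to put numbers on the leak rate (a sketch; the grep pattern and interval are arbitrary choices) is to log each worker's resident set size from inside the pod:

```
# Sketch: append every bert-serving process's RSS (in KB) to a log
# once a minute; the bracketed grep pattern avoids matching grep itself.
while sleep 60; do
  echo "--- $(date -u +%FT%TZ)"
  ps -eo pid,rss,etime,args | grep '[b]ert-serving-start'
done >> /tmp/bert-mem.log
```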
Hardware on AliCloud:
Hardware on AWS: