kkantor closed this issue 2 years ago.
Additional info: I have changed my app to use the ndb API instead, and the issue is basically gone. Since I haven't changed anything else, I suspect that the issue is really with the datastore API.
UPDATE: Unfortunately, this comment I made here has since proven to be incorrect. Switching to ndb did NOT eliminate the problem, it just made it less frequent, causing me to declare success too early. Now I am really perplexed; I do not know what might be responsible.
@kkantor Is the body of the for loop in your app actually empty when the leak occurs?
Yes, only a pass is in there, after everything else was systematically removed to see what causes the leak. Please also see the update I made to my previous comment. Sorry for this.
The attached screenshot shows the memory utilization of my instances. The places where the curve drops to zero are where the logs show an instance being shut down for exceeding the memory limit.
I experience the same behavior with the same stack.
I have monitored my app outside of App Engine and I see no gradual increase in memory usage. This might reveal a more serious problem with the backend stack of App Engine itself. @kkantor can you confirm whether this is the case for you?
No, I have not done that. Unfortunately I don't even know how to. What would be the quickest way?
I used mem_top: I printed the memory usage every x minutes and examined whether any of my code was gradually taking more resources. I let my app run like this for about 10 hours.
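Roughly like this (a minimal sketch, assuming the mem_top package from PyPI; the 10-minute interval is just an example, not what I necessarily used):

```python
# Minimal sketch of the periodic check described above; assumes the
# mem_top package (pip install mem_top) and an arbitrary 10-minute interval.
import logging
import time

from mem_top import mem_top

logging.basicConfig(level=logging.DEBUG)

while True:
    # mem_top() returns a text summary of the objects holding the most
    # references/memory, so a leak shows up as counts that keep growing
    # between snapshots.
    logging.debug(mem_top())
    time.sleep(600)
```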
Found another similar instance of this: https://stackoverflow.com/questions/33036334/memory-leak-in-google-ndb-library
It's possible this is the same issue as this one from NDB. That memory leak only occurs in the python27 runtime on GAE and not in any other environment. Those having this issue with Datastore, are you having it with the python27 runtime, or other runtimes as well?
I experienced this issue with python36 without NDB. I used the current python-datastore 1.15. In a comment on a similar issue, the OP said they get the same problem using different Google libraries (and async versions), so the cause might be deeper, e.g. in the grpcio code. That is in line with the fact that I couldn't profile this issue with any kind of memory profiler on a local setup (just like the OP).
We've seen success disabling gRPC (using version 1.10):
datastore.Client(_use_grpc=False)
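For context, a minimal sketch of that workaround (as far as I know, setting the GOOGLE_CLOUD_DISABLE_GRPC environment variable has a similar effect of forcing the HTTP/JSON transport):

```python
# Sketch of the workaround above: construct the Datastore client with the
# private _use_grpc flag so it uses the HTTP/JSON transport instead of gRPC.
from google.cloud import datastore

client = datastore.Client(_use_grpc=False)

# Normal usage is unchanged, e.g.:
key = client.key("Task", 1234)
entity = client.get(key)
```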
There have been a number of fixes related to memory leaks in grpcio since this issue was filed. If folks are still seeing this issue, please upgrade your dependencies, and if it's still occurring, please file a new issue. Thanks!
google-cloud-datastore==1.15.0
I have built a simple news aggregator site, in which the memory usage of all my App Engine instances keeps growing until it reaches the limit and the instances are therefore shut down.
I have started to eliminate everything from my app to arrive at a minimal reproducible version. This is what I have now:
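A rough sketch of the stripped-down handler, for illustration; the web framework, entity kind, and fetch limit below are placeholders rather than my exact code:

```python
# Rough sketch of the stripped-down handler; framework, kind, and limit
# are placeholders, not the exact original code.
from flask import Flask
from google.cloud import datastore

app = Flask(__name__)
client = datastore.Client()

@app.route("/")
def index():
    query = client.query(kind="Article")   # placeholder kind
    for entity in query.fetch(limit=500):  # loop body intentionally empty
        pass
    return "ok"
```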
Stats show a typical saw-tooth pattern: at instance startup, memory goes to 190-210 MB, then upon some requests, but NOT ALL requests, memory usage increases by 20-30 MB. (This, by the way, roughly corresponds to the estimated size of the query results, although I cannot be sure this is relevant info.) This keeps happening until it exceeds 512 MB, at which point the instance is shut down. It usually happens around the 50th-100th request to "/". (I have left the rest of my app unchanged, but no other requests were made to anything other than this function during this test.)
Now, if I eliminate the for loop and only the query remains, the problem goes away: memory usage stays flat at 190 MB, with no increase even after 100+ requests.
Calling gc.collect() at the end does not help. I have also tried looking at the difference in tracemalloc stats between the beginning and end of the function, but I have not found anything useful.
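For reference, this is roughly how I compared the snapshots (a minimal sketch rather than my exact code):

```python
# Minimal sketch of the tracemalloc comparison; the handler body is a
# stand-in for the real request code.
import tracemalloc

tracemalloc.start()

def index():
    before = tracemalloc.take_snapshot()

    # ... run the query and the empty for loop here ...

    after = tracemalloc.take_snapshot()
    # Print the lines that allocated the most new memory during the request.
    for stat in after.compare_to(before, "lineno")[:10]:
        print(stat)
    return "ok"
```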
I suspect the issue is outside of my control.