kkantor closed this issue 2 years ago.
Additional info: I have changed my app to use the ndb API instead, and the issue is basically gone. Since I haven't changed anything else, I suspect that the issue is really with the datastore API.
UPDATE: Unfortunately, this comment I made here has since proven to be incorrect. Switching to ndb did NOT eliminate the problem, it just made it less frequent, causing me to declare success too early. Now I am really perplexed; I do not know what might be responsible.
@kkantor Is the body of the for loop in your app actually empty when the leak occurs?
Yes, only a pass is in there, after everything else was systematically removed to see what causes the leak. Please also see the update I made to my previous comment. Sorry for this.
The attached screenshot shows the memory utilization of my instances. The places where the curve drops to zero are where the logs show an instance being shut down for exceeding the memory limit.
I experience the same behavior with the same stack.
I have monitored my app outside of App Engine and I see no gradual increase in memory usage. This might reveal a more serious problem with the backend stack of App Engine itself. @kkantor can you confirm whether this is the case for you?
No, I have not done that. Unfortunately I don't even know how to. What would be the quickest way?
I used mem_top: I printed the memory usage every x minutes and examined whether any of my code was gradually taking more resources. I let my app run like this for about 10 hours.
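Roughly like this (a minimal sketch, assuming the mem_top package from PyPI; the 10-minute interval is just an example, not what I necessarily used):

```python
# Minimal sketch of the periodic check described above; assumes the
# mem_top package (pip install mem_top) and an arbitrary 10-minute interval.
import logging
import time

from mem_top import mem_top

logging.basicConfig(level=logging.DEBUG)

while True:
    # mem_top() returns a text summary of the objects holding the most
    # references/memory, so a leak shows up as counts that keep growing
    # between snapshots.
    logging.debug(mem_top())
    time.sleep(600)
```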
Found another similar instance of this: https://stackoverflow.com/questions/33036334/memory-leak-in-google-ndb-library
It's possible this is the same issue as this one from NDB. That memory leak only occurs in the python27 runtime on GAE and not in any other environment. Those having this issue with Datastore, are you having it with the python27 runtime, or other runtimes as well?
I experienced this issue with python36 without NDB. I used the current python-datastore 1.15. In a comment on a similar issue, the OP said they get the same problem using different Google libraries (and async versions), so the cause might be deeper, e.g. in the grpcio code. That is in line with the fact that I couldn't profile this issue with any kind of memory profiler on a local setup (just like the OP).
We've seen success disabling gRPC (using version 1.10):
datastore.Client(_use_grpc=False)
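For context, a minimal sketch of that workaround (as far as I know, setting the GOOGLE_CLOUD_DISABLE_GRPC environment variable has a similar effect of forcing the HTTP/JSON transport):

```python
# Sketch of the workaround above: construct the Datastore client with the
# private _use_grpc flag so it uses the HTTP/JSON transport instead of gRPC.
from google.cloud import datastore

client = datastore.Client(_use_grpc=False)

# Normal usage is unchanged, e.g.:
key = client.key("Task", 1234)
entity = client.get(key)
```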
There have been a number of fixes related to memory leaks in grpcio since this issue was filed. If folks are still seeing this issue, please upgrade your dependencies, and if it's still occurring, please file a new issue. Thanks!
google-cloud-datastore==1.15.0
I have built a simple news aggregator site, in which the memory usage of all my App Engine instances keeps growing until it reaches the limit and the instances are therefore shut down.
I have started to eliminate everything from my app to arrive at a minimal reproducible version. This is what I have now:
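A rough sketch of the stripped-down handler, for illustration; the web framework, entity kind, and fetch limit below are placeholders rather than my exact code:

```python
# Rough sketch of the stripped-down handler; framework, kind, and limit
# are placeholders, not the exact original code.
from flask import Flask
from google.cloud import datastore

app = Flask(__name__)
client = datastore.Client()

@app.route("/")
def index():
    query = client.query(kind="Article")   # placeholder kind
    for entity in query.fetch(limit=500):  # loop body intentionally empty
        pass
    return "ok"
```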
Stats show a typical saw-tooth pattern: at instance startup, memory goes to 190-210 MB, then upon some requests, but NOT ALL requests, memory usage increases by 20-30 MB. (This, by the way, roughly corresponds to the estimated size of the query results, although I cannot be sure this is relevant info.) This keeps happening until it exceeds 512 MB, at which point the instance is shut down. It usually happens around the 50th-100th request to "/". (I have left the rest of my app unchanged, but no other requests were made to anything other than this function during this test.)
Now, if I eliminate the for loop and only the query remains, the problem goes away: memory usage stays flat at 190 MB, with no increase even after 100+ requests.
Calling gc.collect() at the end does not help. I have also tried looking at the difference in tracemalloc stats between the beginning and end of the function, but I have not found anything useful.
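For reference, this is roughly how I compared the snapshots (a minimal sketch rather than my exact code):

```python
# Minimal sketch of the tracemalloc comparison; the handler body is a
# stand-in for the real request code.
import tracemalloc

tracemalloc.start()

def index():
    before = tracemalloc.take_snapshot()

    # ... run the query and the empty for loop here ...

    after = tracemalloc.take_snapshot()
    # Print the lines that allocated the most new memory during the request.
    for stat in after.compare_to(before, "lineno")[:10]:
        print(stat)
    return "ok"
```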
I suspect the issue is outside of my control.