Open antonstenaxel opened 2 years ago
Hey @antonstenaxel, I am curious whether the "CONN_MAX_AGE" setting has any influence on this behavior (https://docs.djangoproject.com/en/3.2/ref/settings/#std:setting-CONN_MAX_AGE). Could the problem perhaps be remedied with this setting?
Note the following caveat from the documentation (https://docs.djangoproject.com/en/3.2/ref/databases/):
The development server creates a new thread for each request it handles, negating the effect of persistent connections. Don’t enable them during development.
Not sure if it applies here.
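For reference, enabling persistent connections is a one-line settings change. A minimal sketch, assuming a Spanner backend; the engine name, project, instance, and database values here are placeholders, not taken from this issue:

```python
# settings.py -- sketch only; all names below are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django_spanner",   # assumed Spanner backend package
        "PROJECT": "my-project",      # placeholder
        "INSTANCE": "my-instance",    # placeholder
        "NAME": "my-database",        # placeholder
        # Keep each connection open for up to 10 minutes so requests served
        # by the same worker reuse it instead of reconnecting every time.
        "CONN_MAX_AGE": 600,
    }
}
```

Note that, per the caveat above, this has no effect under `runserver`, which is why it needs to be tested in a deployed environment.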
Thanks for the reply!
I had tried CONN_MAX_AGE without success, but it appears I had missed the caveat you mentioned. So I tried deploying the app to Google App Engine, and it reduced the response time from the ~3 seconds on my local machine to about 250 ms, which is obviously a lot better.
However, when running the same query through the standalone Django script shown in the original post I get 100-130 ms response times, which is also what I get when running with the Cloud MySQL backend, so there seems to be an overhead of 100-150 ms somewhere. Running the query with a line profiler shows that these functions take up much of that time:
501459 function calls (501184 primitive calls) in 2.816 seconds
Ordered by: internal time
List reduced from 1012 to 100 due to restriction <100>
ncalls tottime percall cumtime percall filename:lineno(function)
2 1.132 0.566 1.132 0.566 {method 'read' of '_io.BufferedReader' objects}
1 1.114 1.114 1.115 1.115 {method 'next_event' of 'grpc._cython.cygrpc.SegregatedCall' objects}
27937 0.190 0.000 0.190 0.000 {built-in method posix.lstat}
17 0.100 0.006 0.100 0.006 {method 'acquire' of '_thread.lock' objects}
27977 0.039 0.000 0.063 0.000 posixpath.py:71(join)
Timing the actual query with django_debug_toolbar shows around 100-150 ms here
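Profiler listings in this shape can be reproduced with the standard-library cProfile/pstats modules. A minimal sketch; `run_query` here is a stand-in for the actual Django ORM call, not code from this issue:

```python
import cProfile
import io
import pstats

def run_query():
    # Stand-in for the Django ORM call being profiled.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
run_query()
profiler.disable()

# Sort by internal time and keep the top entries, matching the listing above.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("tottime")
stats.print_stats(5)
print(stream.getvalue())
```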
Thanks for bringing this to our notice. I was also planning to investigate CONN_MAX_AGE while debugging this, so now I know to start there. I will update here once I make some progress.
This is related to Spanner session creation (see https://cloud.google.com/spanner/docs/sessions#performance_benefits_of_a_session_cache). To deal with it, we use session pools, which reuse these heavy objects instead of creating and discarding them.
@vi3k6i5, did you have a chance to look at CONN_MAX_AGE? I'm pretty sure there is one more way to fix it: use a global pool object when creating new connections, via this argument of the connect() function:
https://github.com/googleapis/python-spanner/blob/41015715c9465b8b7753b71350a4be3aaeb26704/google/cloud/spanner_dbapi/connection.py#L525-L527
That's how we fixed a couple of problems in SQLAlchemy dialect.
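The idea behind a shared global pool, independent of the Spanner client specifics, can be sketched generically. This is an illustration only; `FakeConnection` is a stand-in class, not the real DB-API connection object:

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: hand out pre-built connections, take them back."""

    def __init__(self, factory, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # Pay the expensive creation cost once, up front.
            self._pool.put(factory())

    def acquire(self, timeout=None):
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Stand-in for an expensive-to-create connection object.
class FakeConnection:
    created = 0
    def __init__(self):
        FakeConnection.created += 1

pool = ConnectionPool(FakeConnection, size=2)
conn = pool.acquire()
pool.release(conn)
conn = pool.acquire()
# Only the initial 2 connections were ever created; acquire() reuses them.
print(FakeConnection.created)  # -> 2
```

With a real pool passed into connect(), every Django request that asks for a connection gets an already-warm one, which is the reuse effect the session-pool suggestion above is after.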
Description
I am seeing very long response times running my Django app with a Spanner backend. For a simple indexed query that takes around 10 ms in the Spanner query editor, I see around 2.5-3 seconds when running the app locally. When running in production, or on a Compute Engine instance in the same region as the Spanner instance, these times go down a bit but still stay around 800 ms. For reference, the equivalent application using a MySQL instance running in Google Cloud SQL in the same region takes around 100 ms for the same requests.
As far as I can tell, this is due to Django creating a new connection to Spanner for every request, which appears to be an expensive operation and introduces significant overhead. In Python scripts where I am able to keep the connection alive, only the first connection takes this long, and subsequent queries are much faster.
Environment details
Steps to reproduce
python manage.py runserver
Additional info
For endpoints that hit the database multiple times (for instance, first for authentication, then for data retrieval), only the first query seems to suffer the overhead of creating new sessions/connections.
I've tried reproducing this in a Python script using the following:
Which outputs:
So response times seem reasonable as long as a connection can be reused, but creating a new one is very expensive, pushing response times to around 2-3 seconds.