Closed bhgrant8 closed 4 years ago
Running this on my development machine, I was able to get a successful 200 response:
Though it took ~2 minutes for the response to arrive.
vs. the production server, where we see the expected timeout after 1 minute of waiting (TTFB, Time to First Byte):
which then results in the 504.
Interestingly, the response size in the local log is nearly the same as in the CloudWatch logs:
api_1 | 172.20.0.1 [07/Sep/2019:23:06:20 +0000] GET /housing2019/v1/api/hmdaorwa/ HTTP/1.1 200 10750 http://localhost:8000/housing2019/v1/schema/ Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36 122.777203
10750 locally vs. 10757 in CloudWatch, which leads me to assume the backend database is returning the full response, but the Django server is taking too long to process/serialize it and is timing out.
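To compare the local and CloudWatch entries without eyeballing them, the size and duration can be pulled out of each access-log line. This is a minimal sketch: the field layout is inferred from the single log line quoted above, the trailing number is assumed to be the request duration in seconds (it matches the ~2 minutes observed), and `parse_access_line` is a hypothetical helper, not part of the project.

```python
import re

# Layout assumed from the single access-log line quoted above:
# ... ] METHOD PATH HTTP/x.y STATUS BYTES REFERER USER-AGENT SECONDS
ACCESS_LINE = re.compile(
    r"\] (?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+ "
    r"(?P<status>\d{3}) (?P<size>\d+) .* (?P<seconds>\d+\.\d+)$"
)


def parse_access_line(line):
    """Return path, status, response size and duration, or None if no match."""
    match = ACCESS_LINE.search(line.strip())
    if match is None:
        return None
    return {
        "path": match.group("path"),
        "status": int(match.group("status")),
        "size": int(match.group("size")),
        # Assumption: the last field is the request time in seconds.
        "seconds": float(match.group("seconds")),
    }


LOCAL_LINE = (
    "api_1 | 172.20.0.1 [07/Sep/2019:23:06:20 +0000] "
    "GET /housing2019/v1/api/hmdaorwa/ HTTP/1.1 200 10750 "
    "http://localhost:8000/housing2019/v1/schema/ "
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36 122.777203"
)

parsed = parse_access_line(LOCAL_LINE)
print(parsed["status"], parsed["size"], parsed["seconds"])
```

A duration above 60 seconds on a 200 response is the case of interest here, since it exceeds the one-minute timeout seen in production.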
Continuing to chase this down, I noticed that the server response header was as follows:
server: awselb/2.0
This indicates that the 504 is being returned at the AWS load balancer level, vs. on a successful connection:
server: gunicorn/19.9.0
Given this, and the fact that the current timeout settings result in a successful call locally, it appears that the drop is happening in the connection between the backend server and the ELB.
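The header check above can be scripted so that any failing response is attributed to the layer that produced it. This is only a sketch: `responder` is a hypothetical helper, and it recognizes just the two server values seen in this issue (awselb and gunicorn).

```python
def responder(headers):
    """Guess which layer produced a response from its Server header."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    product = lowered.get("server", "").split("/", 1)[0].lower()
    if product == "awselb":
        return "load balancer"
    if product == "gunicorn":
        return "application server"
    return "unknown"


print(responder({"server": "awselb/2.0"}))
print(responder({"Server": "gunicorn/19.9.0"}))
```

A 504 attributed to the load balancer means the backend never answered within the balancer's timeout, which is consistent with the 200 that still shows up in the backend logs.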
As per AWS Docs:
This may be affected by the current keepalive settings in the backend.
We are not currently configuring this directly, and as such are accepting the Gunicorn default of 2 seconds. However, the Gunicorn docs themselves state that when running behind a load balancer, you will want to set it higher:
http://docs.gunicorn.org/en/stable/settings.html#keepalive
Going to attempt to set it to 75 as per: https://serverfault.com/questions/782022/keepalive-setting-for-gunicorn-behind-elb-without-nginx
and check whether we still get a 504 (going to connect the housing project to dev docker for a canary test).
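For reference, the keepalive change could be expressed in a Gunicorn config file, which is plain Python. This is a sketch rather than the project's actual configuration: the file name is an assumption, and the 60-second figure is assumed from the one-minute timeout observed above (it is also the default ELB idle timeout). Only `keepalive` is a Gunicorn setting here; the constant is a hypothetical name used for the sanity check.

```python
# gunicorn.conf.py (assumed file name; pass it to gunicorn with -c)

# Assumption: the load balancer drops connections after 60 idle seconds,
# matching the one-minute timeout seen in production.
ELB_IDLE_TIMEOUT_SECONDS = 60

# Seconds to wait for the next request on a keep-alive connection.
# Gunicorn's default is 2; 75 is the value suggested in the linked answer.
keepalive = 75

# Keep the backend's idle window longer than the load balancer's, so the
# balancer is always the side that closes an idle connection.
assert keepalive > ELB_IDLE_TIMEOUT_SECONDS
```

Note that keepalive governs idle connections between requests; it does not by itself give a single slow request more time to finish.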
At this time, my feeling is that the indexing of the related tables in the underlying PostgreSQL database is causing the slowdown in the API response, as there are 2 API calls:
Since we are not using this endpoint in production, but are using individual lookups, providing the index is the more useful implementation at this time.
Going to close this out as wont-fix for now, but let's reopen if needed.
TEAM NAME: Housing 2019
PRIORITY (1-5): 4
DO NOT INCLUDE ANY SECRETS IN THIS REQUEST. IT IS PUBLICLY ACCESSIBLE
Description of issue
Attempting to access this endpoint without any query parameters set will return a 504 timeout error
URL: https://service.civicpdx.org/housing2019/v1/api/hmdaorwa/
Error Message/Logs
Here is the error message returned via swagger ui:
I also looked at the CloudWatch logs for the response; a 200 response seems to be logged, which is interesting:
Reproduction Steps
Expected: I get a valid 2xx JSON response with data
Actual: I got the 504 HTML error
Code Snippets
Here is example cURL:
Screenshots/GIF
Here is swagger:
I also attempted via the terminal:
Priority/Impact
Speaking with @nickembrey, it appears that this endpoint is not in production, so a specific fix will have minimal impact and has low priority. I am opening this issue as it may be a good test case for a longer-run investigation of services and performance.