Open maxpatiiuk opened 1 year ago
Research topics:
ChatGPT 4:
Based on your description, it seems like there might be a bottleneck or a deadlock issue in your application stack (Nginx, Django, Docker, SQL Alchemy, MariaDB) that's causing some requests to take 5 or 10 seconds. Here are some steps to help diagnose and resolve this issue:
Nginx logs and configuration: Review Nginx logs and configuration to identify any potential bottlenecks or issues. Check for errors, connection timeouts, or slow upstream responses in the logs. Ensure that the worker processes, connections, and timeouts are configured properly.
Django logs and settings: Check Django logs for any errors, slow queries, or other issues that may be causing the delay. Ensure that the database connection settings, such as the number of connections and timeouts, are appropriate for your application's needs.
Docker resource limits: Review your Docker configuration to ensure that your containers have sufficient resources (CPU, memory) allocated to handle the load. Insufficient resources can cause performance issues and slow response times.
Middleware and custom code: Inspect your Django middleware and any custom code that's executed before the request reaches the handler function. There might be some logic or processing that's causing the delay. Consider using Django's built-in logging or third-party profiling tools to measure the time spent in each middleware or custom code block.
Network latency or issues: Investigate if there's any network latency or other network-related issues that could be causing the delay. This is particularly relevant since you mentioned the issue occurs specifically on M1 MacBooks. You can use tools like traceroute or ping to check for network latency between the client and the server.
Database connection and query performance: As you're using SQL Alchemy and MariaDB, ensure that the connection pool settings are configured correctly, and review your SQL queries for any performance issues, deadlocks, or long-running transactions.
Operating system and hardware: Ensure that your server's operating system and hardware are appropriately configured and optimized for your application's requirements. If the issue is specific to M1 MacBooks, there might be some compatibility or performance issues related to the ARM architecture that need to be addressed.
By systematically investigating each component in your application stack and addressing any potential issues, you should be able to diagnose and resolve the cause of the slow request times you're experiencing.
Surprisingly, this could be an nginx configuration issue.
Docker development composition (with nginx):
Versus me following the non-Docker installation instructions, but inside Docker (https://github.com/specify/specify7/#local-installation). The biggest difference is that I am using the Django development server instead of nginx:
Can confirm this is an nginx issue.
By adding this to the specify7: section of docker-compose.yml:
ports:
- "127.0.0.1:8001:8000"
and connecting to http://127.0.0.1:8001/ in the browser, there is no performance issue:
This essentially bypasses nginx and connects directly to the django development server
@realVinayak had great insight that this could possibly be caused by HTTP/1.1, as browsers allow only 6 simultaneous connections per host over it. We should test if this is fixed when the web server is updated to use HTTP/2 or even HTTP/3 instead - https://github.com/specify/specify7/issues/2608
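To get a feel for what the 6-connection limit alone would do, here is a toy queueing model in Python. It is a sketch, not a measurement: each request simply waits for the earliest free connection, and the function name `stalled_times` and the service times are made up for illustration.

```python
import heapq


def stalled_times(service_times, max_connections=6):
    """Return how long each request waits for a free connection.

    service_times: seconds each request takes once it is sent (assumed values).
    max_connections: per-host connection limit (6 is the usual browser default
    for HTTP/1.1).
    """
    # Each heap entry is the time at which one connection becomes free.
    free_at = [0.0] * max_connections
    heapq.heapify(free_at)

    stalls = []
    for service in service_times:
        start = heapq.heappop(free_at)  # earliest available connection
        stalls.append(start)            # time spent queued ("stalled")
        heapq.heappush(free_at, start + service)
    return stalls


# Usage (hypothetical numbers): 30 requests that each take 50 ms.
# print(max(stalled_times([0.05] * 30)))
```

In this model, 30 requests of 50 ms each never stall for more than about 0.2 s. A 5 s stall only appears when all six connections are held by responses that themselves take 5 s, so the limit could explain the "stalled" label but not where the 5 s comes from.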
It is ridiculous that it takes 2.6s to retrieve a static file from localhost on a powerful and fast m1!
for reference, 2.6s is the amount of time needed for a signal to go from Earth to the Moon and back, wtf
Updated nginx.conf to use HTTP/2, HTTPS, and even IPv6, and created self-signed certificates.
The 5s performance issue is still present
The ONLY difference is that on HTTP, the network tab shows the request as stalled for 5s (could be the HTTP/1.1 limit of 6 concurrent connections), whereas on HTTPS it shows as waiting for the server response for 5s:
The above are for localhost. In my /etc/hosts I made local.local be equivalent to localhost. And the result is:
The initial request takes 15s to resolve! For other requests, the 5s bug is still present.
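One way to check whether name resolution itself is slow, independent of nginx and the browser, is to time the lookup directly. A minimal Python sketch: `time_lookup` is a hypothetical helper, and `local.local` in the usage comment is the /etc/hosts alias mentioned above.

```python
import socket
import time


def time_lookup(host, port=80):
    """Resolve host and return (elapsed_seconds, addresses)."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # Each entry's sockaddr starts with the resolved IP address.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror as error:
        # Report the failure, since a slow failure is still informative.
        addresses = [f"lookup failed: {error}"]
    return time.perf_counter() - start, addresses


# Usage: compare the built-in name with the /etc/hosts alias.
# for name in ("localhost", "local.local"):
#     elapsed, addresses = time_lookup(name)
#     print(f"{name}: {elapsed:.3f}s -> {addresses}")
```

If `local.local` is slow here as well, the extra delay is in the host's resolver rather than in nginx. On macOS, names ending in .local are normally handled by multicast DNS, which might account for part of it. That is an assumption to verify, not a finding.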
After researching more, it might be related to this Docker bug on mac: https://github.com/docker/for-mac/issues/4430
replacing these lines: https://github.com/specify/specify7/blob/6a364e00f96d5c81f0853c65b3d12ed0918fc202/nginx.conf#L33-L34
with this:
#resolver 127.0.0.11 valid=30s;
set $backend "http://172.18.0.3:8000";
(this disables Docker's DNS server and hardcodes the IP of the specify7 container, as seen in docker container inspect specify7-specify7-1)
...did not fix the performance issue
It could still be a DNS-related issue: in https://github.com/specify/specify7/issues/2574#issuecomment-1517182091 I sent requests directly to the specify7 container without the nginx container being involved, thus removing the need to proxy requests between containers.
next thing to try would be:
Some back-end requests take 5 seconds when a lot of requests are sent in bulk (i.e., when the page is loading)
Happens on M1 only. Happens for all endpoints, not just /context/view.json
Can be replicated by calling this snippet in the DevTools console:
Then, in the network tab, see how most requests are resolved immediately, but some take 5 seconds (and occasionally, one will take 10 seconds).
The numbers 5 and 10 appear consistently, leading me to believe that there might be some sort of deadlock, with 5 seconds being the default timeout.
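A rough way to check the same 5 s / 10 s clustering from outside the browser. This is a Python sketch, not the DevTools snippet. The URL in the usage comment, the request count, and the helper names are assumptions, and the endpoint may need a session cookie that this sketch does not send. Unlike a browser, it does not cap itself at 6 connections per host.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen


def timed_get(url, timeout=30.0):
    """Fetch url once and return the elapsed seconds."""
    start = time.perf_counter()
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()
    except HTTPError:
        pass  # an error status still tells us how long the server took
    return time.perf_counter() - start


def burst(url, count=30, fetch=timed_get):
    """Send count requests at once, one thread each, and return durations."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(fetch, [url] * count))


def summarize(durations, step=5.0, tolerance=0.5):
    """Count durations that land near a multiple of step seconds."""
    counts = {}
    for seconds in durations:
        multiple = round(seconds / step)
        if multiple >= 1 and abs(seconds - multiple * step) <= tolerance:
            label = f"~{multiple * step:g}s"
        elif seconds < tolerance:
            label = "fast"
        else:
            label = "other"
        counts[label] = counts.get(label, 0) + 1
    return counts


# Usage (assumed URL; adjust host and port to the local setup):
# print(summarize(burst("http://localhost/context/view.json")))
```

If the deadlock-with-timeout guess is right, the summary should show most requests as fast, a few near 5 s, and occasionally one near 10 s, with little in between.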
When testing in code, the handler function for the /context/view.json endpoint is not called until after 5 seconds have passed, which suggests this issue is somewhere in Django/Nginx/Docker.
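To narrow this down further, a timing middleware placed first in MIDDLEWARE would show when a request actually reaches Django. A minimal sketch using Django's function-based middleware style: the names `timing_middleware` and `request_timing` and the `X-Handled-In` header are made up for illustration, and the logger only prints if logging is configured for it.

```python
import logging
import time

logger = logging.getLogger("request_timing")  # hypothetical logger name


def timing_middleware(get_response):
    """Log when each request reaches Django and how long Django spends on it."""

    def middleware(request):
        arrived = time.time()            # wall clock, to compare with browser
        started = time.perf_counter()    # monotonic, for the duration
        response = get_response(request)
        elapsed = time.perf_counter() - started
        logger.info(
            "%s arrived=%.3f handled_in=%.3fs", request.path, arrived, elapsed
        )
        # Expose the server-side time so it shows up in the network tab.
        response["X-Handled-In"] = f"{elapsed:.3f}"
        return response

    return middleware


# Usage: add the dotted path to this function as the first entry of
# MIDDLEWARE in settings.py (the module path depends on the project layout).
```

If the browser reports 5 s for a request but the header reports a few milliseconds, the delay happens before the request reaches Django, which would point at nginx or Docker rather than the application code.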