Optimize Dockerized Specify 7 for M1 macs

maxpatiiuk commented 1 year ago

Some back-end requests take 5 seconds when a lot of requests are sent in bulk (i.e., when the page is loading).

Happens on M1 only. Happens for all endpoints, not just /context/view.json

Can be replicated by calling this snippet in the DevTools console:

requests = [
  "/context/view.json?name=Geography",
  "/context/view.json?name=Locality",
  "/context/view.json?name=CollectingEvent",
  "/context/view.json?name=CollectionObject",
  "/context/view.json?name=Agent",
  "/context/view.json?name=Accession",
  "/context/view.json?name=Borrow",
  "/context/view.json?name=Gift",
  "/context/view.json?name=Loan",
  "/context/view.json?name=Address",
  "/context/view.json?name=Division"
];
spam = () => Promise.all(requests.map((url) => fetch(url, { cache: 'no-cache' })));

// Then call spam() in the console several times:
spam()
spam()
spam()
spam()
spam()

Then, in the network tab, see how most requests are resolved immediately, but some take 5 seconds (and occasionally, one will take 10 seconds).

The numbers 5 and 10 appear consistently, leading me to believe that there might be some sort of deadlock, with 5 seconds being the default timeout.
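
The 5 s and 10 s outliers can also be picked out in code rather than by reading the network tab. A minimal sketch, assuming it is pasted into the same DevTools console as the snippet above; `timedBurst` and `slowOnes` are hypothetical helper names (not part of Specify 7), and the 4000 ms threshold is an arbitrary cutoff:

```javascript
// Hypothetical helper: fire all URLs at once and record how long each takes.
// fetchImpl is injectable so the helper can be exercised without a server.
async function timedBurst(urls, fetchImpl = fetch) {
  return Promise.all(
    urls.map(async (url) => {
      const start = performance.now();
      const response = await fetchImpl(url, { cache: 'no-cache' });
      // Read the body so the timing covers the full download
      await response.text();
      return { url, ms: performance.now() - start };
    })
  );
}

// Keep only the requests slower than the threshold (arbitrary default: 4 s)
function slowOnes(timings, thresholdMs = 4000) {
  return timings.filter(({ ms }) => ms >= thresholdMs);
}
```

For example, `timedBurst(requests).then((t) => console.table(slowOnes(t)))` would list only the requests that hit the delay.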

When testing in code, the handler function for the /context/view.json endpoint is not called until after 5 seconds have passed, which suggests this issue is somewhere in Django/Nginx/Docker.

maxpatiiuk commented 1 year ago

Research topics:

maxpatiiuk commented 1 year ago

ChatGPT 4:

Based on your description, it seems like there might be a bottleneck or a deadlock issue in your application stack (Nginx, Django, Docker, SQLAlchemy, MariaDB) that's causing some requests to take 5 or 10 seconds. Here are some steps to help diagnose and resolve this issue:

Nginx logs and configuration: Review Nginx logs and configuration to identify any potential bottlenecks or issues. Check for errors, connection timeouts, or slow upstream responses in the logs. Ensure that the worker processes, connections, and timeouts are configured properly.
Django logs and settings: Check Django logs for any errors, slow queries, or other issues that may be causing the delay. Ensure that the database connection settings, such as the number of connections and timeouts, are appropriate for your application's needs.
Docker resource limits: Review your Docker configuration to ensure that your containers have sufficient resources (CPU, memory) allocated to handle the load. Insufficient resources can cause performance issues and slow response times.
Middleware and custom code: Inspect your Django middleware and any custom code that's executed before the request reaches the handler function. There might be some logic or processing that's causing the delay. Consider using Django's built-in logging or third-party profiling tools to measure the time spent in each middleware or custom code block.
Network latency or issues: Investigate if there's any network latency or other network-related issues that could be causing the delay. This is particularly relevant since you mentioned the issue occurs specifically on M1 MacBooks. You can use tools like traceroute or ping to check for network latency between the client and the server.
Database connection and query performance: As you're using SQLAlchemy and MariaDB, ensure that the connection pool settings are configured correctly, and review your SQL queries for any performance issues, deadlocks, or long-running transactions.
Operating system and hardware: Ensure that your server's operating system and hardware are appropriately configured and optimized for your application's requirements. If the issue is specific to M1 MacBooks, there might be some compatibility or performance issues related to the ARM architecture that need to be addressed.

By systematically investigating each component in your application stack and addressing any potential issues, you should be able to diagnose and resolve the cause of the slow request times you're experiencing.

maxpatiiuk commented 1 year ago

Surprisingly, this could be an nginx configuration issue.

Docker development composition (with nginx):

https://user-images.githubusercontent.com/40512816/233528221-c8bcd787-a85b-43a3-9f98-7a5254d9d773.mov

Versus following the non-Docker installation instructions, but inside Docker (https://github.com/specify/specify7/#local-installation). The biggest difference is that I am using the Django development server instead of nginx:

https://user-images.githubusercontent.com/40512816/233528474-45c7a5b4-e272-445d-986a-c5f18253345e.mov

maxpatiiuk commented 1 year ago

Can confirm this is an nginx issue.

After adding this to the specify7: section of docker-compose.yml:

    ports:
      - "127.0.0.1:8001:8000"

and connecting to http://127.0.0.1:8001/ in the browser, there is no performance issue:

https://user-images.githubusercontent.com/40512816/233529446-26e1544b-11df-4272-9595-d70dc52c525e.mov

This essentially bypasses nginx and connects directly to the Django development server.

maxpatiiuk commented 1 year ago

@realVinayak had the great insight that this could be caused by HTTP/1.1, as browsers limit it to 6 simultaneous connections per host. We should test whether this is fixed when the web server is updated to use HTTP/2 or even HTTP/3 instead: https://github.com/specify/specify7/issues/2608
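
To get a feel for how much stalling a 6-connection limit could cause by itself, the queueing can be simulated. This is a rough sketch, not a model of any real browser: it assumes all requests are issued at the same moment and each takes a fixed time once it has a connection. `simulateStall` is a hypothetical helper.

```javascript
// Simulate a client that allows at most `limit` concurrent connections.
// durations[i] is how long request i takes once it gets a connection (ms).
// Returns, for each request, how long it waited in the queue ("stalled").
function simulateStall(durations, limit = 6) {
  // freeAt[k] = time at which connection k becomes free again
  const freeAt = new Array(limit).fill(0);
  return durations.map((duration) => {
    // Pick the connection that frees up first
    const k = freeAt.indexOf(Math.min(...freeAt));
    const stalledFor = freeAt[k];
    freeAt[k] += duration;
    return stalledFor;
  });
}
```

Under these assumptions, five `spam()` calls (55 requests) at a made-up 50 ms each give a worst-case stall of 450 ms (`Math.max(...simulateStall(Array(55).fill(50)))`). If that is roughly right, the limit alone would only explain 5 s stalls when something else holds connections open for seconds.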

maxpatiiuk commented 8 months ago

It is ridiculous that it takes 2.6s to retrieve a static file from localhost on a powerful and fast M1!

Screenshot 2024-02-26 at 19 56 17

for reference, 2.6s is the amount of time needed for a signal to go from Earth to the Moon and back, wtf

maxpatiiuk commented 3 months ago

Updated nginx.conf to use HTTP/2, HTTPS, and even IPv6, and created self-signed certificates.

The 5s performance issue is still present

The ONLY difference is that over HTTP, the network tab shows the request as stalled for 5s (could be the HTTP/1.1 limit of 6 concurrent requests):

Screenshot 2024-07-24 at 18 42 56

whereas over HTTPS, it shows as waiting for a server response for 5s:

Screenshot 2024-07-24 at 18 43 39

The above are for localhost. In my /etc/hosts I made local.local be equivalent to localhost. And the result is:

Screenshot 2024-07-24 at 18 43 56

The initial request takes 15s to resolve! For other requests, the 5s bug is still present.

After researching more, it might be related to this Docker bug on mac: https://github.com/docker/for-mac/issues/4430
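
One way to check whether name resolution itself is slow, separately from nginx and Docker, is to time a lookup from Node.js on the host. A minimal sketch; `timeLookup` is a hypothetical helper, and by default it relies on Node's `dns.promises.lookup`, which goes through the operating system resolver and therefore sees /etc/hosts entries:

```javascript
// Hypothetical helper: time how long a hostname takes to resolve.
// `lookup` is injectable; when omitted, Node's dns.promises.lookup is used,
// which asks the operating system resolver (so /etc/hosts entries apply).
async function timeLookup(host, lookup) {
  if (!lookup) {
    const dns = await import('node:dns');
    lookup = (name) => dns.promises.lookup(name);
  }
  const start = performance.now();
  const result = await lookup(host);
  return { host, address: result.address, ms: performance.now() - start };
}
```

Calling `timeLookup('localhost').then(console.log)` and `timeLookup('local.local').then(console.log)` side by side would show whether the custom hostname alone accounts for part of the delay. This only measures resolution on the host, not inside the containers.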

maxpatiiuk commented 3 months ago

replacing these lines: https://github.com/specify/specify7/blob/6a364e00f96d5c81f0853c65b3d12ed0918fc202/nginx.conf#L33-L34

with this:

        #resolver 127.0.0.11 valid=30s;
        set $backend "http://172.18.0.3:8000";

(this disables Docker's DNS server and hardcodes the IP of the specify7 container, as seen in docker container inspect specify7-specify7-1)

...did not fix the performance issue

It could still be a DNS-related issue: in https://github.com/specify/specify7/issues/2574#issuecomment-1517182091, I sent requests directly to the specify7 container without the nginx container being involved, which removed the need to proxy requests between containers.

Next things to try:

use django dev server in development
have django serve the static files instead of nginx
use nginx only for the asset server? Maybe not even that, as the asset server has its own web server
