Trying to replicate nominatim.openstreetmap.org

otbutz commented 2 months ago

I'm currently trying to piece together the necessary bits to replicate the behavior of https://nominatim.openstreetmap.org on my own instance.

I want the API to answer on https://example.com/search.php?q= and https://example.com/search?q= while anyone opening https://example.com/ is redirected to https://example.com/ui/search.html

The UI itself and the redirect are working just fine with my current config. The API also answers queries to /search properly, but anything with .php seems to be broken:

upstream nominatim_service {
    server unix:/run/nominatim.sock fail_timeout=0;
}

# Inspect the format parameter in the query arguments. We are interested
# if it is set to html or something else or if it is missing completely.
map $args $format {
    default                  default;
    ~(^|&)format=html(&|$)   html;
    ~(^|&)format=            other;
}

# Determine from the URI and the format parameter above if forwarding is needed.
map $uri/$format $forward_to_ui {
    default               1;   # The default is to forward.
    ~^/ui                 0;   # If the URI points to the UI already, we are done.
    ~/other$              0;   # An explicit non-html format parameter. No forwarding.
    ~/reverse.*/default   0;   # Reverse and lookup assume xml format when
    ~/lookup.*/default    0;   #   no format parameter is given. No forwarding.
}

server {
    listen 80;
    listen [::]:80;

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect off;
        proxy_pass http://nominatim_service;
    }

    # Forward to UI by default
    location = / {
        return 301 $scheme://$http_host/ui/search.html;
    }

    location @php {
        # fastcgi stuff..
        if ($forward_to_ui) {
            rewrite ^(/[^/]*) $scheme://$http_host/ui$1.html redirect;
        }
    }

    location ~ [^/]\.php(/|$) {
        # fastcgi stuff..
        if ($forward_to_ui) {
            rewrite (.*).php $scheme://$http_host/ui$1.html redirect;
        }
    }

    location /ui/ {
        alias /opt/nominatim-ui/dist/;
        index index.html;
    }
}

Nominatim 4.4.1 (Python frontend) nominatim-ui 3.5.3

lonvia commented 2 months ago

We simply do proxy passes for the .php files as well.

The nginx configuration for nominatim.openstreetmap.org is available at https://github.com/openstreetmap/chef/blob/master/cookbooks/nominatim/templates/default/nginx.erb. You'd be interested in the api_flavour 'python'.

otbutz commented 2 months ago

Thanks!

The following config works for my use case:

upstream nominatim_service {
    server unix:/run/nominatim.sock fail_timeout=0;
}

# Inspect the format parameter in the query arguments. We are interested
# if it is set to html or something else or if it is missing completely.
map $args $format {
    default                  default;
    ~(^|&)format=html(&|$)   html;
    ~(^|&)format=            other;
}

# Determine from the URI and the format parameter above if forwarding is needed.
map $uri/$format $forward_to_ui {
    default               1;   # The default is to forward.
    ~^/ui                 0;   # If the URI points to the UI already, we are done.
    ~/other$              0;   # An explicit non-html format parameter. No forwarding.
    ~/reverse.*/default   0;   # Reverse and lookup assume xml format when
    ~/lookup.*/default    0;   #   no format parameter is given. No forwarding.
}

server {
    listen 80;
    listen [::]:80;

    # Forward to UI by default
    location = / {
        return 301 $scheme://$http_host/ui/search.html;
    }

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect off;
        proxy_pass http://nominatim_service;
    }

    location @php {
        # fastcgi stuff..
        if ($forward_to_ui) {
            rewrite ^(/[^/]*) $scheme://$http_host/ui$1.html redirect;
        }
        if ($request_method = 'OPTIONS') {
            add_header 'Content-Type' 'text/plain; charset=UTF-8';
            add_header 'Content-Length' 0;
            add_header Access-Control-Allow-Origin "*";
            add_header Access-Control-Allow-Methods 'GET,OPTIONS';
            add_header Access-Control-Allow-Headers $http_access_control_request_headers;
            return 204;
        }
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect off;
        proxy_pass http://nominatim_service;
    }

    location /ui/ {
        alias /opt/nominatim-ui/dist/;
        index index.html;
    }
}
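
The two map blocks are easier to follow with a few concrete requests. Below is a minimal Python sketch that mimics them, assuming nginx tries the regex entries in the order they are written and falls back to `default` when none match. The helper names (`first_match`, `forward_to_ui`) are made up for illustration and are not part of nginx or Nominatim.

```python
import re

# Ordered (pattern, value) pairs mirroring `map $args $format`.
FORMAT_RULES = [
    (r"(^|&)format=html(&|$)", "html"),
    (r"(^|&)format=", "other"),
]

# Ordered (pattern, value) pairs mirroring `map $uri/$format $forward_to_ui`.
FORWARD_RULES = [
    (r"^/ui", False),
    (r"/other$", False),
    (r"/reverse.*/default", False),
    (r"/lookup.*/default", False),
]


def first_match(rules, value, default):
    """Return the result of the first regex that matches, else the default."""
    for pattern, result in rules:
        if re.search(pattern, value):
            return result
    return default


def forward_to_ui(uri: str, args: str) -> bool:
    """Hypothetical stand-in for $forward_to_ui, given $uri and $args."""
    fmt = first_match(FORMAT_RULES, args, "default")
    return first_match(FORWARD_RULES, f"{uri}/{fmt}", True)


if __name__ == "__main__":
    samples = [
        ("/search", "q=berlin"),
        ("/search.php", "q=berlin&format=json"),
        ("/reverse", "lat=52.5&lon=13.4"),
        ("/ui/search.html", ""),
    ]
    for uri, args in samples:
        print(uri, args, "->", forward_to_ui(uri, args))
```

Under these rules a plain /search?q=... counts as a UI request, while an explicit non-html format, or /reverse and /lookup without a format, stays with the API. Note that `location @php` is a named location, which nginx only enters through an internal redirect such as try_files or error_page, so in the config as posted the requests appear to be handled by `location /`.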

otbutz commented 2 months ago

@lonvia I just noticed that the chef config isn't using any sort of keepalive or zones. Would you be interested in a pull request?

e.g.

upstream nominatim_service {
    zone upstreams 64K;
    server unix:/run/nominatim.sock fail_timeout=0;
    keepalive 2;
}

location / {
    # ...
    proxy_set_header Connection "";
    proxy_http_version 1.1;
}

Leaving these out is considered a configuration mistake by the nginx developers:

https://www.f5.com/company/blog/nginx/avoiding-top-10-nginx-configuration-mistakes#no-keepalives
https://www.f5.com/company/blog/nginx/avoiding-top-10-nginx-configuration-mistakes#upstream-groups

mtmail commented 2 months ago

Isn't the connection here a socket/IPC connection? I know keepalive is useful for TCP/HTTP connections where something in the network can go wrong. Maybe I'm misunderstanding the f5 article.

otbutz commented 2 months ago

The overhead of creating a new Unix socket connection is much lower, it's true. But it's not zero.

And you also get the benefits of HTTP/1.1, which is still not the default for nginx upstream connections. Things like chunked transfer encoding are not available with HTTP/1.0.

It might also be beneficial for Gunicorn, since workers might be tied to connections. But that's just speculation on my part.
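
To make the connection-reuse point concrete, here is a small self-contained Python sketch (standard library only, Unix-like systems). It is not nginx, Gunicorn or Nominatim code: it starts a throwaway HTTP/1.1 server on a Unix socket and counts how many connections the server accepts when a client opens a new connection per request versus reusing one. All class and function names are hypothetical.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # needed for persistent connections

    def do_GET(self):
        body = b"OK"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


class CountingUnixServer(socketserver.UnixStreamServer):
    accepted = 0  # number of connections accepted so far

    def get_request(self):
        request, addr = super().get_request()
        self.accepted += 1
        return request, addr


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client that connects to a Unix socket path instead of TCP."""

    def __init__(self, path):
        super().__init__("localhost")
        self._path = path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self._path)


def count_connections(requests: int, reuse: bool) -> int:
    """Send `requests` GETs and return how many connections the server saw."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        server = CountingUnixServer(path, Handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            shared = UnixHTTPConnection(path) if reuse else None
            for _ in range(requests):
                conn = shared if reuse else UnixHTTPConnection(path)
                conn.request("GET", "/status")
                conn.getresponse().read()
                if not reuse:
                    conn.close()
            if shared is not None:
                shared.close()
        finally:
            server.shutdown()
            server.server_close()
        return server.accepted


if __name__ == "__main__":
    print("new connection per request:", count_connections(5, reuse=False))
    print("one reused connection:     ", count_connections(5, reuse=True))
```

This only shows how many connections get set up. It says nothing about how much that setup costs on a Unix socket.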

mtmail commented 2 months ago

I ran 1-minute tests on a very high-spec server, requesting the /status endpoint on the Nominatim master version. At over 20,000 requests/sec there might be a small benefit. I'd say Nominatim admins usually don't have that kind of traffic, or rather the bottleneck will be somewhere else (running database queries).

100 virtual users, before

http_req_blocked......: avg=38.51µs min=632ns   med=2.32µs  max=29.86ms p(90)=3.71µs  p(95)=4.29µs
http_req_connecting...: avg=34.65µs min=0s      med=0s      max=29.1ms  p(90)=0s      p(95)=0s
http_req_duration.....: avg=27.3ms  min=25.95ms med=27.14ms max=59.54ms p(90)=28.38ms p(95)=29.01ms
http_req_receiving....: avg=43.57µs min=10.12µs med=37.06µs max=4.89ms  p(90)=52.37µs p(95)=58.76µs
http_req_sending......: avg=15.6µs  min=3.96µs  med=12.39µs max=2.91ms  p(90)=18.05µs p(95)=21.43µs
http_req_waiting......: avg=27.24ms min=25.88ms med=27.08ms max=59.49ms p(90)=28.32ms p(95)=28.94ms
http_reqs.............: 218,359 3637/s

100 virtual users, after

http_req_blocked......: avg=38.56µs min=622ns   med=2.3µs   max=32.74ms p(90)=3.65µs  p(95)=4.17µs
http_req_connecting...: avg=35.25µs min=0s      med=0s      max=32.67ms p(90)=0s      p(95)=0s
http_req_duration.....: avg=27.49ms min=25.79ms med=27.07ms max=78.13ms p(90)=28.86ms p(95)=29.67ms
http_req_receiving....: avg=41.65µs min=9.82µs  med=36.89µs max=3.19ms  p(90)=51.78µs p(95)=58.21µs
http_req_sending......: avg=14.8µs  min=3.59µs  med=12.29µs max=2.44ms  p(90)=17.8µs  p(95)=20.97µs
http_req_waiting......: avg=27.43ms min=25.75ms med=27.01ms max=78.08ms p(90)=28.8ms  p(95)=29.61ms
http_reqs.............: 216,890 3613/s

500 virtual users, before

http_req_blocked......: avg=41.16µs min=611ns   med=2.24µs  max=61.55ms  p(90)=4.07µs  p(95)=5.11µs
http_req_connecting...: avg=37.92µs min=0s      med=0s      max=61.48ms  p(90)=0s      p(95)=0s
http_req_duration.....: avg=28.94ms min=26.01ms med=28.26ms max=85.36ms  p(90)=31.57ms p(95)=33.28ms
http_req_receiving....: avg=63.7µs  min=10µs    med=34.43µs max=15.91ms  p(90)=54.51µs p(95)=76.71µs
http_req_sending......: avg=23.59µs min=3.49µs  med=11.72µs max=15.56ms  p(90)=19.73µs p(95)=29.47µs
http_req_waiting......: avg=28.86ms min=25.95ms med=28.17ms max=85.31ms  p(90)=31.45ms p(95)=33.15ms
http_reqs.............: 1,029,345 17,144/s

500 virtual users, after

http_req_blocked......: avg=67.59µs min=611ns   med=2.26µs  max=97.83ms  p(90)=4.14µs  p(95)=5.19µs
http_req_connecting...: avg=35.42µs min=0s      med=0s      max=71.27ms  p(90)=0s      p(95)=0s
http_req_duration.....: avg=29.3ms  min=25.8ms  med=28.42ms max=105.96ms p(90)=32.17ms p(95)=34.12ms
http_req_receiving....: avg=74.13µs min=10.34µs med=34.95µs max=19.86ms  p(90)=55.67µs p(95)=84.4µs
http_req_sending......: avg=27.88µs min=3.75µs  med=11.93µs max=19.29ms  p(90)=20.08µs p(95)=33.06µs
http_req_waiting......: avg=29.19ms min=25.76ms med=28.33ms max=105.92ms p(90)=32.02ms p(95)=33.93ms
http_reqs.............: 1,015,916 16,917/s

1000 virtual users, before

http_req_blocked......: avg=45.28µs min=611ns   med=2.27µs  max=84.87ms  p(90)=4.29µs  p(95)=5.29µs
http_req_connecting...: avg=39.32µs min=0s      med=0s      max=59.6ms   p(90)=0s      p(95)=0s
http_req_duration.....: avg=40.35ms min=26.48ms med=38.93ms max=410.55ms p(90)=48.96ms p(95)=53.51ms
http_req_receiving....: avg=75.69µs min=10.5µs  med=34.79µs max=24.86ms  p(90)=57.41µs p(95)=112.79µs
http_req_sending......: avg=28.42µs min=3.88µs  med=11.91µs max=25.34ms  p(90)=20.76µs p(95)=39.07µs
http_req_waiting......: avg=40.25ms min=26.31ms med=38.82ms max=410.37ms p(90)=48.84ms p(95)=53.37ms
http_reqs.............: 1,478,721 24,620/s

1000 virtual users, after

http_req_blocked......: avg=47.09µs min=592ns   med=2.29µs  max=91.19ms  p(90)=4.38µs  p(95)=5.35µs
http_req_connecting...: avg=35.31µs min=0s      med=0s      max=72.52ms  p(90)=0s      p(95)=0s
http_req_duration.....: avg=37.47ms min=26.06ms med=36.25ms max=214.43ms p(90)=44.96ms p(95)=48.78ms
http_req_receiving....: avg=88.51µs min=10.54µs med=35.02µs max=28.09ms  p(90)=59.11µs p(95)=354.31µs
http_req_sending......: avg=35.67µs min=3.74µs  med=11.93µs max=23.35ms  p(90)=21.2µs  p(95)=57.57µs
http_req_waiting......: avg=37.35ms min=26ms    med=36.14ms max=214ms    p(90)=44.81ms p(95)=48.59ms
http_reqs.............: 1,591,360 26,495/s
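
As a rough reading aid for the throughput lines above, the relative change can be worked out with a few lines of Python. The requests/sec values are copied from the http_reqs rows. That "after" means the run with keepalive enabled is an assumption based on the thread, and `percent_change` is just an illustrative helper.

```python
# virtual users -> (requests/sec before, requests/sec after), from the http_reqs rows.
RESULTS = {
    100: (3637, 3613),
    500: (17144, 16917),
    1000: (24620, 26495),
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    for users, (before, after) in RESULTS.items():
        print(f"{users:>5} users: {percent_change(before, after):+.1f}%")
    # prints:
    #   100 users: -0.7%
    #   500 users: -1.3%
    #  1000 users: +7.6%
```

So throughput changes by about 1% or less at 100 and 500 virtual users, and only the 1000-user run shows a gain of several percent.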

osm-search / nominatim-ui

Trying to replicate nominatim.openstreetmap.org #256