nicholaschiasson / ngx_upstream_jdomain

An asynchronous domain name resolution module for nginx upstream.
BSD 2-Clause "Simplified" License

Support for additional LB options #71

Open splitice opened 2 years ago

splitice commented 2 years ago

e.g. weight, max_fails, fail_timeout

This does raise the question of how we specify these (equally?) for each server member
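
Purely for illustration, not an existing feature: the sort of syntax being asked for might look like the following, with the parameters presumably applied equally to every IP the domain resolves to (max_ips is an existing jdomain parameter; the rest are hypothetical).

upstream backend {
    # hypothetical: weight/max_fails/fail_timeout would be applied to each
    # peer jdomain creates from the resolved addresses
    jdomain example.com max_ips=4 weight=2 max_fails=3 fail_timeout=30s;
}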

holstvoogd commented 2 years ago

+1

I've been using max_fails=0 to prevent all our applications from going down when one is misbehaving.

Yes, we have a bit of a weird setup on AWS ECS where we use nginx for SSL termination and caching in front of an ALB that routes to multiple applications. When one application becomes unreachable, nginx marks the upstream server (i.e. the ALB) as down and serves 503s or similar for 10s.

Unfortunately, setting proxy_next_upstream off does not prevent this.
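
For reference, a minimal sketch of the plain-nginx workaround described above, with alb.example.com standing in for the ALB hostname:

upstream alb_backend {
    # max_fails=0 disables failure accounting, so errors from one application
    # behind the ALB no longer mark the peer as down for fail_timeout
    # (10 seconds by default)
    server alb.example.com max_fails=0;
}

server {
    location / {
        # stops nginx from retrying the request on another peer, but does not
        # stop the peer itself from being marked as down
        proxy_next_upstream off;
        proxy_pass http://alb_backend;
    }
}

The catch is that a plain server directive only resolves the hostname once at configuration load, which is the reason for reaching for jdomain in the first place.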

splitice commented 2 years ago

All the options that exist on round robin shouldn't be that hard to support, as jdomain extends round robin.

@holstvoogd Could you please explain your proxy_next_upstream comment? From my understanding, that should behave the same with jdomain as with plain round robin (without any weight, max_fails, or fail_timeout).

holstvoogd commented 2 years ago

@splitice It was just a note/observation that proxy_next_upstream off; does not prevent upstreams from being marked as down when they error or time out. I missed that the first time I looked into how to prevent that behaviour and had to remind myself when looking into this 'issue'.

nicholaschiasson commented 2 years ago

Agree this would be a good addition. I don't know if it's possible since I haven't looked at the round robin code in a while, but ideally we could leverage the built-in module to parse any arguments not exposed by jdomain directly. Otherwise, I am not opposed to parsing them manually along with the rest of the arguments.

splitice commented 2 years ago

@nicholaschiasson

By the way, do you have any data on this module and how it interacts with other balancers (e.g. least_conn)?

What might be required to make jdomain compatible with additional underlying balancers?

nicholaschiasson commented 2 years ago

@splitice

Not exactly sure what 'data' you'd be interested in; however, I do have some (admittedly weak) tests defined which address the various other upstream-context directives, including least_conn and hash (see t/004.compatibility_nginx.t).

If my memory serves, the way it's implemented is that jdomain, upon initialization, extends whatever load balancer algorithm is specified for that upstream block (round robin by default if none is specified). This basically means that to use an underlying LB algorithm other than round robin, you just need to declare it before any jdomain directive inside your upstream block.

splitice commented 2 years ago

In the situation:

upstream upstream_test {
    server 127.0.0.2;
    least_conn;
    jdomain example.com;
}

Assuming example.com expands to 4 servers, how does least_conn behave?

5 total hosts, each balanced by least_conn?

I did some testing with least_conn nearly a year ago and it didn't seem to be working. Perhaps I made a mistake in my testing; I may need to repeat it.


Regarding option parsing: it looks like we can get most of it just by passing through the uscf->flags of the underlying LB (OR'ed with anything additional supported by this module), then building a fake server line from the jdomain line and parsing it with ngx_http_upstream_server (instead of allocating our own server). This would potentially have the advantage of better compatibility with third-party modules.

The disadvantage, of course, is that the string and array manipulation is ugly, and it is possibly slightly slower than our current method (configuration-time performance).

How I would do it:

  1. Take the jdomain ngx_conf_t (we are done with it anyway).
  2. Remove the options we handled ourselves (sketched below).
  3. Send the line to ngx_http_upstream_server once for each of the max_ips entries.
  4. Handle a failing return as a configuration error.
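
To make steps 1 and 2 concrete, here is a rough, untested sketch (jdomain_filter_server_args is a made-up name, and only interval= and max_ips= are shown as the module-handled options to strip) of copying cf->args minus the jdomain-specific parameters, so the remainder could be handed to the core server parsing:

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

static ngx_int_t
jdomain_filter_server_args(ngx_conf_t *cf, ngx_array_t **out)
{
    ngx_uint_t    i;
    ngx_str_t    *arg, *value;
    ngx_array_t  *args;

    value = cf->args->elts;

    /* step 1: reuse the jdomain directive's own argument list */
    args = ngx_array_create(cf->pool, cf->args->nelts, sizeof(ngx_str_t));
    if (args == NULL) {
        return NGX_ERROR;
    }

    /* note: element 0 ("jdomain") and element 1 (the domain) would also
       need to be replaced with "server" and a resolved address before
       handing the list to the core parser (step 3) */

    for (i = 0; i < cf->args->nelts; i++) {

        /* step 2: drop parameters jdomain already consumed */
        if (ngx_strncmp(value[i].data, "interval=", 9) == 0
            || ngx_strncmp(value[i].data, "max_ips=", 8) == 0)
        {
            continue;
        }

        /* keep everything else (weight=, max_fails=, fail_timeout=, ...) */
        arg = ngx_array_push(args);
        if (arg == NULL) {
            return NGX_ERROR;
        }

        *arg = value[i];
    }

    *out = args;

    return NGX_OK;
}

Each of the up-to-max_ips resolved addresses would then be swapped in as the address argument before parsing (step 3), and any failing return propagated as a configuration error (step 4).
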
nicholaschiasson commented 2 years ago

Hi @splitice ,

Regarding testing of least_conn, I was taking a look at the test case I defined for it in the repo and I would say it's definitely a weak test. I recall not putting a lot of effort into defining that one since it was not 100% straightforward with the test framework to simulate concurrent requests and I didn't spend the time to figure it out.

The way the LB behaviour is currently being tested in the repo, it does basically the same thing as regular round robin, since the upstream only gets one connection at a time. In the future it would be great to harden that test case, and perhaps also define a test case for least_time.

splitice commented 8 months ago

BTW, I implemented one of the possible options in #84.

To anyone who needs additional options, that's how you add them. The only way that I can see for now is to manually bring each one over.

kartikrao commented 7 months ago

+1 This would help us as well. We have a similar setup to @holstvoogd, with NGINX talking to an ALB; a single timeout causes a flood of 502s.

TOLIE-OFFICIAL commented 1 month ago

e.g. weight, max_fails, fail_timeout

This does raise the question of how we specify these (equally?) for each server member

It seems that this module does not support other arguments, e.g. weight, max_fails, fail_timeout. Will it support them in the future? Thanks @nicholaschiasson @splitice

TOLIE-OFFICIAL commented 1 month ago

+1 This would help us as well. We have a similar setup to @holstvoogd, with NGINX talking to an ALB; a single timeout causes a flood of 502s.

Do you guys have any solutions?