minio / sidekick

High Performance HTTP Sidecar Load Balancer
GNU Affero General Public License v3.0
546 stars 82 forks source link

feat: support rr_dns lookup host #113

Closed jiuker closed 3 months ago

jiuker commented 4 months ago

feat: support rr_dns lookup host

ramondeklein commented 4 months ago

I may not fully understand it, but the renew() method contains the following line:

DialContext: dialContextWithDNSCache(dnsCache, newProxyDialContext(10*time.Second)),

The dnsCache is a global variable that caches DNS entries. Because it will re-use the DNS cache entry I don't see what this is trying to solve. I also checked the Timeout field of dnscache.Resolver, but it's not the cache eviction time-out, but the time-out for a DNS lookup.

ramondeklein commented 4 months ago

@jiuker Wouldn't it make more sense to fix this in the DNS resolver instead of creating new HTTP transports? I don't know how it works, but maybe you can simply purge the hostname of the failing host from the DNS cache?

jiuker commented 4 months ago

@jiuker Wouldn't it make more sense to fix this in the DNS resolver instead of creating new HTTP transports? I don't know how it works, but maybe you can simply purge the hostname of the failing host from the DNS cache?

Yesh. It can. I have tested. This is do like the pr before. Lets see if this can fix the case @harshavardhana

ramondeklein commented 4 months ago

Yesh. It can. I have tested. This is do like the pr before. Lets see if this can fix the case

Don't think you can reduce even more. :wink:

jiuker commented 4 months ago

Yesh. It can. I have tested. This is do like the pr before. Lets see if this can fix the case

Don't think you can reduce even more. 😉

Yesh. Per client hold it own cache.

ramondeklein commented 4 months ago

Why so complex? Having a single resolver that is refreshed when there are issues should be fine. I don't think it's worth all the code. I liked https://github.com/minio/sidekick/commit/4dc39fb445a8f15214e26de9203dc677158acb9a, because it's simple.

harshavardhana commented 4 months ago

@jiuker what is needed here is not a custom resolver but a way to get inputs for list of backends that you need to monitor. Today customers specify

sidekick [flags] http://minio-cluster{1...4}.example.com/

However the plan is to support a DNS that provides a CNAME list of hosts or A record of IPs like this.

dig cloud.min.io

; <<>> DiG 9.18.24-0ubuntu0.22.04.1-Ubuntu <<>> cloud.min.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37124
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;cloud.min.io.          IN  A

;; ANSWER SECTION:
cloud.min.io.       287 IN  A   15.15.15.4
cloud.min.io.       287 IN  A   15.15.15.5
cloud.min.io.       287 IN  A   15.15.15.3
cloud.min.io.       287 IN  A   15.15.15.8
cloud.min.io.       287 IN  A   15.15.15.7
cloud.min.io.       287 IN  A   15.15.15.2
cloud.min.io.       287 IN  A   15.15.15.6
cloud.min.io.       287 IN  A   15.15.15.1

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Tue Jul 09 10:43:20 PDT 2024
;; MSG SIZE  rcvd: 169

Now before starting sidekick you must do https://pkg.go.dev/net#LookupHost to get the list of addrs and then use those addrs to register backends.

So ideally a customer would specify just this

./sidekick [flags] http://cloud.min.io

Now we should support SIGHUP to ensure we can do a fresh lookup of cloud.min.io in-case anything changes to support adding new "backends".

This is for situations like customer added a new pool but they are simply updating their DNS entries but not sidekick command line configuration, sidekick simply automatically starts regarding new pool backends as soon as SIGHUP is called.

This PR right now is not doing what we want.

jiuker commented 4 months ago

@ramondeklein @klauspost review this again please.

jiuker commented 4 months ago

@jiuker what is needed here is not a custom resolver but a way to get inputs for list of backends that you need to monitor. Today customers specify

sidekick [flags] http://minio-cluster{1...4}.example.com/

However the plan is to support a DNS that provides a CNAME list of hosts or A record of IPs like this.

dig cloud.min.io

; <<>> DiG 9.18.24-0ubuntu0.22.04.1-Ubuntu <<>> cloud.min.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37124
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;cloud.min.io.            IN  A

;; ANSWER SECTION:
cloud.min.io.     287 IN  A   15.15.15.4
cloud.min.io.     287 IN  A   15.15.15.5
cloud.min.io.     287 IN  A   15.15.15.3
cloud.min.io.     287 IN  A   15.15.15.8
cloud.min.io.     287 IN  A   15.15.15.7
cloud.min.io.     287 IN  A   15.15.15.2
cloud.min.io.     287 IN  A   15.15.15.6
cloud.min.io.     287 IN  A   15.15.15.1

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Tue Jul 09 10:43:20 PDT 2024
;; MSG SIZE  rcvd: 169

Now before starting sidekick you must do https://pkg.go.dev/net#LookupHost to get the list of addrs and then use those addrs to register backends.

So ideally a customer would specify just this

./sidekick [flags] http://cloud.min.io

Now we should support SIGHUP to ensure we can do a fresh lookup of cloud.min.io in-case anything changes to support adding new "backends".

This is for situations like customer added a new pool but they are simply updating their DNS entries but not sidekick command line configuration, sidekick simply automatically starts regarding new pool backends as soon as SIGHUP is called.

This PR right now is not doing what we want.

┌──────┬───────────────────┬────────┬───────┬──────────┬─────┬─────┬────────────────┬───────────────┬─────────────┬─────────────┐
│ SITE │ HOST              │ STATUS │ CALLS │ FAILURES │ Rx  │ Tx  │ TOTAL DOWNTIME │ LAST DOWNTIME │ MIN LATENCY │ MAX LATENCY │
│ 1st  │ http://15.15.15.8 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │
│ 1st  │ http://15.15.15.2 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │
│ 1st  │ http://15.15.15.6 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │
│ 1st  │ http://15.15.15.7 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │
│ 1st  │ http://15.15.15.1 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │
│ 1st  │ http://15.15.15.4 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │
│ 1st  │ http://15.15.15.3 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │
│ 1st  │ http://15.15.15.5 │ DOWN   │ 0     │ 0        │ 0 B │ 0 B │ 0s             │ 0s            │ 0s          │ 0s          │

@harshavardhana

ramondeklein commented 4 months ago

Before I review the code, I want to understand the requested functionality. As I understand there are two ways to configure sidekick:

  1. The existing way using the ellipsis notation, such as sidekick [flags] http://minio-cluster{1...4}.example.com/.
  2. The new method where a single DNS name resolves to multiple IP addresses, such as sidekick [flags] http://minio-cluster.example.com/ (hostname minio-cluster.example.com can resolve to multiple IP addresses). This method could be used to support a round robin DNS (I guess that's meant with "rr_dns") that will serve a list of IP addresses in "random" order to ensure that all servers are handling requests.

I think we need a little different implementation to support this feature. There should be one go-routine (that runs every 5 minutes or when forced using SIGHUP) that determines all the IP addresses of the specified endpoints, so:

  1. minio-cluster{1...4}.example.com will probably result in 4 IP addresses (all having a unique hostname). Sidekick will create 4 back-ends.
  2. minio-cluster.example.com will result in X IP addresses (let's assume 8 while all using the same hostname). Sidekick will create 8 backends.

I'm not sure if we should also support round-robin DNS look-up for the ellipsis notation or always use the first IP that is presented (@harshavardhana what do you think?)

Sidekick will create a custom http.Client per back-end that always dials the IP address that is associated with that backend and uses the hostname for that backend (in case of TLS) to verify the server certificate. Each time it modifies this list, then it will swap out the map atomically, so no locking is needed. Health checks and normal behavior won't be altered.

harshavardhana commented 4 months ago

I'm not sure if we should also support round-robin DNS look-up for the ellipsis notation or always use the first IP that is presented (@harshavardhana what do you think?)

If you have a single arg RR-DNS based, multiple host input is supported; if you specify more than one arg either via ellipses or multiple separate hosts, we do not invoke this code.

harshavardhana commented 4 months ago

@jiuker there are some UI changes we need will keep you posted

jiuker commented 4 months ago

@ramondeklein review again plz