skynetservices / skydns1

DNS for skynet or any other service discovery
MIT License

Cluster doesn't work on distributed setup with high latency between peers. #89

Closed ghost closed 10 years ago

ghost commented 10 years ago

When I create a cluster with 3 nodes using the following steps:

1- Bring up the master node in network A.
2- Bring up a second node in the same network and join it to the cluster (at this point everything is fine).
3- Bring up a third node in a remote network (with high latency, via VPN) and join it to the cluster.

At this point the three nodes are connected to the cluster, but it looks like the two nearby nodes fight to become master, showing this log: ?raft:nop 10 ?raft:nop 10 ?raft:nop 10 ...

And when I try to create a new entry with curl -X PUT ..., it doesn't work and shows "EOF".

Thanks in advance.

crosbymichael commented 10 years ago

@bketelsen @miekg @erikstmartin

What do you think of just making skydns an HTTP API for inserting DNS records plus the DNS resolver, and then having configurable backends like etcd and redis to store the service data? I'd be interested in doing this, but it would have to be a breaking change to strip out a lot of the code that does not have to do with resolving and storing the data.

bketelsen commented 10 years ago

In general I think we're all behind it. @miekg was working on an Etcd interface, but I'm sure he's busy with his new work. If you've got the time, go for it! It would allow SkyDNS to do its job and get us out of the big distributed debugging game.

crosbymichael commented 10 years ago

@bketelsen that is why I thought this issue would be a good one to bring the discussion up.

Do you think skydns should act as the HTTP interface to insert services, or would we use direct bindings to etcd or redis in our client code? I am for the HTTP interface to insert, remove, and update records because it provides a consistent UI no matter the backend you are using.

bketelsen commented 10 years ago

I think the client should update SkyDNS, and SkyDNS should have pluggable backends. Then the client registration is fixed and the DNS operates as usual. No client code needs to change for a different data store.
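
The pluggable-backend idea could look roughly like the sketch below. It is only an illustration, not SkyDNS code: the `Backend` interface, the `Service` struct, and the `memoryBackend` type are all hypothetical names, and a real backend would wrap etcd or redis behind the same three methods.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Service is a hypothetical record type; the fields are assumptions.
type Service struct {
	Name string // e.g. "blah.skydns.local"
	Host string // IP address or hostname
	Port uint16
}

// Backend is a hypothetical storage interface. The registration API and the
// DNS resolver would depend only on this, never on a concrete data store.
type Backend interface {
	Put(s Service) error
	Get(name string) (Service, error)
	Delete(name string) error
}

// ErrNotFound is returned when no service is stored under a name.
var ErrNotFound = errors.New("service not found")

// memoryBackend is an in-memory stand-in for an etcd or redis backend.
type memoryBackend struct {
	mu       sync.RWMutex
	services map[string]Service
}

func newMemoryBackend() *memoryBackend {
	return &memoryBackend{services: make(map[string]Service)}
}

func (m *memoryBackend) Put(s Service) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.services[s.Name] = s
	return nil
}

func (m *memoryBackend) Get(name string) (Service, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	s, ok := m.services[name]
	if !ok {
		return Service{}, ErrNotFound
	}
	return s, nil
}

func (m *memoryBackend) Delete(name string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.services[name]; !ok {
		return ErrNotFound
	}
	delete(m.services, name)
	return nil
}

func main() {
	// Callers only see the interface, so swapping the store changes one line.
	var b Backend = newMemoryBackend()
	if err := b.Put(Service{Name: "blah.skydns.local", Host: "10.0.1.1", Port: 8080}); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	s, err := b.Get("blah.skydns.local")
	fmt.Println(s.Host, s.Port, err)
}
```

With this shape, client registration code talks to SkyDNS only, and changing the data store means providing another type that satisfies `Backend`.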

miekg commented 10 years ago

[ Quoting notifications@github.com in "Re: [skydns] Cluster doesn't work o..." ]

@bketelsen @miekg @erikstmartin

What do you think of just making skydns an HTTP API for inserting DNS records plus the DNS resolver, and then having configurable backends like etcd and redis to store the service data? I'd be interested in doing this, but it would have to be a breaking change to strip out a lot of the code that does not have to do with resolving and storing the data.

Go for it!

I have a private etcd branch where I'm playing with skydns+etcd, but I haven't got that far (swamped with work and private life stuff).

There is so much code that can be removed from skydns that I'm still torn between:

  1. strip skydns to use etcd
  2. go from etcd and add a dns layer

/Miek

Miek Gieben

bketelsen commented 10 years ago

well, for what it's worth the CoreOS folks would welcome a patch to etcd that added DNS as a /mod/ directory application. We've already had the discussion. @miekg I know you know this already.

miekg commented 10 years ago

[ Quoting notifications@github.com in "Re: [skydns] Cluster doesn't work o..." ]

well, for what it's worth the CoreOS folks would welcome a patch to etcd that added DNS as a /mod/ directory application. We've already had the discussion.
@miekg I know you know this already.

Yes, of course, but it would be nice to keep existing use cases like skydock. If that criterion becomes less important, creating something like etcd+dns becomes more of an option (and then slowly porting over the features from SkyDNS). Having said this: helixdns could also be a starting point.

/Miek

Miek Gieben

crosbymichael commented 10 years ago

@miekg I kinda feel the same way. Skydock is not an issue; I can easily update it to insert records into a specific backend so that skydns is just a resolver doing one thing. Also, if skydns did not include an HTTP server, it would be helpful if we were to embed it into docker and make it a default for service discovery....

miekg commented 10 years ago

[ Quoting notifications@github.com in "Re: [skydns] Cluster doesn't work o..." ]

@miekg I kinda feel the same way. Skydock is not an issue; I can easily update it to insert records into a specific backend so that skydns is just a resolver doing one thing. Also, if skydns did not include an HTTP server, it would be helpful if we were to embed it into docker and make it a default for service discovery....

Ok, that is good to know.

/Miek

Miek Gieben

ghost commented 10 years ago

Hi again,

I tested it thoroughly, and I'm sure it is an issue with the network latency. Is there any easy way to fix it?

Thanks again for your quick replies.

bketelsen commented 10 years ago

I can't dig through the source now, but there's a RAFT timeout setting somewhere you could raise. Set it to double your latency at least.
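
As a rough sketch of that rule of thumb, the election timeout can be derived from a measured round-trip time without ever dropping below a default. The function name `raftTimeouts`, the 150ms floor, and the one-third heartbeat ratio are assumptions for illustration, not values taken from the SkyDNS source; the real setting lives wherever SkyDNS configures its raft server.

```go
package main

import (
	"fmt"
	"time"
)

// defaultElectionTimeout is an assumed floor for low-latency clusters.
const defaultElectionTimeout = 150 * time.Millisecond

// raftTimeouts returns an election timeout of at least twice the measured
// round-trip time, and a heartbeat interval of one third of that timeout so
// that several heartbeats fit into one election window.
func raftTimeouts(rtt time.Duration) (election, heartbeat time.Duration) {
	election = 2 * rtt
	if election < defaultElectionTimeout {
		election = defaultElectionTimeout
	}
	heartbeat = election / 3
	return election, heartbeat
}

func main() {
	// One LAN-like and one VPN-like round-trip time.
	for _, rtt := range []time.Duration{5 * time.Millisecond, 300 * time.Millisecond} {
		e, h := raftTimeouts(rtt)
		fmt.Printf("rtt=%v election=%v heartbeat=%v\n", rtt, e, h)
	}
}
```

All peers would need to run with the same values, otherwise the nodes with shorter timeouts keep starting elections.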

miekg commented 10 years ago

[ Quoting notifications@github.com in "Re: [skydns] Cluster doesn't work o..." ]

I think the client should update SkyDNS, and SkyDNS should have pluggable backends. Then the client registration is fixed and the DNS operates as usual.
No client code needs to change for a different data store.

We're totally hijacking this thread, but anyway. I've started chipping away at SkyDNS. The current result (after only two hours of coding) is a skydns2 binary that

a) compiles
b) forwards requests to 8.8.8.8
c) does not talk to etcd at all

There are 3 files in the repo, and I expect to need only one more: etcd.go. After that, some more tedious cleanup work needs to be done.

Repo: https://github.com/miekg/skydns2

Two potential issues in relation to etcd (or other backends):

  1. (minor) For DNSSEC I need to know the previous and next name if something does not exist. Etcd provides some kind of sorted API; not sure if that is enough.
  2. What did we come up with for wildcard searches? Do the search ourselves in etcd and fake wildcards that way (possibly with a cache)? If so, there will be one other file added to this repo: wildcard.go
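
One way to fake wildcards on top of a store that only lists names is to fetch the names and match them label by label in SkyDNS itself. The sketch below assumes that `*` matches exactly one label and that matching is case-insensitive; `matchWildcard` and `filterNames` are hypothetical helpers, not a decided design.

```go
package main

import (
	"fmt"
	"strings"
)

// matchWildcard reports whether name matches pattern label by label.
// A "*" label in the pattern matches any single label in the name.
// A trailing dot is ignored and comparison is case-insensitive.
func matchWildcard(pattern, name string) bool {
	p := strings.Split(strings.ToLower(strings.TrimSuffix(pattern, ".")), ".")
	n := strings.Split(strings.ToLower(strings.TrimSuffix(name, ".")), ".")
	if len(p) != len(n) {
		return false
	}
	for i := range p {
		if p[i] != "*" && p[i] != n[i] {
			return false
		}
	}
	return true
}

// filterNames returns the stored names that match the pattern. In a real
// server the names would come from the backend, possibly via a cache.
func filterNames(pattern string, names []string) []string {
	var out []string
	for _, name := range names {
		if matchWildcard(pattern, name) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	names := []string{"a.web.skydns.local", "b.web.skydns.local", "a.db.skydns.local"}
	fmt.Println(filterNames("*.web.skydns.local", names))
}
```

Filtering on the SkyDNS side costs one listing per query, which is where the cache mentioned above would come in.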

Grtz Miek

PS I might even get this done in time for Gophercon :)

PPS What to do with logging and stats as skydns becomes less interesting?

crosbymichael commented 10 years ago

I don't know about etcd but redis supports wildcards on the keys. It also supports list with a random result back so round robin works great.

I think stats like number of queries and such can stay in skydns.

miekg commented 10 years ago

OK and OK. My current focus is etcd though. The rest will come later.

On 22 Apr 2014 20:39, "Michael Crosby" notifications@github.com wrote:

I don't know about etcd but redis supports wildcards on the keys. It also supports list with a random result back so round robin works great.

I think stats like number of queries and such can stay in skydns.


ghost commented 10 years ago

@bketelsen I have solved the timeout problem caused by the high network latency; I will open a new pull request.

Thanks to all, and no problem with the hijacking of the thread ;) I will test skydns2 + etcd.

miekg commented 10 years ago

[ Quoting notifications@github.com in "Re: [skydns] Cluster doesn't work o..." ]

In general I think we're all behind it. @miekg was working on an Etcd interface, but I'm sure he's busy with his new work. If you've got the time, go for it! It would allow SkyDNS to do its job and get us out of the big distributed debugging game.

Ok, small update on this. I pushed a new version of SkyDNS2 that only does DNS and only with etcd at this point. Lots of errors are not checked, but it is working. See some examples below. A few observations before I clean it up and proceed.

Small example:

% ./skydns2 -dns=127.0.0.1:5354 -etcd=http://127.0.0.1:4001 -nameserver=8.8.4.4:53
2014/04/25 14:07:14 initializing server. DNS Addr: "127.0.0.1:5354", Forwarders: ["8.8.4.4:53"]

% curl -XPUT http://127.0.0.1:4001/v2/keys/local/skydns/blah/A -d value="10.0.1.1"

% dig @127.0.0.1 -p 5354 +noall +answer a blah.skydns.local
blah.skydns.local. 60 IN A 10.0.1.1

% dig @127.0.0.1 -p 5354 +noall +answer a www.miek.nl
www.miek.nl. 21599 IN CNAME a.miek.nl.
a.miek.nl. 21599 IN A 176.58.119.54
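
The example above implies a mapping from a queried name to an etcd key: the labels are reversed, joined with slashes, and the record type is appended. A minimal sketch of that mapping follows; the function name `etcdKey` is hypothetical, and the actual skydns2 code may build the path differently.

```go
package main

import (
	"fmt"
	"strings"
)

// etcdKey maps a DNS name and record type to a key path by reversing the
// labels and appending the type, so ("blah.skydns.local.", "A") becomes
// "local/skydns/blah/A". A trailing dot on the name is ignored.
func etcdKey(name, rrtype string) string {
	labels := strings.Split(strings.TrimSuffix(name, "."), ".")
	// Reverse in place: DNS names are most-specific first, key paths are not.
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(append(labels, rrtype), "/")
}

func main() {
	fmt.Println(etcdKey("blah.skydns.local.", "A"))
}
```

Reversing the labels puts the most general part first, so every name under skydns.local ends up below one directory in the store.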

Feature wise, I would call this helixdns+1. Code wise, there is not much that needs to be added. I would say, give or take two evenings of programming to get it up to par with the current SkyDNS.

/Miek

Miek Gieben

crosbymichael commented 10 years ago

SGTM

I can help out this weekend also

Michael Crosby

On Apr 25, 2014, at 7:22 AM, Miek Gieben notifications@github.com wrote:


Ok, small update on this. I pushed a new version of SkyDNS2 that only does DNS and only with etcd at this point. Lots of errors are not checked, but it is working. See some examples below. A few observations before I clean it up and proceed.

  • if the DNS request is blah.skydns.local A, the data is stored in etcd as: local/skydns/blah/A where the 'A' holds the IP address in text form.
  • I didn't port the whole uuid.host.service.region stuff, so basically you can query the whole contents of your etcd store, as long as you have an 'A', 'AAAA' or 'SRV' key in there. I think this makes sense...


miekg commented 10 years ago

[ Quoting notifications@github.com in "Re: [skydns] Cluster doesn't work o..." ]

SGTM

I can help out this weekend also

Cool. Plan is to get this in reasonable shape and then move it to skynetservices.

grtz Miek

miekg commented 10 years ago

Ok, etcd.Get when using recursion will only recurse one level deep. It's not hard to recurse ourselves, but this will kill performance (unless we cache (how long?)).

miekg commented 10 years ago

I lied, it does give back all the elements, recursively \0/

Right now, A/AAAA is working; I only need to hack in SRV support and name synthesis when srv.Target is an IP address.
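
For reference, a recursive get in the etcd v2 keys API returns a tree of nodes, where directories carry child nodes and leaves carry values. Flattening such a tree into key/value pairs could be sketched as below. The `Node` type is a local stand-in modeled on that JSON shape, not the client library's type, and `leaves` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node mirrors the assumed shape of a node in an etcd v2 response.
type Node struct {
	Key   string `json:"key"`
	Value string `json:"value,omitempty"`
	Dir   bool   `json:"dir,omitempty"`
	Nodes []Node `json:"nodes,omitempty"`
}

// leaves walks the tree depth-first and records every non-directory node
// in out, keyed by its full path.
func leaves(n Node, out map[string]string) {
	if !n.Dir {
		out[n.Key] = n.Value
		return
	}
	for _, child := range n.Nodes {
		leaves(child, out)
	}
}

func main() {
	// Hand-written sample data in the assumed response shape.
	raw := `{"key":"/local","dir":true,"nodes":[
	  {"key":"/local/skydns","dir":true,"nodes":[
	    {"key":"/local/skydns/blah","dir":true,"nodes":[
	      {"key":"/local/skydns/blah/A","value":"10.0.1.1"}]}]}]}`
	var root Node
	if err := json.Unmarshal([]byte(raw), &root); err != nil {
		fmt.Println("bad sample:", err)
		return
	}
	out := make(map[string]string)
	leaves(root, out)
	fmt.Println(out)
}
```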

crosbymichael commented 10 years ago

@miekg are you marshaling the entire msg.Service type to JSON and then inserting it into etcd?

miekg commented 10 years ago

No, just the text version of the IP, or the 4 elements of the Rdata of SRV records separated by spaces. I didn't see the need for JSON (but can be convinced otherwise).

I was cleaning msg.Service for the rewrite, nothing remained in it, so I discarded it.
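
To make the space-separated SRV encoding concrete, here is a sketch of parsing such a value back into its four fields. The field order (priority, weight, port, target) is an assumption borrowed from the SRV presentation format in RFC 2782; the thread does not state the order, and `SRV` and `parseSRV` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SRV holds the four Rdata fields of an SRV record.
type SRV struct {
	Priority, Weight, Port uint16
	Target                 string
}

// parseSRV parses a value of the form "priority weight port target".
// The field order is an assumption (RFC 2782 presentation format).
func parseSRV(value string) (SRV, error) {
	f := strings.Fields(value)
	if len(f) != 4 {
		return SRV{}, fmt.Errorf("want 4 fields, got %d", len(f))
	}
	var nums [3]uint16
	for i := 0; i < 3; i++ {
		// bitSize 16 rejects anything outside 0..65535.
		n, err := strconv.ParseUint(f[i], 10, 16)
		if err != nil {
			return SRV{}, fmt.Errorf("field %d: %v", i+1, err)
		}
		nums[i] = uint16(n)
	}
	return SRV{Priority: nums[0], Weight: nums[1], Port: nums[2], Target: f[3]}, nil
}

func main() {
	srv, err := parseSRV("10 100 8080 blah.skydns.local.")
	fmt.Println(srv, err)
}
```

A plain string like this stays readable with curl, at the cost of every consumer having to agree on the field order, which is the trade-off against JSON raised above.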

crosbymichael commented 10 years ago

well why encode the SRV data separated by spaces?

bketelsen commented 10 years ago

i'm thinking we need better structure for the SRV records. or keys in a folder?

bketelsen commented 10 years ago

but that would make srv records a special case..

crosbymichael commented 10 years ago

Do we have an irc room to chat on? I'll start submitting PRs to help out on your fork. I can add back the stats stuff and not try to touch the dns work that you are doing right now.

bketelsen commented 10 years ago

skynet-dev

irc.freenode.net

miekg commented 10 years ago

Joined

bketelsen commented 10 years ago

and now I can't connect :(

miekg commented 10 years ago

[ Quoting notifications@github.com in "Re: [skydns] Cluster doesn't work o..." ]

skynet-dev

irc.freenode.net

Pushed latest code, some minor items remain (multiple SRV records, refactoring).
But in general I can claim: it is working! :-)

How are we going to test it? Download etcd and launch that during the tests??

/Miek

Miek Gieben