skynetservices / skydns

DNS service discovery for etcd
MIT License

Skydns takes 100% cpu after 5mn #87

Closed oliviervaussy closed 10 years ago

oliviervaussy commented 10 years ago

Hi,

I deployed a container containing only skydns, and after about 5 minutes it takes 100% of one CPU.

Dockerfile: https://registry.hub.docker.com/u/witai/skydns2/

Docker version 1.1.2, build d84a070

I'm using fleet to deploy this container: fleetctl version 0.6.2

miekg commented 10 years ago

How many queries does it get? Is it logging too much? How many recursions per second does it do? What does strace say, i.e. is it doing something or just idle?

I have a test version running (dig @voordeur.atoom.net version.bind CH TXT) which is not getting many qps but has been running for weeks (apart from upgrading the binary from time to time).

oliviervaussy commented 10 years ago

I have the system on my own computer and it doesn't get a lot of queries (2-3 per minute). No logs from skydns. I'll send the strace file by email if that can help you.

Cheers

miekg commented 10 years ago

Which version of skydns are you using?

skydns -verbose should log queries. Is SkyDNS hogging the CPU? What does top say? sysdig might come in handy (although docker... so I'm not sure).

oliviervaussy commented 10 years ago

We are using skydns2 (tag 2.0.0.f), as you can see in the Dockerfile (https://registry.hub.docker.com/u/witai/skydns2/dockerfile/). Top says that skydns2 takes one entire CPU.

We are able to reproduce it on several platforms, so if you try to run it with docker you may see the issue. In the meantime I'll have a look at sysdig.

miekg commented 10 years ago

Check. What is the quickest way to run docker with a dockerfile from that repository? (bit of a docker noob here)

oliviervaussy commented 10 years ago

Something like this command should work:

docker run -e ETCD_MACHINES=localhost:4001 -e SKYDNS_NAMESERVERS='8.8.8.8:53,8.8.4.4:53' -e SKYDNS_DOMAIN=domain.test --rm -p 53:53/udp witai/skydns2:2.0.0f -addr=0.0.0.0:53 -discover

miekg commented 10 years ago

Ah, thanks. I'm on a mobile net now and killed the download when it wanted another 20MB. Can only start to take a look tomorrow.

If you don't use '-discover' do you also see this problem?

oliviervaussy commented 10 years ago

Without -discover everything seems OK (even after running for 1h).

miekg commented 10 years ago

Cool, something goes haywire with that stuff then; that narrows it down. Something (either skydns or etcd) is doing a shit-ton of useless updates, I guess.

miekg commented 10 years ago

'-discover' sets a watch on /_etcd/machines/ in etcd. Is something updating that a lot? And that section could probably do with some logging as well.

oliviervaussy commented 10 years ago

It doesn't look like the value changes. When watching the node I don't see any change.

Here is the command I executed:

curl -L http://127.0.0.1:4001/v2/keys/_etcd/machiness?wait=true

miekg commented 10 years ago

Hmm. Well, I added some logging to see how often this gets triggered on the skydns side. I'll push a new release. Maybe I should use some dampening there so this doesn't get executed that often.

miekg commented 10 years ago

Yes, it takes 100% cpu on startup. Repeatable without docker as well:

ETCD_MACHINES=http://127.0.0.1:4001 \
SKYDNS_NAMESERVERS='8.8.8.8:53,8.8.4.4:53' \
SKYDNS_DOMAIN=domain.test \
./skydns -addr=0.0.0.0:1053 -discover

Barfs:

[skydns] Sep 14 07:26:58.808 INFO      | ectd machine cluster update
[skydns] Sep 14 07:26:58.808 INFO      | ectd machine cluster update
[skydns] Sep 14 07:26:58.808 INFO      | ectd machine cluster update
[skydns] Sep 14 07:26:58.808 INFO      | ectd machine cluster update
[skydns] Sep 14 07:26:58.808 INFO      | ectd machine cluster update
[skydns] Sep 14 07:26:58.808 INFO      | ectd machine cluster update

because there is no etcd.

miekg commented 10 years ago

Commits 89e27d0 and 7025dba should fix this.

oliviervaussy commented 10 years ago

v2.0.0.h is working well.

Thanks