ytti / oxidized

Oxidized is a network device configuration backup tool. It's a RANCID replacement!
Apache License 2.0

Defunct ssh processes #450

Closed roedie closed 6 years ago

roedie commented 8 years ago

When Oxidized (0.14.3) has been running for some time, I get defunct ssh processes, like:

oxidized 29377 0.0 0.0 0 0 ? Z May28 0:00 [ssh] <defunct>

I'm not sure where they come from, and I wonder how I can debug this. More to the point, I don't think we should get defunct processes at all.
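One possible way to start debugging is to list the defunct processes together with their parent PIDs, so you can check whether they really are children of Oxidized. The following is a minimal sketch, not part of Oxidized: `defunct_processes` is a hypothetical helper, and it assumes a Linux procps-style `ps` that accepts `-eo pid,ppid,stat,comm`.

```ruby
# Hypothetical helper (not part of Oxidized): returns the zombie processes
# found in `ps` output, with their parent PID.
# Assumption: a procps-style `ps` that supports `-eo pid,ppid,stat,comm`.
def defunct_processes(ps_output = `ps -eo pid,ppid,stat,comm`)
  rows = ps_output.lines.drop(1) # skip the header line
  rows.map do |line|
    pid, ppid, stat, comm = line.split(nil, 4)
    # A state starting with "Z" marks a zombie (defunct) process.
    next unless stat && stat.start_with?("Z")
    { pid: pid.to_i, ppid: ppid.to_i, command: comm.to_s.strip }
  end.compact
end
```

Calling `defunct_processes` with no argument runs `ps` itself; comparing each `:ppid` against the PID of the running oxidized process shows which zombies belong to it.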

ytti commented 8 years ago

Huh. We never fork SSH; we use a pure-Ruby SSH library. Maybe one of your hooks is firing it off?

roedie commented 8 years ago

Oxidized does fork ssh when using a proxy host. Or am I mistaken about that?

lib/oxidized/input/ssh.rb:27: proxy = Net::SSH::Proxy::Command.new("ssh #{proxy_host} -W %h:%p")

I do not have any hooks using ssh. Just one hook using sendmail when a node fails.

ytti commented 8 years ago

Oh right, that's true. I think we should have an ensure somewhere guaranteeing it gets killed. Something for @ElvinEfendi, I hope.
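As a rough illustration of the `ensure` idea, here is a minimal sketch. `with_child` is a hypothetical helper, not Oxidized or net-ssh code, and it assumes the proxy is an ordinary child process started with `Process.spawn`. The point it shows is that a child must be waited on after it exits, otherwise it stays in the process table as `<defunct>`.

```ruby
# Hypothetical helper, not Oxidized or net-ssh code: runs a block while a
# child process is alive and always terminates and reaps the child afterwards.
def with_child(*cmd)
  pid = Process.spawn(*cmd)
  yield pid
ensure
  if pid
    begin
      Process.kill("TERM", pid) # ask the child to exit
    rescue Errno::ESRCH
      # the child is already gone
    end
    begin
      Process.wait(pid) # collect the exit status so no zombie is left behind
    rescue Errno::ECHILD
      # the child was already reaped elsewhere
    end
  end
end
```

Because the cleanup sits in `ensure`, it runs whether the block returns normally or raises, which is the failure path where the zombies were reported later in this thread.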

ElvinEfendi commented 8 years ago

@roedie have you tried running Oxidized without a proxy and checking whether the same error happens again? It would be great if you could give us the steps to reproduce the issue. Do you see anything odd in the log? Try running in debug mode (debug: true) and monitoring the logs.

admlko commented 7 years ago

@ytti @ElvinEfendi I have the same problem. It seems to be related to the ProxyCommand feature, as it happens only when I use it. I have checked the debug log for anomalies, but I'm not sure what is normal and what is not. This caught my eye; is it normal?

lib/oxidized/input/cli.rb: Running post_login command: nil, block: #<Proc:0x007fb654044b08@/var/lib/gems/2.3.0/gems/oxidized-0.15.0/lib/oxidized/model/ios.rb:63> at HOSTNAME

If I understand correctly, the log should be divided per node, so the first log line of a node starts with: n @ HOSTNAME with expect: nil and then the log of that node follows.

What I see is log output from multiple nodes under one. I guess this could be a threading issue?

After finishing updating all the nodes with and without ProxyCommand, I get these lines in the log every second:

D, [2016-07-13T09:56:14.561536 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:15.561896 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:16.562318 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:17.562704 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:18.563040 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1

danilopopeye commented 6 years ago

Closing, as this is too old and has had no updates. Feel free to reopen if the problem still persists.

kefiras commented 6 years ago

Hello, I can confirm that when the SSH proxy fails, the ssh process becomes a zombie. Can someone reopen this and take a look, please? It creates hundreds of zombies and makes our monitoring very noisy. @danilopopeye can you reopen this issue, please? Thanks a lot in advance.

xoxax commented 6 years ago

Is any additional information needed to fix this issue? I'm seeing the same problem when using an SSH proxy to poll devices as well. Relevant lines from my config:

input:
  default: ssh
  debug: true
  ssh:
    secure: false
source:
  default: csv
  csv:
    file: "/home/oxidized/.config/oxidized/router.db"
    delimiter: !ruby/regexp /:/
    map:
      name: 0
      model: 1
    vars_map:
      ssh_proxy: 2

laf commented 6 years ago

This is upstream from us. A PR is in place but not merged yet:

https://github.com/net-ssh/net-ssh/pull/556
https://github.com/net-ssh/net-ssh/issues/557

I've subscribed to the PR so once I see it merged I'll update this issue.

laf commented 6 years ago

OK, so the fix in net-ssh upstream has been merged: net-ssh/net-ssh#580

It will, however, need a release from them before you can upgrade using gems. In the meantime, you can build net-ssh from source.

As this isn't something we can directly fix right now I'm going to close this issue.

jameskirsop commented 5 years ago

To provide an update on this issue:

There are now gems for net-ssh that incorporate this change. I've modified oxidized to allow it to function with net-ssh 5.1.0 and I'm still seeing defunct processes lying around after several hours of the oxidized process running.

I was going to create a PR for my changes, but they don't actually seem to fix this problem at this point in time.

July 2019 further update: Now running net-ssh 5.2.0 and still seeing defunct processes. I'll see if I can work out a cause and file a new issue, given that https://github.com/net-ssh/net-ssh/issues/526 and https://github.com/net-ssh/net-ssh/issues/557 were apparently fixed.

jameskirsop commented 2 years ago

5 years after this issue was logged, we're still seeing defunct SSH processes appear as children of Oxidized. :(

I've updated the upstream issue with some further pleas for help.

joschi99 commented 1 year ago

I can confirm that the problem is still there with net-ssh 7.1.0 and oxidized 0.29.1 as well. Is any solution available?