Closed: roedie closed this issue 6 years ago
Huh. We don't ever fork SSH, we use a pure-Ruby SSH library. Maybe one of your hooks is firing it off?
Oxidized does fork ssh when using a proxy host. Or am I mistaken about that? lib/oxidized/input/ssh.rb:27: proxy = Net::SSH::Proxy::Command.new("ssh #{proxy_host} -W %h:%p")
I do not have any hooks using ssh. Just one hook using sendmail when a node fails.
Oh right, that's true. I think we should have an ensure somewhere guaranteeing it gets killed. Something for @ElvinEfendi I hope.
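For illustration, here is a minimal sketch of what such an ensure could look like in plain Ruby. It is not Oxidized or net-ssh code: `with_child` is a hypothetical helper, and it assumes the caller owns the child process it spawns. The idea is only that the cleanup runs whether the block succeeds or raises, and that the child is waited on so it cannot linger as defunct.

```ruby
# Hypothetical helper (not part of Oxidized or net-ssh): runs a child
# process and guarantees it is terminated and reaped, even if the block raises.
def with_child(*cmd)
  pid = Process.spawn(*cmd)
  yield pid
ensure
  if pid
    begin
      Process.kill("TERM", pid) # ask the child to exit
    rescue Errno::ESRCH
      # the child is already gone
    end
    begin
      Process.wait(pid)         # reap it so no <defunct> entry remains
    rescue Errno::ECHILD
      # the child was already reaped elsewhere
    end
  end
end
```

Calling `with_child("ssh", "-W", "host:22", "jumphost") { |pid| ... }` (arguments made up for the example) would leave no child behind once the block finishes or fails.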
@roedie have you tried running Oxidized without a proxy and checking if the same error happens again?
It would be great if you could provide us the steps to reproduce the issue. Do you see anything odd in the log? Try running in debug mode (debug: true) and monitoring the logs.
@ytti @ElvinEfendi I have the same problem. It seems to be related to the ProxyCommand feature, as it happens only when I use it. I have tried to check the debug log for anomalies, but I'm not sure what is normal and what is not. This caught my eye; is it normal?
lib/oxidized/input/cli.rb: Running post_login command: nil, block: #<Proc:0x007fb654044b08@/var/lib/gems/2.3.0/gems/oxidized-0.15.0/lib/oxidized/model/ios.rb:63> at HOSTNAME
If I understand correctly, log should be divided per node, so the first logline of a node starts with:
n @ HOSTNAME with expect: nil
and then the log of a node.
What I see is log output from multiple nodes under one. I guess this could be a threading issue?
After finishing updating all the nodes with and without ProxyCommand, I get these lines in the log every second:
D, [2016-07-13T09:56:14.561536 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:15.561896 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:16.562318 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:17.562704 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
D, [2016-07-13T09:56:18.563040 #10276] DEBUG -- : lib/oxidized/worker.rb: Jobs 0, Want: 1
Closing, as this is too old and has had no updates. Feel free to reopen if the problem still persists.
Hello, I can confirm that when the SSH proxy fails, the ssh process is left as a zombie. Can someone reopen and take a look at this, please? It creates hundreds of zombies and makes our monitoring very noisy. @danilopopeye can you reopen this issue, please? Thanks a lot in advance.
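For background: a child process stays defunct after it exits until its parent waits on it. The sketch below shows one generic way to avoid that in Ruby, using `Process.detach`, which starts a thread whose only job is to reap the child. `spawn_detached` is a hypothetical name, and this is not how Oxidized or net-ssh handle the proxy process; it only illustrates the mechanism.

```ruby
# Hypothetical helper: spawn a command and hand it to a reaper thread,
# so the child never lingers as <defunct> after it exits.
def spawn_detached(*cmd)
  pid = Process.spawn(*cmd)
  reaper = Process.detach(pid) # background thread that waits on pid
  [pid, reaper]
end
```

`reaper.value` blocks until the child has exited and returns its `Process::Status`, which can be handy when the exit code matters.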
Is any additional information needed to fix this issue? I'm seeing the same problem when using an SSH proxy to poll devices as well. Relevant lines from my config:

    input:
      default: ssh
      debug: true
      ssh:
        secure: false
    source:
      default: csv
      csv:
        file: "/home/oxidized/.config/oxidized/router.db"
        delimiter: !ruby/regexp /:/
        map:
          name: 0
          model: 1
        vars_map:
          ssh_proxy: 2
This is upstream from us, a PR is in place but not merged yet:
https://github.com/net-ssh/net-ssh/pull/556 https://github.com/net-ssh/net-ssh/issues/557
I've subscribed to the PR so once I see it merged I'll update this issue.
OK, so the fix in net-ssh upstream has been merged: net-ssh/net-ssh#580
It will, however, need a new release from them before you can upgrade using gems. You can build net-ssh from source in the meantime, though.
As this isn't something we can directly fix right now I'm going to close this issue.
To provide an update on this issue:
There are now gems for net-ssh that incorporate this change. I've modified oxidized to allow it to function with net-ssh 5.1.0 and I'm still seeing defunct processes lying around after several hours of the oxidized process running.
I was going to create a PR for my changes, but they don't actually seem to fix this problem at this point in time.
July 2019 further update: now running net-ssh 5.2.0 and still seeing defunct processes. I'll see if I can work out a cause and file a new issue, given that https://github.com/net-ssh/net-ssh/issues/526 and https://github.com/net-ssh/net-ssh/issues/557 were apparently fixed.
5 years after this issue was logged, we're still seeing defunct SSH processes appear as children of Oxidized. :(
I've updated the upstream issue with some further pleas for help.
I can confirm that also with net-ssh 7.1.0 and oxidized 0.29.1 the problem is still there. Any solution available?
When Oxidized (0.14.3) is running for some time I get defunct ssh processes. Like:
oxidized 29377 0.0 0.0 0 0 ? Z May28 0:00 [ssh] <defunct>
I'm not sure where they come from and I wonder how I can debug this. Even more, I don't think we should get defunct processes at all.
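One way to start debugging is to list the defunct children of the Oxidized process and watch when new ones appear. The sketch below is a hypothetical helper, not part of Oxidized. It assumes a `ps` that accepts `-eo pid=,ppid=,stat=,comm=` (as on common Linux systems) and that zombies are reported with a state starting with `Z`, as in the output above.

```ruby
# Hypothetical helper: given the text printed by
#   ps -eo pid=,ppid=,stat=,comm=
# return the zombie (state "Z") children of parent_pid.
def defunct_children(ps_output, parent_pid)
  ps_output.each_line.map do |line|
    pid, ppid, stat, comm = line.split(" ", 4)
    next unless ppid.to_i == parent_pid && stat.to_s.start_with?("Z")
    { pid: pid.to_i, command: comm.to_s.strip }
  end.compact
end

# Example usage (oxidized_pid is an assumption; read it from your pid file):
#   defunct_children(`ps -eo pid=,ppid=,stat=,comm=`, oxidized_pid)
```

Running this periodically alongside the debug log may help correlate new defunct entries with specific nodes or failed proxy connections.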