zrusilla opened this issue 12 years ago
Okay, when I said
$heap->{agent}->main_loop;
I meant
$kernel->delay(agent_check => $heap->{ping_delay});
$heap->{agent}->agent_check_and_process(0);
because if you call main_loop then control never goes back to POE, of course. D'oh. Carry on.
Hello Elizabeth,
I've been working on this module for the past few days, needing it to write a program for $job.
I confirm that the FD handler in most POE loops begins to spin furiously when snmpd is stopped, but the situation returns to normal as soon as snmpd is restarted. However, I've also found that AgentX support seems badly broken in recent versions of Net-SNMP. I'll run more tests and post my findings here.
Sébastien Aperghis-Tramoni
Close the world, txEn eht nepO.
Hi Sebastien,
Thanks for following up. Yes, restarting snmpd is the workaround I suggested at Hebex too, but some people don't like that suggestion.
NetSNMP::agent is very frustrating in that it gives the user no indication that a socket has disconnected and reconnected; it assumes the user does not need this information, which is not the case here. POE::Loop::Select is very frustrating in that it doesn't let you intervene when select() returns -1. Between the two of them, it's a mess.
Cheers,
Liz
(Replying to Sébastien Aperghis-Tramoni's message of Nov 14, 2012, at 11:53 PM.)
I had an idea last night that I'll try today: maybe one of the problems is that I give POE's kernel the socket's file descriptor itself instead of a copy (dup) of it.
I reread Marc Lehmann's rant about all the other event frameworks in the AnyEvent documentation, and that's something he mentions. Indeed, the only POE loops that do not go crazy when the AgentX socket is closed are EV and AnyEvent, so somehow Marc did something right.
Sébastien Aperghis-Tramoni
Hello Maddingue,
I also tried loading EV and noticed the same thing: while it didn't solve the problem, it didn't go insane.
I was appalled when I read POE::Loop::Select::loop_do_timeslice. There is no way to specify a handler for a select() error? Really?? I couldn't believe it.
Part of the problem, too, is that by the time ev_agent_check is invoked, it's already too late: you're off to the races, spinning furiously.
I'm a fan of AnyEvent now. I wrote a project using it and Coro, and it works like a charm without much extra code clutter. Eric recently ported a program from POE to AnyEvent (AE) and is pleased with the results, too.
Keep me posted (pun intended),
Liz
(Replying to Sébastien Aperghis-Tramoni's message of Nov 15, 2012, at 8:50 AM.)
Zrusilla wrote:
> Hello Maddingue,
Hello Elizabeth,
> I also tried loading EV and I noticed the same thing: while it didn't solve the problem, it didn't go insane.
Hmm, what do you mean by "it didn't solve the problem"? In my tests, once you restart snmpd, the subagent always reconnects to the socket.
Note that, along with POE::Loop::AnyEvent and POE::Loop::EV, POE::XS::Loop::EPoll also avoids this spinning problem. However, it does not seem to be compatible with all versions of NetSNMP::agent.
> I was appalled when I read POE::Loop::Select::loop_do_timeslice. There is no way to specify a handler for a select() error? Really?? I couldn't believe it.
> Part of the problem, too, is that by the time ev_agent_check is invoked, it's already too late: you're off to the races, spinning furiously.
I know. At this level, the only thing we can do is reduce the delay (10 seconds by default) before agent_check is called, so that the subagent can reconnect sooner.
Also, I just tested, and it appears that dup-ing the file descriptor (i.e., changing line 183 from C< open my $fh, "+<&=", $fd; > to C< open my $fh, "+<&", $fd; >) makes things worse: even after the subagent has reconnected, the POE kernel keeps spinning because of the faulty file descriptor.
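The difference between the two `open` modes on line 183 can be checked with core Perl alone. This is an illustrative sketch, not the component's code: a socketpair stands in for the AgentX socket, and `POSIX::close` stands in for the C library closing its descriptor before reconnecting (a raw close(2) is used because Perl's own `close` may not release a descriptor that another PerlIO handle still shares). `fd_is_open` is a hypothetical helper. With `"+<&="` the handle shares the library's descriptor and becomes invalid as soon as the library closes it; with `"+<&"` the dup keeps the old socket open on its own, which is one plausible reason the kernel keeps spinning on a dead socket after the reconnect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Fcntl qw(F_GETFD);
use POSIX ();

# Hypothetical helper: 1 if the descriptor behind $fh is still open at the OS level.
sub fd_is_open {
    my ($fh) = @_;
    return defined fcntl($fh, F_GETFD, 0) ? 1 : 0;
}

# Stand-in for the AgentX socket owned by the C library.
socketpair(my $sock, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $fd = fileno $sock;

# "+<&=" wraps the existing descriptor (like fdopen): same number, no copy.
open(my $alias, "+<&=", $fd) or die "fdopen: $!";
# "+<&" duplicates it (dup): a new number referring to the same socket.
open(my $copy, "+<&", $fd) or die "dup: $!";

printf "original=%d alias=%d dup=%d\n", $fd, fileno($alias), fileno($copy);

# Simulate the library closing its socket with a raw close(2).
POSIX::close($fd);

printf "alias still open: %d, dup still open: %d\n",
    fd_is_open($alias), fd_is_open($copy);
```

Only the dup'd handle survives the close, so an event loop watching it would keep seeing the old socket rather than the one the agent opens next.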
> I'm a fan of AnyEvent now. I wrote a project using it and Coro and it works like a charm without too much extra code clutter. Eric recently ported a program from POE to AE and is pleased with the results, too.
The problem I have with AnyEvent is that it looks designed for writing programs, not modules.
Sébastien Aperghis-Tramoni
By "didn't solve the problem" I meant that while it didn't spin furiously, it did not solve the separate problem of supplying the correct FDs to the program. Perhaps the solution will be a combination of the two.
I am not sure what you mean by "AnyEvent looks made for writing programs, not modules." Please elaborate.
Please keep me posted on what you find.
Zrusilla wrote:
> By "didn't solve the problem" I meant that while it didn't spin furiously, it did not solve the problem of supplying the correct FDs to the program, which is a separate problem. Perhaps the solution will be a combination of the two.
Do you mean that once snmpd has restarted and the subagent has reconnected, requests aren't passed on to the subagent? In my tests, once it has reconnected, everything works fine.
> I am not sure what you mean by AnyEvent looks made for writing programs, not modules. Please elaborate.
In the sense that I don't see classes, modules and objects, but instead lots of condvars, which are hard to understand and don't look easy to modularize.
Sébastien Aperghis-Tramoni
I just pushed a first attempt at porting this POE component to AnyEvent to a new repository: https://github.com/maddingue/AnyEvent-NetSNMP-agent
It's completely untested, as I don't even have NetSNMP::agent installed here.
Sébastien Aperghis-Tramoni
Ironically, trying to make the code work with AnyEvent brings new kinds of bugs: https://github.com/maddingue/AnyEvent-NetSNMP-agent/issues/1
This bug concerns two distinct problems in our usage of this module.
ev_agent_check registers the AgentX sockets with POE. If snmpd is restarted, the socket goes away: the select() call in POE::Loop::Select returns -1 with the error 'Bad file descriptor', and the loop begins to spin furiously, using 100% of a CPU. By the time ev_agent_check is called, it is already too late. I have not found a way to intervene and recover from the error.
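Why a plain select()-based loop spins here can be reproduced with core Perl. This is a hedged sketch, not POE::Loop::Select's code: a raw `POSIX::close` stands in for NetSNMP closing its AgentX socket behind the loop's back, and `stale_fds` is a hypothetical recovery helper that probes each registered descriptor on its own, with a zero timeout, to find the ones rejected with EBADF. Once any descriptor in the read set is invalid, every select() call fails immediately instead of blocking, so a loop that simply retries never sleeps.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use POSIX ();
use Errno ();

# Hypothetical helper: return the descriptors from @_ that select() rejects
# with EBADF, probing each one alone with a zero timeout so it never blocks.
sub stale_fds {
    my @bad;
    for my $fd (@_) {
        my $bits = '';
        vec($bits, $fd, 1) = 1;
        my $n = select(my $out = $bits, undef, undef, 0);
        push @bad, $fd if $n == -1 && $!{EBADF};
    }
    return @bad;
}

# Two descriptors the loop is watching; one is then closed underneath it.
socketpair(my $live, my $gone, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my ($live_fd, $gone_fd) = (fileno $live, fileno $gone);
POSIX::close($gone_fd);

# A select() over the whole set now fails outright, on every call.
my $rin = '';
vec($rin, $_, 1) = 1 for $live_fd, $gone_fd;
my $n = select(my $rout = $rin, undef, undef, 0);
printf "select returned %d (%s)\n", $n, $n == -1 ? "$!" : "ok";

print "stale descriptors: @{[ stale_fds($live_fd, $gone_fd) ]}\n";
```

A loop that ran a probe like `stale_fds` after a -1/EBADF return could drop those descriptors instead of retrying; the lack of such a hook in POE::Loop::Select is exactly what this issue describes.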
If I substitute POE::Loop::EV, the spinning problem does not occur, but POE still holds the old, closed socket instead of the new one connected by NetSNMP::agent.
I have presently worked around the problem by overriding ev_agent_check in a subclass:
and by making sure this function is the handler for agent_check in POE::Component::NetSNMP::agent.
There does not appear to be a clean way to detect that NetSNMP::agent has reconnected, obtain the new FDs, and register them with POE.
Please advise.
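The re-registration step described above can be sketched as a set difference between the descriptors the event loop is watching and the ones the agent is using now. This is a minimal core-Perl sketch, not code from POE::Component::NetSNMP::agent: `diff_fds` is a hypothetical helper, and it assumes some way of listing the descriptors NetSNMP::agent currently uses, which the thread does not specify. Each descriptor in the "remove" list would then be dropped with POE's `$kernel->select_read($fh)`, and each one in the "add" list registered with `$kernel->select_read($fh, $event)`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare the descriptors registered with the event loop
# against the ones the SNMP agent reports now (how to obtain that list is an
# assumption, not shown). Returns two sorted array refs: descriptors to stop
# watching and descriptors to start watching.
sub diff_fds {
    my ($registered, $current) = @_;
    my %old = map { $_ => 1 } @$registered;
    my %new = map { $_ => 1 } @$current;
    my @remove = sort { $a <=> $b } grep { !$new{$_} } keys %old;
    my @add    = sort { $a <=> $b } grep { !$old{$_} } keys %new;
    return (\@remove, \@add);
}

# Example: fd 5 was the old AgentX socket; after reconnecting, the agent uses fd 7.
my ($remove, $add) = diff_fds([3, 5], [3, 7]);
print "stop watching: @$remove\n";
print "start watching: @$add\n";
```

Comparing by descriptor number alone cannot detect a reconnect that happens to reuse the same number, so a real fix would still need NetSNMP::agent to signal the reconnect or the loop to re-register unconditionally.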