Open neoben opened 6 years ago
Shouldn't it be already dying at that point? What sequence of actions lead your router to stay alive?
The loop inside RouterThread::driver() function breaks if __stopflag is set to true:
#if !BSD_NETISRSCHED
// check to see if driver is stopped
if (_stop_flag && _master->verify_stop(this))
break;
#endif
If I'm not wrong, there is just a function setting this flag which is _requeststop() in master.hh:
inline void
Master::request_stop()
{
for (RouterThread **t = _threads; t != _threads + _nthreads; ++t)
(*t)->_stop_flag = true;
// ensure that at least one thread is awake to handle the stop event
wake_somebody();
}
Without calling _requeststop from _Master::killrouter(Router *router) , when I remove the click kernel module (using rmmod) I receive this kind of error:
[ 386.600000] click: current router threads refuse to die!
[ 386.610000] click: Following threads still active, expect a crash:
[ 386.620000] click: router thread pid 1916
So if __stopflag is false, RouterThread::driver() is an infinite loop and I suppose this is why _kill_routerthreads() from linuxmodule/sched.cc fails after 5 seconds and several attempts:
static int
kill_router_threads()
{
delete placeholder_router;
if (click_router)
click_router->set_runcount(Router::STOP_RUNCOUNT);
// wait up to 5 seconds for routers to exit
unsigned long out_jiffies = jiffies + 5 * HZ;
int num_threads;
do {
MDEBUG("click_sched: waiting for threads to die");
SOFT_SPIN_LOCK(&click_thread_lock);
num_threads = click_thread_tasks->size();
SPIN_UNLOCK(&click_thread_lock);
if (num_threads > 0)
schedule();
} while (num_threads > 0 && jiffies < out_jiffies);
if (num_threads > 0) {
printk(KERN_ALERT "click: current router threads refuse to die!\n");
return -1;
} else
return 0;
}
I think this issue is related just to kernel space and it works fine user space. Let me know if this helps or if I'm missing something. Thanks!
I'm less familiar with kernel click. However request_stop should be called from adjust_runcount/set_runcount. When you unload the module, click_cleanup_sched is called, which calls kill_router_threads. kill_router_threads set the runcount to STOP_RUNCOUNT using set_runcount, that in turn should set the _stop_flag to true, no?
Yes, but _setruncount with _STOPRUNCOUNT is called just if _clickrouter is defined and it can be undefined (actually this is my case) so there is no function calling _setruncount:
if (click_router)
click_router->set_runcount(Router::STOP_RUNCOUNT);
I acknowledge the problem, I had it myself. However setting always stop_flag to true seems dangerous. If run_count is still not 0 because some references are still kept, the router should not finish to die.
Maybe you can detail your sequence of event? How did click_router got undefined without run_count being 0? That's the real issue, I think.
I will investigate more about _clickrouter getting undefined.
Anyway I discovered that the change I made was part of the trunk code before and it was removed at some point: https://github.com/kohler/click/commit/85f1315aaecac47eee0a208fbfb2996f0dc7b564#diff-fd6f8a65a460329f21fa8327a96ce179
The click router threads refuse to die because no one is setting the _stop_flag. There is an infinite loop in RouterThread::driver() function. Call request_stop killing router in master.cc to fix the problem.