Closed bcronje closed 7 years ago
Ok some minor progress on this, it seems to be a regression bug. @ahenning pointed out to me that he was unable to replicate the issue on an older version of Click he is running. After some testing it appears that commit f40363b introduced the bug, as I am unable to replicate it before this commit.
Incidentally, commit f40363b also appears to have introduced a performance regression, at least when running the above config:
Looking at this…
Thank you Eddie. Are you able to reproduce on your side?
Yes, I am, and I have what I believe is a fix. Looking at potential performance regression.
On an AWS test machine, the commit before f40363b got ~82000 pps on this config, whereas with d8b39ad we get ~120000 pps. I think this is fixed. :( Thanks for reporting the problem.
Ugh, I think 1d05ca31574ae3b8a21c28c8ea6ec98704f0d553 addressed this problem; I forget why it didn't get merged.
@kohler thank you so much for looking into this. I tested and your fix works perfectly. Race condition is gone and performance back on par, great stuff!
I've picked up an issue where the following task invariant does not always hold true:
The issue can be reliably replicated when configuring userlevel Click with
./configure --disable-linuxmodule --enable-user-multithread --enable-schedule-debugging
and running the following configuration: mt.click
Running the above configuration with:
click mt.click -j3 -p 13000
invariably ends up with no packets being pushed to q, even though there are packets in nq1 and nq2, while the uq1 and uq2 tasks end up stuck indefinitely in the state _is_scheduled but !on_scheduled_list() && !on_pending_list(). Task states as per read handlers:
If I
write uq1.scheduled true
and
write uq2.scheduled true
then packet processing resumes again.
At this point I think what happens is the following. Task A fires on thread X and is not fast-rescheduled in the task hook, so task A is removed from the scheduled list in RouterThread::run_tasks(). Meanwhile, thread Y calls reschedule() on task A before thread X has completed the removal from the scheduled list. This causes thread Y not to add the task to the scheduled list, after which thread X completes the removal of the task from the scheduled list:
From Task::reschedule():
Do you guys agree with my conclusion? Or do you think the issue is somewhere else? Any idea where/how best to address it?
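To make the suspected interleaving concrete, here is a minimal single-threaded sketch that replays the steps in a fixed order. It is a toy model, not Click's code: `ToyTask`, its three flags, and all function names are hypothetical stand-ins, and the assumption that a rescheduling thread skips the enqueue when the task still looks like it is on a list is my reading of the behaviour, not something confirmed from the source.

```cpp
#include <cassert>

// Toy model only. These names are hypothetical and do not mirror Click's Task class.
struct ToyTask {
    bool is_scheduled = false;       // someone wants this task to run
    bool on_scheduled_list = false;  // linked into the owning thread's run list
    bool on_pending_list = false;    // queued for cross-thread processing
};

// Thread X, step 1: the task hook has fired without fast-rescheduling,
// so X clears the scheduled flag and is about to unlink the task.
void begin_removal(ToyTask &t) {
    t.is_scheduled = false;
}

// Thread X, step 2: the unlink from the scheduled list completes.
void finish_removal(ToyTask &t) {
    t.on_scheduled_list = false;
}

// Thread Y: asks for the task to run again. ASSUMPTION: the enqueue is
// skipped when the task still appears to be on one of the lists.
void reschedule_from_other_thread(ToyTask &t) {
    t.is_scheduled = true;
    if (!t.on_scheduled_list && !t.on_pending_list)
        t.on_pending_list = true;
}

// The state reported by the read handlers: scheduled, but on no list.
bool is_stuck(const ToyTask &t) {
    return t.is_scheduled && !t.on_scheduled_list && !t.on_pending_list;
}

// Y's reschedule lands between X's two removal steps.
ToyTask run_bad_interleaving() {
    ToyTask t;
    t.is_scheduled = true;
    t.on_scheduled_list = true;
    begin_removal(t);
    reschedule_from_other_thread(t);  // sees on_scheduled_list == true, skips enqueue
    finish_removal(t);
    return t;
}

// Y's reschedule lands after X has fully removed the task.
ToyTask run_good_interleaving() {
    ToyTask t;
    t.is_scheduled = true;
    t.on_scheduled_list = true;
    begin_removal(t);
    finish_removal(t);
    reschedule_from_other_thread(t);  // sees no list membership, enqueues
    return t;
}
```

Calling `is_stuck(run_bad_interleaving())` from a small `main` returns true in this model, while the same check on `run_good_interleaving()` returns false. That matches the observation that writing `true` to the `scheduled` handler gets packets flowing again, since a fresh reschedule finds the task on no list and enqueues it.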