Open solardiz opened 1 year ago
Perhaps we should instead use cancel_work_sync
or flush_work
on our potentially queued work functions.
Either way, to be race-free we should ensure we would never re-queue the work at this point - that is, we had already shut down whatever mechanism would have queued the work and no pending instance of that mechanism is still running - a condition I'm not sure we currently fully enforce.
FWIW, in a certain unreleased LKRG extension that queues more work, we use this pattern (more recently written code):
void lkrg_queue_XXX(void)
{
queue_work(system_unbound_wq, &work);
}
void lkrg_deregister_XXX(void)
{
deregister_XXX(XXX);
flush_work(&work);
cancel_work_sync(&work); /* Redundant unless the work re-queues */
}
Perhaps we should fix the existing (older) code similarly.
We just got a CI failure on CentOS Stream 9 because of this: https://github.com/lkrg-org/lkrg/actions/runs/5012121814/jobs/8990025299?pr=270
/__w/lkrg/lkrg/src/modules/integrity_timer/p_integrity_timer.c: In function 'p_offload_cache_delete':
./include/linux/workqueue.h:641:17: error: call to '__warn_flushing_systemwide_wq' declared with attribute warning: Please avoid flushing system-wide workqueues. [-Werror=attribute-warning]
641 | __warn_flushing_systemwide_wq(); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/__w/lkrg/lkrg/src/modules/integrity_timer/p_integrity_timer.c:54:4: note: in expansion of macro 'flush_workqueue'
54 | flush_workqueue(system_unbound_wq);
| ^~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:297: /__w/lkrg/lkrg/src/modules/integrity_timer/p_integrity_timer.o] Error 1
I don't know where the -Werror
came from, and if it's a distro thing then whether they'll manage to keep it that way for long or perhaps revert soon.
Anyway, we should fix this issue.
As seen in our
mkosi-mainline
builds lately: