lkrg-org / lkrg

Linux Kernel Runtime Guard
https://lkrg.org
Other
404 stars 72 forks source link

flush_workqueue(system_unbound_wq) triggers a compile-time warning #232

Open solardiz opened 1 year ago

solardiz commented 1 year ago

As seen in our mkosi-mainline builds lately:

In file included from ./include/linux/srcu.h:21,
                 from ./include/linux/notifier.h:16,
                 from ./arch/x86/include/asm/uprobes.h:13,
                 from ./include/linux/uprobes.h:49,
                 from ./include/linux/mm_types.h:16,
                 from ./include/linux/buildid.h:5,
                 from ./include/linux/module.h:14,
                 from /root/src/src/modules/integrity_timer/../../p_lkrg_main.h:26,
                 from /root/src/src/modules/integrity_timer/p_integrity_timer.c:18:
/root/src/src/modules/integrity_timer/p_integrity_timer.c: In function 'p_offload_cache_delete':
./include/linux/workqueue.h:640:17: warning: call to '__warn_flushing_systemwide_wq' declared with attribute warning: Please avoid flushing system-wide workqueues. [-Wattribute-warning]
  640 |                 __warn_flushing_systemwide_wq();                        \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/src/src/modules/integrity_timer/p_integrity_timer.c:54:4: note: in expansion of macro 'flush_workqueue'
   54 |    flush_workqueue(system_unbound_wq);
      |    ^~~~~~~~~~~~~~~
In file included from ./include/linux/srcu.h:21,
                 from ./include/linux/notifier.h:16,
                 from ./arch/x86/include/asm/uprobes.h:13,
                 from ./include/linux/uprobes.h:49,
                 from ./include/linux/mm_types.h:16,
                 from ./include/linux/buildid.h:5,
                 from ./include/linux/module.h:14,
                 from /root/src/src/modules/exploit_detection/../../p_lkrg_main.h:26,
                 from /root/src/src/modules/exploit_detection/p_exploit_detection.c:18:
In function 'p_ed_wq_valid_cache_delete',
    inlined from 'p_exploit_detection_exit' at /root/src/src/modules/exploit_detection/p_exploit_detection.c:2058:4:
./include/linux/workqueue.h:640:17: warning: call to '__warn_flushing_systemwide_wq' declared with attribute warning: Please avoid flushing system-wide workqueues. [-Wattribute-warning]
  640 |                 __warn_flushing_systemwide_wq();                        \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/src/src/modules/exploit_detection/p_exploit_detection.c:382:4: note: in expansion of macro 'flush_workqueue'
  382 |    flush_workqueue(system_unbound_wq);
      |    ^~~~~~~~~~~~~~~
solardiz commented 1 year ago

Perhaps we should instead use cancel_work_sync or flush_work on our potentially queued work functions.

Either way, to be race-free we should ensure we would never re-queue the work at this point - that is, we had already shut down whatever mechanism would have queued the work and no pending instance of that mechanism is still running - a condition I'm not sure we currently fully enforce.

solardiz commented 1 year ago

FWIW, in a certain unreleased LKRG extension that queues more work, we use this pattern (more recently written code):

void lkrg_queue_XXX(void)
{
    queue_work(system_unbound_wq, &work);
}

void lkrg_deregister_XXX(void)
{
    deregister_XXX(XXX);
    flush_work(&work);
    cancel_work_sync(&work); /* Redundant unless the work re-queues */
}

Perhaps we should fix the existing (older) code similarly.

solardiz commented 1 year ago

We just got a CI failure on CentOS Stream 9 because of this: https://github.com/lkrg-org/lkrg/actions/runs/5012121814/jobs/8990025299?pr=270

 /__w/lkrg/lkrg/src/modules/integrity_timer/p_integrity_timer.c: In function 'p_offload_cache_delete':
./include/linux/workqueue.h:641:17: error: call to '__warn_flushing_systemwide_wq' declared with attribute warning: Please avoid flushing system-wide workqueues. [-Werror=attribute-warning]
  641 |                 __warn_flushing_systemwide_wq();                        \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/__w/lkrg/lkrg/src/modules/integrity_timer/p_integrity_timer.c:54:4: note: in expansion of macro 'flush_workqueue'
   54 |    flush_workqueue(system_unbound_wq);
      |    ^~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:297: /__w/lkrg/lkrg/src/modules/integrity_timer/p_integrity_timer.o] Error 1

I don't know where the -Werror came from, and if it's a distro thing then whether they'll manage to keep it that way for long or perhaps revert soon.

Anyway, we should fix this issue.