lkrg-org / lkrg

Linux Kernel Runtime Guard
https://lkrg.org

Microphone stops working intermittently in WebRTC when p_lkrg module is loaded. #157

Open vi opened 2 years ago

vi commented 2 years ago

Steps to reproduce:

  1. Load module version 1f0bf0653462220639c591dd42f76a5700ec75f4 on Debian kernel 5.10.0-10-amd64.
  2. Set lkrg.profile_validate to at least 1.
  3. On Xorg, use Chromium for a WebRTC call, with an HDA Intel PCH / Realtek ALC3204 audio card. For example, open two tabs of https://meet.jit.si/LkrgAudioIssues, unmuting the microphone in one of them.
  4. You should hear what you are speaking into the microphone.
  5. Wait for some time or provoke it with while true; do sysctl lkrg.trigger=1; sleep 0.05 || break; done.
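
The flood in step 5 can also be scripted with a bounded number of iterations, which makes runs easier to compare. This is a minimal sketch, assuming Python 3, root privileges and a loaded LKRG module. `trigger_flood` and its `runner` parameter are hypothetical names introduced here; only the `sysctl lkrg.trigger=1` command and the 0.05 s pause come from the report. Unlike the shell loop, it stops as soon as `sysctl` fails.

```python
import subprocess
import time


def trigger_flood(count, interval=0.05, runner=subprocess.run):
    """Run `sysctl lkrg.trigger=1` up to `count` times, `interval` seconds apart.

    Stops early if the command exits with a non-zero status.
    Returns the number of successful triggers.
    `runner` is injectable so the loop can be dry-run without root.
    """
    done = 0
    for _ in range(count):
        result = runner(["sysctl", "lkrg.trigger=1"], check=False)
        if result.returncode != 0:
            break
        done += 1
        time.sleep(interval)
    return done
```

Calling `trigger_flood(200)` as root would request 200 integrity checks over roughly ten seconds; the count is an arbitrary choice, not a value from the report.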

Expected:

Mic stays on.

Actual:

Outgoing audio disappears. Reopening the tab and rejoining the call makes the microphone work again temporarily.

Recording audio from command line using arecord -f cd seems to be unaffected.

vi commented 2 years ago

Sometimes audio disruptions are heard prior to full muting of the microphone.

lkrg.kint_validate=1 alone triggers the bug. lkrg.pcfi_validate=1, lkrg.pint_validate=1, lkrg.smap_validate=1, lkrg.smep_validate=1 and lkrg.umh_validate=1 do not seem to affect the audio.

vi commented 2 years ago

Another (external) audio card seems to be unaffected, although its audio quality degrades under an lkrg.trigger=1 flood.

solardiz commented 2 years ago

@vi Thank you for reporting. Does anything new appear in dmesg when these problems occur?

vi commented 2 years ago

Does anything new appear in dmesg when these problems occur?

No.

vi commented 2 years ago

Is there something like "optimisation fuel" for LKRG? That is, a number you set that LKRG decrements on each check, aborting and reporting the next check once it reaches 0, which would allow bisecting the specific place where the problem occurs.
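
The "fuel" idea can be sketched in user space as follows. This is only an illustration of the requested mechanism, not LKRG code: `run_checks`, `OutOfFuel` and the check names are hypothetical, and LKRG has no such feature.

```python
class OutOfFuel(Exception):
    """Raised when fuel reaches zero; the message names the check that was skipped."""


def run_checks(checks, fuel):
    """Run named checks in order, spending one unit of fuel per check.

    `checks` is a list of (name, callable) pairs. When fuel is exhausted,
    stop and raise OutOfFuel naming the check that would have run next.
    Returns the names of all executed checks if fuel never runs out.
    """
    executed = []
    for name, check in checks:
        if fuel == 0:
            raise OutOfFuel(name)
        check()
        executed.append(name)
        fuel -= 1
    return executed
```

Bisecting then means rerunning with different fuel values until the smallest value that still reproduces the problem is found; the check named at that boundary is the suspect.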

solardiz commented 2 years ago

@vi No, there isn't a debugging feature like you request, and I doubt that it would have helped in this case.

Adam-pi3 commented 2 years ago

@vi @solardiz I think it's not a bug per se. When LKRG performs validation, it disables IRQs and forces each core to perform metadata validation, plus the main core to validate the entire system's integrity. I assume this is a relatively heavy operation for your machine, and the temporary, quick IRQ disable/enable might cause the problems you are seeing.
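
As a rough way to reason about this hypothesis, one can compare the length of a stall with the time an audio capture buffer takes to fill. The sketch below is a simplified model with hypothetical function names; the numbers used with it are illustrative and were not measured on the reporter's machine.

```python
def buffer_duration_ms(frames, sample_rate_hz):
    """Time in milliseconds that `frames` audio frames represent at the given rate."""
    return 1000.0 * frames / sample_rate_hz


def stall_causes_overrun(stall_ms, free_frames, sample_rate_hz):
    """True if the hardware would fill `free_frames` of buffer space before the stall ends.

    Assumes nothing drains the buffer while interrupts are disabled.
    """
    return stall_ms > buffer_duration_ms(free_frames, sample_rate_hz)
```

For example, 480 free frames at 48000 Hz last 10 ms, so in this model a 15 ms stall would overrun the buffer while a 5 ms stall would not.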

vi commented 2 years ago

Can I somehow test this temporary IRQ disabling outside of LKRG? Is there some user-exposed knob in sysfs that also triggers it? Or a simple Linux kernel module that does just that?

solardiz commented 2 years ago

@Adam-pi3 I think a bigger issue could be that LKRG enters p_text_section_lock() state for a fairly long time. Can this cause any timing-critical code to be waiting on a lock? For example, something in the kernel triggering a JUMP_LABEL update? Or maybe something in LKRG itself that acquired some other lock first, thus causing other kernel functionality to wait on that other lock?

If that's the problem and we don't find a simpler solution to it, then we may want to consider (optionally?) splitting LKRG's system integrity checking into smaller chunks, like IIRC some related/derived project (Aurora's maybe? not sure I recall) wrote they did.
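
The chunking idea can be illustrated in user space: hash a region in slices and hold the lock only for one slice at a time, so waiters can get in between slices. This is a sketch under stated assumptions, not how LKRG or the project mentioned above does it. `hash_in_chunks` is a hypothetical name, SHA-256 stands in for whatever hash the real code uses, and `threading.Lock` stands in for p_text_section_lock().

```python
import hashlib
import threading


def hash_in_chunks(data, lock, chunk_size=4096):
    """Hash `data` in slices, holding `lock` only while one slice is processed."""
    digest = hashlib.sha256()
    for start in range(0, len(data), chunk_size):
        with lock:
            digest.update(data[start:start + chunk_size])
        # The lock is released here, so other lock users can run between slices.
    return digest.hexdigest()
```

The trade-off is that the data may change between slices, so the result is no longer a hash of one consistent snapshot.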

@vi I'm not aware of an existing knob or module you could use to reproduce just that aspect easily.

vi commented 2 years ago

Is a Linux kernel module that just periodically disables and re-enables IRQs an easy thing to implement?

solardiz commented 2 years ago

@vi Yes, but let's avoid such distractions.