Open sechmann opened 1 year ago
That's pretty weird all right. I'm hard pressed to understand how it's happening, but I don't know much about the underlying linux usb systems.
If you're up for experimenting, launcher runs a couple of commands to determine what users to launch desktop menus as. I wonder if they're doing stuff that gets in the way. Launcher does something like:
loginctl show-user $uid --value --property=Sessions
loginctl show-session $session --value --property=Type
Agreed on the weirdness :sweat_smile:. I'm up for experimenting.
The commands you're mentioning give the following output (and running them does not cause any sound issues, if that was what you were wondering).
% for u in $(cat /etc/passwd | cut -d ':' -f 3); do
for s in $(loginctl show-user $u --value --property=Sessions); do
echo "user ($u) session ($s): $(loginctl show-session "$s" --value --property=Type)"
done
done
Failed to get user: User ID 0 is not logged in or lingering
Failed to get user: User ID 1 is not logged in or lingering
Failed to get user: User ID 2 is not logged in or lingering
Failed to get user: User ID 3 is not logged in or lingering
Failed to get user: User ID 4 is not logged in or lingering
Failed to get user: User ID 5 is not logged in or lingering
Failed to get user: User ID 6 is not logged in or lingering
Failed to get user: User ID 7 is not logged in or lingering
Failed to get user: User ID 8 is not logged in or lingering
Failed to get user: User ID 9 is not logged in or lingering
Failed to get user: User ID 10 is not logged in or lingering
Failed to get user: User ID 13 is not logged in or lingering
Failed to get user: User ID 33 is not logged in or lingering
Failed to get user: User ID 34 is not logged in or lingering
Failed to get user: User ID 38 is not logged in or lingering
Failed to get user: User ID 39 is not logged in or lingering
Failed to get user: User ID 42 is not logged in or lingering
Failed to get user: User ID 65534 is not logged in or lingering
Failed to get user: User ID 998 is not logged in or lingering
Failed to get user: User ID 997 is not logged in or lingering
Failed to get user: User ID 100 is not logged in or lingering
Failed to get user: User ID 101 is not logged in or lingering
Failed to get user: User ID 996 is not logged in or lingering
Failed to get user: User ID 102 is not logged in or lingering
Failed to get user: User ID 103 is not logged in or lingering
Failed to get user: User ID 995 is not logged in or lingering
Failed to get user: User ID 104 is not logged in or lingering
Failed to get user: User ID 105 is not logged in or lingering
Failed to get user: User ID 106 is not logged in or lingering
Failed to get user: User ID 107 is not logged in or lingering
Failed to get user: User ID 108 is not logged in or lingering
Failed to get user: User ID 109 is not logged in or lingering
Failed to get user: User ID 110 is not logged in or lingering
Failed to get user: User ID 111 is not logged in or lingering
Failed to get user: User ID 112 is not logged in or lingering
Failed to get user: User ID 113 is not logged in or lingering
Failed to get user: User ID 114 is not logged in or lingering
Failed to get user: User ID 115 is not logged in or lingering
Failed to get user: User ID 116 is not logged in or lingering
Failed to get user: User ID 117 is not logged in or lingering
Failed to get user: User ID 118 is not logged in or lingering
Failed to get user: User ID 119 is not logged in or lingering
Failed to get user: User ID 120 is not logged in or lingering
Failed to get user: User ID 121 is not logged in or lingering
Failed to get user: User ID 122 is not logged in or lingering
Failed to get user: User ID 123 is not logged in or lingering
user (1000) session (2): wayland
I tried enabling debug logging to see if i could correlate the sound issues with any specific event from the launcher logs.
After some minutes of tailing the log there seems to always be a crackling noise preceeding the PublishLogs
method (by the amount of time the log says it took
).
(don't mind the time difference in the logs - dmesg get's out of sync when laptop is suspended)
i hear a crack, dmesg:
[Sat Aug 26 02:21:41 2023] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
[Sat Aug 26 02:21:41 2023] xhci_hcd 0000:00:14.0: Looking for event-dma 0000000123d9b280 trb-start 0000000123d9b290 trb-end 0000000123d9b290 seg-start 0000000123d9b000 seg-end 0000000123d9bff0
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: Looking for event-dma 0000000123d9ba60 trb-start 0000000123d9ba70 trb-end 0000000123d9ba70 seg-start 0000000123d9b000 seg-end 0000000123d9bff0
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: Looking for event-dma 0000000123d9c740 trb-start 0000000123d9c750 trb-end 0000000123d9c750 seg-start 0000000123d9c000 seg-end 0000000123d9cff0
about 1 sec later i see this in the launcher log:
Aug 25 09:40:58 vegar-t14 launcher[873523]: {"caller":"teelogger.go:25","err":null,"errcode":"","logType":"string","log_count":1,"message":"","method":"PublishLogs","reauth":false,"session_pid":873523,"severity":"debug","took":"686.816355ms","ts":"2023-08-25T07:40:58.786047513Z","uuid":"-snip just in case-"}
I hear another crack, dmesg:
[Sat Aug 26 02:21:41 2023] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
[Sat Aug 26 02:21:41 2023] xhci_hcd 0000:00:14.0: Looking for event-dma 0000000123d9b280 trb-start 0000000123d9b290 trb-end 0000000123d9b290 seg-start 0000000123d9b000 seg-end 0000000123d9bff0
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: Looking for event-dma 0000000123d9ba60 trb-start 0000000123d9ba70 trb-end 0000000123d9ba70 seg-start 0000000123d9b000 seg-end 0000000123d9bff0
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
[Sat Aug 26 02:22:41 2023] xhci_hcd 0000:00:14.0: Looking for event-dma 0000000123d9c740 trb-start 0000000123d9c750 trb-end 0000000123d9c750 seg-start 0000000123d9c000 seg-end 0000000123d9cff0
[Sat Aug 26 02:23:41 2023] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
[Sat Aug 26 02:23:41 2023] xhci_hcd 0000:00:14.0: Looking for event-dma 0000000123d9c100 trb-start 0000000123d9c110 trb-end 0000000123d9c110 seg-start 0000000123d9c000 seg-end 0000000123d9cff0
3 seconds later this pops up in launcher log:
Aug 25 09:42:01 vegar-t14 launcher[873523]: {"caller":"teelogger.go:25","err":null,"errcode":"","logType":"string","log_count":1,"message":"","method":"PublishLogs","reauth":false,"session_pid":873523,"severity":"debug","took":"3.172595693s","ts":"2023-08-25T07:42:01.278641556Z","uuid":"-snip just in case-"}
The issues can also (rarely) happen with no logs being printed from the kolide launcher. However they stop completely when i stop the launcher so I think it's at least correlated.
Hrm. Mostly I'm guessing based on what feels like it might be right.
Under the hood, launcher is running those loginctl
commands every few seconds. Can you wrap your for loop with something that runs it ever 2 seconds?
Log sending is interesting. There are a couple steps in there:
It's a big of a hack, and I would not suggest running this way, but thinking about how to test things... You adjust the interval launcher tries to send logs at. This is the logging_interval
option. I don't remember what the default is, but it should accept anything between 5 seconds and 10 minutes.
Running the loginctl
in a loop does not produce any issues here.
I also tried running it with GNU parallel (spawning 100 processes) and 1000 iterations per loop. In other words i ran the loop 100 000 times. System load at 55 and all 12 cores at 100% for 15 minutes - no sound issues at all.
Tried a few different logging_interval
settings. Setting it to 10m made the sound cracks less frequent, and they always happened when logging occurred.
Setting it to 10s made the sound cracks more frequent, but they did not always happen at the same time as logging.
Also, quite often i got no logs for several minutes (during that time cracking sounds still occurred). The RequestConfig
method however seems to always produce a large(r) crackling noise.
Wow, that sounds like a pretty solid test of the loginctl
theory. Thank you
I'm at a loss, since the code path in question isn't near any kind of dbus or linux things. It's got socket activity, network activity, CPU, and disk. But linux should handle those... I suspect we'll need to try to reproduce it and poke some things.
You mentioned Ubuntu 23.04, I assume it's current on the packages? Which kernel package?
Were you running launcher prior to 23.04?
Were you running launcher prior to 23.04?
Yes
I am currently experiencing the same sound cracks on a Dell laptop with Ubuntu 22.04, using the same USB hub and sound card. Both systems are current on the packages.
Kernels in use: 23.04:
% uname -r
6.2.0-31-generic
22.04:
% uname -r
6.0.0-1012-oem
I'm experiencing this problem too, on a Lenovo Carbon X1 running a current version of Arch. The launcher version is slightly newer:
launcher - version 1.1.2
branch: HEAD
revision: ebbf392d7867bc8427840b8d5de9ccf629c8647e
build date: 2023-10-04
build user: (runner)
go version: go1.21.1
Like the original reporter, there's only a single user logged in, and it's a wayland session. I'm also running pipewire, which may be different from the current state of the audio stack on Ubuntu. I don't see any 'dmesg' output when the crackling occurs.
I was guessing that the crackling might be caused by buffer underruns on the audio device, so I tried increasing the buffer size using pw-metadata -n settings $dev-id clock.force-quantum $increasingly-large-values
, but this didn't have any effect on the crackling.
Just a quick update that I haven't been able to reproduce this yet -- neither the crackling nor the corresponding dmesg logs. I'm on a HP laptop running Ubuntu 23.04 (kernel is 6.2.0-34-generic), using a logitech H340 USB headset, since we'd previously seen this issue or similar reported with that model of headset plugged directly into the laptop. Going to try to pick up a USB hub + sound card next to see if that lets me reproduce the issue.
The code relating to log sending is pretty simple. It starts around https://github.com/kolide/launcher/blob/59ad0a639927f3cb838188b9d92bc381f0836566/pkg/osquery/extension.go#L647
Start a bboltdb transaction, ingest logs into an array, send the array over the network to our servers. I don't think there's anything unusual in those. I expect bboltdb is a bunch of file io, and network is network.
I experience the same crackling on a Lenovo ThinkPad X1 Carbon Gen 9 running Debian with kernel 6.5 and Sway window manager.
The external USB audio interface is a Motu M4.
$ /usr/local/kolide-k2/bin/launcher -version
{"caller":"main.go:30","msg":"Launcher starting up","revision":"d30592bbcec072fd6988d4e2346b1765f705084b","severity":"info","ts":"2023-10-19T07:11:56.876186415Z","version":"0.11.22"}
{"caller":"main.go:33","msg":"Launcher starting up","revision":"ebbf392d7867bc8427840b8d5de9ccf629c8647e","severity":"info","ts":"2023-10-19T07:11:56.889465862Z","version":"1.1.2"}
launcher - version 1.1.2
branch: HEAD
revision: ebbf392d7867bc8427840b8d5de9ccf629c8647e
build date: 2023-10-04
build user: (runner)
go version: go1.21.1
$ uname -r
6.5.0-1-amd64
$ pipewire --version
pipewire
Compiled with libpipewire 0.3.82
Linked with libpipewire 0.3.82
I don't see any messages in dmesg
, however, the crackling happens when CPU usage of the kolide process spikes - which happens irregularly, but can be observed when running top
.
I assume there is some OSquery that interrupts other system processes, even though they have higher rt priority (such as audio processes).
I assume there is some OSquery that interrupts other system processes, even though they have higher rt priority (such as audio processes).
I'm not sure what would do that.
Would just like to say that I'm still experiencing this, and it's quite annoying. Any chance to get this fixed?
Hi @sechmann, I'm sorry for the delay on the follow-up for this. Could I get the exact brand/model for your USB dock and external USB sound card? I haven't been able to reproduce this issue yet, but I get that it's super annoying and would like to get it resolved!
@RebeccaMahany It's this one: https://www.aten.com/global/en/products/usb-solutions/docks-and-switches/us434/
I don't really know how these issues relate to each other, but i've experienced this on 2 different (Lenovo) laptops now.
After i install the Kolide launcher i immediately start experiencing issues with an external USB sound card (connected through a USB dock).
dmesg
spits out the following lines each time the audio is crackling.sudo systemctl stop launcher.kolide-k2.service
solves the issue immediately.Plugging the USB sound card directly to the laptop also solves the issue, however there seems to be something weird with USB docks + Kolide launcher.