raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

USB audio capture shows spurious samples in raspios 64-bit distributions. #5544

Open gregory-whaley opened 11 months ago

gregory-whaley commented 11 months ago

Describe the bug

Using a variety of low-cost USB audio capture adapters, I have found that captured audio streams sporadically show data corruption every few minutes. This appears to happen only with stereo capture devices (whether recording in mono or stereo) and not with mono (single-channel) capture hardware. It happens only with raspios 64-bit distributions, not with 32-bit distributions. It does not seem to depend on the specific kernel version, only on whether the distribution was compiled for 32 bit or 64 bit.

There are no error messages reported in any log or by the application software. There are no overrun or underrun (XRUN) reports. The data corruption appears as a small group of one to about 25 samples with random values. Acoustically, this sounds like a "click" or "pop" on replay. No unusual USB behavior correlates with the spurious samples, as monitored with wireshark.
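One way to locate such transients in a noise-floor recording is to scan for short runs of samples far above the noise. The following is a minimal Python sketch, not part of the original report: the function names, the threshold, and the `max_gap` merge distance are illustrative assumptions, and it only handles 16-bit PCM WAV files.

```python
import array
import sys
import wave


def find_glitch_runs(samples, threshold, max_gap=4):
    """Return (start, end) index pairs for runs of samples whose absolute
    value exceeds threshold; outliers within max_gap samples are merged."""
    runs = []
    for i, value in enumerate(samples):
        if abs(value) <= threshold:
            continue
        if runs and i - runs[-1][1] <= max_gap:
            runs[-1][1] = i          # extend the current run
        else:
            runs.append([i, i])      # start a new run
    return [tuple(run) for run in runs]


def load_pcm16(path):
    """Read a 16-bit PCM WAV file; return (interleaved samples, channels, rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()       # WAV sample data is little-endian
        return samples, wav.getnchannels(), wav.getframerate()
```

Passing the samples from `load_pcm16("capture.wav")` (a hypothetical file name) to `find_glitch_runs(samples, threshold=1000)` would list candidate glitch positions. The indices refer to interleaved samples, so divide by the channel count to get a frame position, and tune the threshold to the noise floor of the adapter in question.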

An example WAV file and a screenshot showing a typical spurious transient are attached. This capture had no input signal, so it is mostly the expected noise floor plus the transient errors.

Buster OS 64bit - Digital Life card at 48kHz stereo.wav.zip

Buster OS 64bit - Digital Life card at 48kHz stereo

It doesn't correlate with the user application: it happens with audacity, arecord, and gnuradio. It doesn't seem to be related to pulseaudio, since it happens regardless of whether pulseaudio is running. I always capture from the ALSA device hw:2,0, which is supposed to access the stream directly from ALSA rather than through pulseaudio. All the devices use the snd-usb-audio driver.

Steps to reproduce the behaviour

From a fresh install of raspios taken from the official distribution archives (https://downloads.raspberrypi.org/), I installed audacity:

sudo apt update
sudo apt install audacity

No system updates were installed. Plug in a USB audio capture device. Use no microphone or input signal, i.e. record just the noise floor. In audacity, select the ALSA input hw:2,0, which accesses the data stream without going through pulseaudio. Record up to 10 minutes of audio. The transient errors show clearly in the waveform display.

Note there is a known problem with some USB audio devices where they do not perform well in USB-high-speed mode (480Mbps) and the workaround is to force the device into USB-full-speed mode (12Mbps). I have verified that all these devices use full-speed mode (12Mbps) natively.
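On Linux, the negotiated speed of each attached USB device can be read from the `speed` attribute in sysfs, where full-speed devices report 12 and high-speed devices report 480. A minimal Python sketch of that check, assuming the standard sysfs layout under /sys/bus/usb/devices (the function name is illustrative and not from the original report):

```python
from pathlib import Path


def usb_device_speeds(sysfs_root="/sys/bus/usb/devices"):
    """Map each USB device directory name to its negotiated speed in Mbit/s.

    Interface entries (such as "1-1.2:1.0") have no "speed" attribute and
    are skipped. Values are returned as the strings sysfs reports.
    """
    speeds = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return speeds
    for entry in sorted(root.iterdir()):
        speed_file = entry / "speed"
        if speed_file.is_file():
            speeds[entry.name] = speed_file.read_text().strip()
    return speeds
```

Printing the returned dictionary with the capture adapter plugged in should show which bus position it occupies and whether it enumerated at 12 or 480.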

Device(s)

Raspberry Pi 3 Mod. B

System

Four different OS images were tested:

2023-05-03-raspios-bullseye-armhf (32 bit)
2023-05-03-raspios-bullseye-arm64 (64 bit)
2021-03-04-raspios-buster-arm64 (64 bit)
2021-01-11-raspios-buster-armhf (32 bit)

pi@raspberrypi:~ $ vcgencmd version
Mar 17 2023 10:52:42
Copyright (c) 2012 Broadcom
version 82f3750a65fadae9a38077e3c2e217ad158c8d54 (clean) (release) (start)

Here is the version for the 2023-05-03 bullseye 64 bit OS:

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux

Here is raspinfo: raspinfo.txt

Logs

No response

Additional context

The problem seems to happen more frequently when the CPU is busy with other tasks such as window management, or as in my case digital signal processing in gnuradio. So clicking on various other windows in the desktop environment seems to trigger the faults.

My guess is that either the snd-usb-audio driver or something in the ALSA code base is not properly handling pointers in the 64 bit environment so that when the OS is performing some background memory management, the pointers to the data stream are sometimes corrupted. Again, this does not happen in any 32-bit version of the OS.

P33M commented 11 months ago

The transients are small, a handful of samples each. They appear to be audio data, but with a byte shift for the duration of a glitch - easily visible in a hexdump.

00216e40: b1ff 5500 b0ff 6100 b3ff 5500 abff 5d00  ..U...a...U...].
00216e50: b4ff 5400 acff 5e00 afff 5600 aeff 5a00  ..T...^...V...Z.
00216e60: adff 5700 b5ff 5300 b3ff 5d00 b5ff 5c00  ..W...S...]...\.
00216e70: b4ff 6000 b1ff 5a00 b0ff 5a00 01b1 015f  ..`...Z...Z...._
00216e80: ffa6 fe51 01b1 0055 00ab fe58 ffa6 fe58  ...Q...U...X...X
00216e90: afff 5a00 adff 5700 abff 5600 a8ff 4e00  ..Z...W...V...N.
00216ea0: aaff 5700 a8ff 4e00 a9ff 5600 b1ff 5700  ..W...N...V...W.
00216eb0: a4ff 4f00 aeff 5000 abff 5500 a9ff 5300  ..O...P...U...S.

Odd that the corruption is a) smaller than a cacheline and b) "fixes" itself. In the dwc_otg FIQ, FS Isochronous traffic uses coherent DMA bounce buffers to do transaction reassembly. Is the coherent behaviour different in AARCH64 vs 32 environments?
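To make the shift concrete, the two rows of the hexdump above that contain the glitch can be decoded as signed 16-bit samples. This is a minimal Python sketch; the helper names are illustrative, and the byte-swapped decode is only a quick way to test the shift observation, not a statement about what the hardware does.

```python
import struct

# Rows at offsets 00216e70 and 00216e80, copied from the hexdump above
# with the offset and ASCII columns removed.
ROW_E70 = "b4ff 6000 b1ff 5a00 b0ff 5a00 01b1 015f"
ROW_E80 = "ffa6 fe51 01b1 0055 00ab fe58 ffa6 fe58"


def decode_s16(hex_row, byte_order="<"):
    """Decode one xxd-style row into signed 16-bit samples.

    byte_order is a struct prefix: "<" for little-endian (normal WAV data),
    ">" to read each byte pair swapped.
    """
    raw = bytes.fromhex(hex_row.replace(" ", ""))
    return list(struct.unpack("%s%dh" % (byte_order, len(raw) // 2), raw))


print(decode_s16(ROW_E70))        # last two values jump to about +/-20000
print(decode_s16(ROW_E80))        # every value is about +/-20000
print(decode_s16(ROW_E80, ">"))   # swapped pairs fall back to a few hundred
```

Read as normal little-endian data, the glitch samples sit near ±20000 while the surrounding noise floor stays within about ±100. Reading the same bytes with each pair swapped brings them back to within a few hundred, which is consistent with the low-order bytes having moved by one position for the duration of the glitch.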

P33M commented 11 months ago

After you have recorded a wav file with glitches in, what's the output of dmesg? Clear the buffer beforehand with sudo dmesg -C

P33M commented 11 months ago

I think I have a reproduction here - if I check that the number of bytes transferred in an isochronous packet is an integer multiple of sample size, I occasionally get glitches coincident with an odd number of bytes in a transfer completion. Isochronous endpoints used for audio data shouldn't do this.
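The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the instrumentation that was actually used: the function name is hypothetical, and the 2-byte sample size assumes 16-bit PCM.

```python
def misaligned_packets(packet_lengths, sample_size=2):
    """Return (index, length) for each isochronous packet whose byte count
    is not a whole multiple of sample_size (2 bytes for 16-bit PCM)."""
    return [
        (index, length)
        for index, length in enumerate(packet_lengths)
        if length % sample_size != 0
    ]


# Assuming 48 kHz 16-bit stereo on a full-speed link, a 1 ms frame carries
# 48 frames * 4 bytes = 192 bytes; the odd lengths below are made-up examples.
print(misaligned_packets([192, 192, 191, 192, 193]))
```

Any packet this flags would split a sample across two transfers, which would shift every following byte until the stream realigns.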

gregory-whaley commented 11 months ago

Hi,

Thanks for looking into this bug.

Attached are two files of dmesg output, first immediately after booting the system, and again after one transient glitch occurred about 6 minutes later.

I don’t see anything unusual in dmesg when the glitch happens.


gregory-whaley commented 11 months ago

In this particular audio capture, an overflow error occurred and shows in the dmesg output. This can happen when SD card writes get bogged down, but it is separate from the transient glitch problem. The buffer overflow event is reported by Audacity, for example; the glitch transient is not.

Thanks, Gregory


P33M commented 11 months ago

Ah, 64-bit kernels are incompatible with ARM's FIQ handlers, so the FIQ code is demoted to a regular IRQ handler. The only workaround is to specify arm_64bit=0 in /boot/config.txt and use a 32-bit distribution.
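Before testing the workaround, it can help to confirm what the firmware configuration currently says. The following Python sketch reports the `arm_64bit` value found in the text of a config.txt file. It is a simplification with several assumptions: the function name is hypothetical, conditional section filters such as `[pi4]` are not interpreted, and the last assignment in the file is assumed to be the one that applies.

```python
def arm_64bit_setting(config_text):
    """Return the last arm_64bit value assigned in config.txt text, or None.

    Comments are stripped; section filters are ignored, so every
    arm_64bit line is treated as unconditional (an assumption).
    """
    value = None
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()
        key, sep, val = line.partition("=")
        if sep and key.strip() == "arm_64bit":
            value = val.strip()
    return value


example = "# example config\ndtparam=audio=on\narm_64bit=0\n"
print(arm_64bit_setting(example))
```

On a real system the text would come from reading /boot/config.txt, the path given above.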

pelwell commented 11 months ago

I'll just leave this (3889ba70102ed8) here...

gregory-whaley commented 11 months ago

So I am satisfied with the work-around strategy, but I wonder if a long term FIQ fix should be implemented in future 64-bit kernels? Should this bug be closed, or left open?

Thanks, Gregory


P33M commented 11 months ago

The linked commit adds arm64 architectural support for FIQ handlers but there's quite a bit of plumbing required to get dwc_otg to use it. It's low priority, so won't get done quickly, but no need to close the issue.

micdini commented 1 month ago

Hello, I opened this topic on the forum: https://forums.raspberrypi.com/viewtopic.php?t=370288. I know it probably isn't the same issue, but several elements here seem related to mine (self-healing of the noise, not linked to a specific application, randomness that is linked in some way to CPU load/IRQ handling).

I'm on a CM4 with a 32-bit architecture. I have similar glitches on playback, but I'm using USB on the xHCI controller. After many tests, I found that playing two h264 videos (1920x1080@25 Hz, level 5, about 30 Mb/s) triggers the issue often (several times in 3-4 minutes).

My guess at the moment is some misconfiguration of DMA between the video pipeline (to/from the h264 block) and xHCI isochronous transfers.

Any hint or configuration to try in order to isolate the issue?

gregory-whaley commented 1 month ago

My issue seemed limited to USB 2.0 audio transfers using a 64-bit OS on the Raspberry Pi 3B only. I was never able to reproduce the problem with 32-bit raspios, so your problem seems unrelated. Sorry I can't help any more than that.
