After further reading I think I may be affected by the problem described in this issue:
https://github.com/raspberrypi/linux/issues/2134
I will attempt to build a custom version of Volumio with the very latest kernel (rpi-update cannot be used) to see if it fixes the problem as even the latest dev build of volumio is only 4.9.41 - shown to be bad in issue 2134.
I've now tested a 4.9.51-v7+ kernel hoping that the timer fix for isochronous transfer in 43 described above would be the solution but unfortunately it doesn't fix this issue.
The severity of the problem seems to be reduced by about half - with a 36 or 41 kernel I get about 6 dropouts per minute, each audibly lasting probably 1/10th of a second; with the 51 kernel I am seeing maybe 3 dropouts per minute and they don't "sound" as bad, but they are definitely still there. The audio is not bit perfect or glitch free, and the dropouts can easily be heard on a sinewave test signal played using aplay.
Is there any further testing or debugging I can do to get to the bottom of this issue or at least prove whether it is a hardware limitation or still potentially fixable in software ?
More testing and some interesting conclusions. First of all this problem is not fixed by newer kernels and doesn't seem to be related to the timer fix in issue #2134, unfortunately. I can reproduce the same problem with 4.9.36, 4.9.41 and 4.9.51 with very little difference contrary to what I said above.
I found that when I tested on a fresh install of Raspbian I was not experiencing audio glitches using aplay at either 44.1kHz 16 bit or 96kHz 16 bit.
I then tried to narrow down what was different between Raspbian and Volumio. It turns out that the main Volumio service - which is a node js server for the web interface, when running leads to glitches/dropouts in the audio. When I stop that service the glitches almost entirely stop although I can still reproduce a very occasional glitch with it stopped, maybe once every 2-3 minutes, instead of every few seconds when it is running.
Nodejs is using very little CPU time (under 10%) when the web interface is idle yet still causes pretty bad sound glitches. If I renice nodejs to a low priority or aplay to a high priority it makes no difference at all, which suggests it's not aplay being starved for CPU time. (aplay never reports any underruns anyway.)
I experimented with the snd-usb-audio nrpacks module option - no change at all. I also experimented with every buffer related alsa configuration setting - no improvement. Finally I tried dwc_otg.speed=1 to disable USB2.0 support and the symptoms are gone - absolutely no glitches or dropouts in about an hour of testing on test signals and music, despite nodejs running as before.
So something the nodejs service is doing is causing issues with USB2.0 transmission - perhaps certain syscalls that it is calling are disabling interrupts for a long time, or the syscalls are just taking too long to return leading to the kernel missing a time critical USB transmission.
Perhaps a realtime kernel would help here. I would add that this sound interface is USB2.0 (but also supports fallback to USB1.1) so I don't think this relates to the fiq fixes from the last few years ?
Is there anything I can do to improve the USB 2.0 performance to avoid dropouts in the audio or am I stuck using USB1.1 as a workaround ?
Hello.
From an examination of your lsusb listing, the audio device is a high-speed device. The descriptors for the isochronous OUT endpoints (i.e. audio streams) also specify a bInterval of 1 which means a packet transfer every microframe - this is a particularly strenuous requirement for the Pi as it means the maximum hardware interrupt latency tolerable is <125uS.
The FIQ code attempts to improve matters by performing batches of transfers, but it can only perform as many transfers as the submitted URB requests - at the boundary where URBs are returned/the next one queued, control is passed to the IRQ-driven driver so there's a window of vulnerability where a latency spike could cause frame slippage.
dwc_otg.speed=1 will increase the hardIRQ latency tolerance to 1ms as that's the frame interval at full-speed. It's no surprise that the glitches disappear when clobbering the bus speed.
The issue described above should cause at most momentary audio dropouts (typically less than 1ms) but this is a function of how the USB device behaves when presented with an underrun condition (i.e. missing data). Can you capture a typical output stream via line-out -> recording device (e.g. line-in on a PC or similar) and upload the wav somewhere?
What's puzzling is the apparent difference between running a "heavy" userspace application (which should have negligible effect on hardIRQ latency) and plain aplay.
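For reference, the 125 µs and 1 ms tolerances quoted above follow from how USB 2.0 defines the service interval of an isochronous endpoint: 2^(bInterval-1) microframes of 125 µs at high speed, or 2^(bInterval-1) frames of 1 ms at full speed. A minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def service_interval_us(speed: str, b_interval: int) -> float:
    """Return the isochronous service interval in microseconds.

    speed is "high" (125 us microframes) or "full" (1 ms frames).
    For isochronous endpoints USB 2.0 limits bInterval to 1..16 and
    defines the period as 2**(bInterval - 1) (micro)frames.
    """
    if not 1 <= b_interval <= 16:
        raise ValueError("bInterval for isochronous endpoints must be 1..16")
    base_us = {"high": 125.0, "full": 1000.0}[speed]
    return base_us * 2 ** (b_interval - 1)


# The interface discussed here: high speed, bInterval = 1.
print(service_interval_us("high", 1))  # one packet every microframe
# The same endpoint after forcing full speed with dwc_otg.speed=1.
print(service_interval_us("full", 1))  # one packet every frame
```

So forcing full speed gives the host eight times longer to service each transfer, which matches the observation that the glitches largely disappear.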
@P33M one thing @DBMandrake mentioned to me was strace showed node js doing a number of cacheflush operations. It does make use of JIT and so is likely triggering instruction/data cache flushes in the kernel. I wonder whether there is any disabling of interrupts when that occurs? (or otherwise harming of interrupt latency)
From an examination of your lsusb listing, the audio device is a high-speed device. The descriptors for the isochronous OUT endpoints (i.e. audio streams) also specify a bInterval of 1 which means a packet transfer every microframe - this is a particularly strenuous requirement for the Pi as it means the maximum hardware interrupt latency tolerable is <125uS.
That makes sense. This interface is designed for home recording studio use and as such minimising round trip latency from record to playback is a key factor in its design, possibly at the expense of not being as "easy" to drive by the host OS.
The problem felt like a latency issue to me as well - the glitches don't seem particularly worse if I play 96kHz 24bit (which I believe is sent as 32bit to the card) than if I send 44.1kHz 16 bit, suggesting it wasn't a throughput issue.
The issue described above should cause at most momentary audio dropouts (typically less than 1ms) but this is a function of how the USB device behaves when presented with an underrun condition (i.e. missing data). Can you capture a typical output stream via line-out -> recording device (e.g. line-in on a PC or similar) and upload the wav somewhere?
Yes, I can make some recordings for you; I should be able to do this tomorrow. The severity varies from a small "click" that you can only hear on a simple and uncluttered test signal like a sine wave (but not easily in music) all the way to a loud "pop" that is easily heard during music.
Even if the click was subtle though, any glitch or dropout does defeat the purpose of trying to set up bit perfect playback from a high quality DAC.
What's puzzling is the apparent difference between running a "heavy" userspace application (which should have negligible effect on hardIRQ latency) and plain aplay.
Yes it is very puzzling - when I identified nodejs as the application that was placing the problematic "load" on the system (stopping the service eliminated 99% of the symptoms) I spent a few hours trying to figure out why.
I found it didn't seem to be just raw CPU use that was causing the issue, as adjusting relative priority of the nodejs service (which is a parent thread and 3-4 worker threads) and aplay made absolutely no difference, and I was still getting glitches even when nodejs was mostly "idle". (less than 5% CPU use)
Also other high cpu loads that I tried putting on the system were not causing trouble. Memory use also wasn't an issue as the service is using about 200MB of ram and the system is not using swap and has plenty of free ram.
I concluded that if it was causing this problem with such low cpu use even at low scheduling priority it must be making system call(s) that were somehow harming interrupt response time so I decided to strace the parent nodejs process and as popcornmix says I noticed there were a lot of cacheflush calls being made, especially soon after the service was started. (When audio glitches were even worse still)
The cacheflush calls might be a red herring, but it was the only thing that jumped out at me and seemed like something worth following up.
It's possible the problem isn't even in the USB driver at all but somewhere else in the kernel where some user space accessible system call(s) are doing something that is hurting interrupt response times or maybe even causing driver interrupts to be missed altogether ?
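To put a number on "a lot of cacheflush calls", the strace output can be tallied per syscall. The following Python sketch assumes the usual strace line layout (an optional pid prefix followed by `name(args) = result`); the sample lines are hypothetical and not taken from the actual trace:

```python
import re
from collections import Counter

# Matches an optional "[pid 1234] " or "1234 " prefix, then the syscall name.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(?:\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Count syscall names in strace output, skipping signal/resume lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "[pid  1201] cacheflush(0x3a2f4000, 0x3a2f4400, 0) = 0",
    "[pid  1201] epoll_wait(5, [], 1024, 0) = 0",
    "cacheflush(0x3a2f5000, 0x3a2f5200, 0) = 0",
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
]
counts = count_syscalls(sample)
print(counts.most_common())
```

Comparing the counts from a run with glitches against a quiet run would show whether cacheflush really stands out or is a red herring.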
As promised here are some recordings of the problem made using audacity via line input, they are in 192Kbit 32 bit float wav format, uncompressed. This is a 20Hz sinewave - low enough in frequency that it won't be heard on most speakers/headphones but the resulting clicks will be.
The first one was made after volumio had been booted for a while and was largely "idle". cpu use of nodejs at this point is under 10%:
https://drive.google.com/open?id=0B5qpN5dK9MKjdWc4Y2F2SmlDNTA
Clicking is more subtle and I count roughly 6 glitches over 30 seconds. I measured a couple in Audacity and they were about 0.4ms long, but because the amplitude jumps back to near zero it makes it quite audible.
This second recording was started while the nodejs service was stopped, and then the service was started at about 5 seconds into the recording, with symptoms much more severe as a lot more work is being done by the service including setting up child threads:
https://drive.google.com/open?id=0B5qpN5dK9MKjOURST3VfWUdEVHM
It takes a few seconds but it starts crackling badly between 16 and 22 seconds. I measured one dropout at about 17.7 seconds that lasts a full 8ms, which suggests something is very wrong.
Here is the original test file I'm playing with aplay:
https://drive.google.com/open?id=0B5qpN5dK9MKjbkhhdTFfdHNzbDg
Most of the dropouts appear to be the sample simply dropping to near zero for the duration then returning to where it should be, but I did see at least one dropout in the second file where you can see from the sinewave that time sync has been lost altogether and the sample has been "delayed", at about 17.79 seconds.
Just to try to rule out cpu hogging I repeated the test again with the nodejs service niced to 19 and it was no better, if anything it was worse: (although there is a lot of random variation from test to test)
https://drive.google.com/open?id=0B5qpN5dK9MKjTXh4Nk5YWGxCMkE
In this last one I see a 3ms dropout where the time sync seems to have slipped at 16.38s although most of the glitches are about 0.5ms in length. Here is a screen grab from that area:
Hopefully these files are useful to help at least classify the nature of the problem - if you need any more testing done please let me know.
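The manual measurement in Audacity could also be automated. A pure sine can only change by a small, known amount between consecutive samples, so any larger jump marks the edge of a dropout. The sketch below is an illustration under simplifying assumptions (a clean, noise-free tone at a known frequency and amplitude; a real line-in recording would need a larger margin) and runs on a synthetic signal rather than the uploaded files:

```python
import math


def find_discontinuities(samples, rate, tone_hz, amplitude=1.0, margin=4.0):
    """Return times (s) where the sample-to-sample step is too large for the tone.

    The steepest slope of A*sin(2*pi*f*t) is 2*pi*f*A per second, so one
    sample step can be at most 2*pi*f*A/rate. Anything above margin times
    that is treated as a glitch edge.
    """
    max_step = 2 * math.pi * tone_hz * amplitude / rate
    limit = margin * max_step
    return [
        i / rate
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > limit
    ]


# Synthetic test: one second of a 20 Hz sine with a 20-sample dropout to zero.
rate, tone_hz = 48000, 20
signal = [math.sin(2 * math.pi * tone_hz * n / rate) for n in range(rate)]
for n in range(6600, 6620):
    signal[n] = 0.0

jumps = find_discontinuities(signal, rate, tone_hz)
dropout_ms = (jumps[1] - jumps[0]) * 1000
print(jumps, dropout_ms)
```

Each dropout shows up as a pair of jumps (into and out of the gap), and the spacing between them gives the dropout length.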
Hi there, we are happy to find this thread. In short: We have the same problem with the ARM architecture. We tried:
All the hints we found in lots of forums didn't help, but It WORKS on:
So it's clear that there must be a timing / time slice issue with the isochronous data streams on ARM devices.
@musicwonder Can you confirm which USB controller the DAC is connected to on the tinker board?
A lsusb -t will tell you.
So it's clear that there must be a timing / time slice issue with the isochronous data streams on ARM devices.
It's not actually all arm devices. While this problem is being looked into I've been running Volumio on a dual core Cubox-i (imx6 arm architecture) and I have absolutely zero problems with glitches or dropouts with this USB interface even when the box is under heavy load. Unfortunately the Cubox-i build of Volumio is older than the Pi build and not kept as up to date by the developers (new versions are always released for the Pi first) so I would prefer to run the Pi version if the problem can be solved satisfactorily.
I've also tried this Behringer interface on a Vero4k (Amlogic S905X arm chipset) and there are no glitches there either, although to be fair this is much faster hardware than a Pi so is not a fair comparison, and there are no Volumio builds available for it. (I tested with both Kodi and aplay)
@P33M: It happens at each of the 4 USB Ports.
I've been toying with the idea of adding low-key profiling to the dwc_otg driver - it's possible to get reliable indication that you're missing interrupts (FIQ->IRQ latency being too big, for example).
Unfortunately time is tight at the moment so it will be a while before I can look at this.
Is there a chance that the issue is gone with an rt-kernel? If so, I would give it a try.
Unfortunately time is tight at the moment so it will be a while before I can look at this.
No problem - meanwhile I am using it on my Cubox, when you get a chance to look at it I'll be around to do any testing required.
Now I have made a test with an Orange Pi PC2 with Armbian 4.13. Same issue, but less often: one dropout every 1-4 minutes. Why only ARM devices?
rt-kernel doesn't help. Tested.
Update: With the Orange Pi PC2 and a new install of Armbian 4.13 the sound is clear for about one minute. After that the first dropout comes. Then I get many glitches, dropouts and buffer underruns while using brutefir (2 in/4 out with a USB C-Media soundcard). A reboot doesn't help. You cannot get the same clear sound as right after a new installation. Could there be an issue with SD card read/write?
Has there been any further progress on this issue?
There's nothing further I can do at the moment but I'm able to test any proposed fixes.
I've been banging my head quite a lot on this lately. In my case disabling nodejs made things a bit better, but dropouts were still there. Those are my results so far:
Any pointers will be appreciated, I can test proposed fixes and recompile the kernel if needed.
Did you try dwc_otg.speed=1 ? For me this almost eliminates the issue. Of course this is only a workaround, and restricts the USB controller to USB 1.1 mode and therefore limits all USB devices (including the onboard network adaptor) to 12Mbps.
However if your only USB devices are the sound card and the onboard Ethernet controller and you are not trying to use it for anything other than Volumio then this seems to be fast enough - after all audio doesn't require very high bitrates.
As P33M said earlier this reduces the required interrupt response time from 125uS to 1mS, making the interrupt load much easier to deal with.
Might be a usable workaround for you in the meantime. I've sidestepped the issue by running Volumio on a Cubox-i for now but that introduces other issues like the fact that the Volumio builds for that device are very out of date so I'd like to come back to the Pi when I can.
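For anyone applying the workaround: on the Pi the kernel command line normally lives in a single-line /boot/cmdline.txt, so the parameter has to be appended to that one line rather than added on a new line. A small Python sketch of doing this idempotently (the helper name is made up, and the file location is an assumption that may differ on some images):

```python
def add_kernel_param(cmdline: str, param: str) -> str:
    """Return cmdline with param set, replacing any earlier value of the same key.

    Only meant for keys that appear once, such as dwc_otg.speed; keys that
    are legitimately repeated (e.g. console=) should not be passed here.
    """
    key = param.split("=", 1)[0]
    tokens = [t for t in cmdline.split() if t.split("=", 1)[0] != key]
    tokens.append(param)
    return " ".join(tokens)  # cmdline.txt must stay on a single line


current = "dwc_otg.lpm_enable=0 dwc_otg.fiq_enable=1 elevator=noop rootwait"
updated = add_kernel_param(current, "dwc_otg.speed=1")
print(updated)
```

The result would then be written back to the cmdline file and takes effect after a reboot.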
I was able to reduce dropouts significantly (to almost none) by pinning MPD to one core with CPU affinity and setting its realtime priority (with an RT kernel) to 55. However, this made the whole system unstable.
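For anyone wanting to repeat that experiment, here is a rough Python sketch using the standard library's Linux scheduling calls. The function names are illustrative, the priority of 55 is simply the value mentioned above, and setting SCHED_FIFO normally requires root (or CAP_SYS_NICE). Given the instability reported, treat it as an experiment rather than a fix:

```python
import os


def pin_to_cpu(pid: int, cpu: int) -> set:
    """Restrict pid (0 = current process) to a single CPU; return the new mask."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


def set_fifo_priority(pid: int, priority: int) -> bool:
    """Try to switch pid to SCHED_FIFO at the given priority.

    Returns False instead of raising when the kernel refuses,
    for example because the caller lacks the required privileges.
    """
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        return False


# Example on the current process: pin to one of the CPUs we may already use.
cpu = min(os.sched_getaffinity(0))
mask = pin_to_cpu(0, cpu)
print(mask)
# For MPD one would pass its pid instead of 0, e.g. set_fifo_priority(mpd_pid, 55).
```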
I see the same glitches testing with aplay from the command line and also airplay (shairport) when mpd isn't even the active player process, so the issue is a lot more deep seated than that and seems to be down at the interrupt handler/scheduler level in the kernel.
Did you try dwc_otg.speed=1 ? For me this almost eliminates the issue. Of course this is only a workaround, and restricts the USB controller to USB 1.1 mode and therefore limits all USB devices (including the onboard network adaptor) to 12Mbps.
I would like to avoid this, as it's not really a solution, since it will mean max 24/96 and it will make db indexing very slow.
@P33M one thing @DBMandrake mentioned to me was strace showed node js doing a number of cacheflush operations. It does make use of JIT and so is likely triggering instruction/data cache flushes in the kernel. I wonder whether there is any disabling of interrupts when that occurs? (or otherwise harming of interrupt latency)
The ARM SVC handler explicitly enables interrupts before calling whatever ARM-specific handler is needed: http://elixir.free-electrons.com/linux/latest/source/arch/arm/kernel/entry-common.S#L172
So the cache maintenance operation should be done without hardirqs disabled.
The coherent_user_range() function invalidates Icache and cleans Dcache out to the inner shareable boundary which I imagine takes many cycles, but there shouldn't be anything like holding off interrupts for dozens of microseconds in that operation...
@popcornmix @P33M
In your opinion, are there any kernel configs or userspace tweaks that can be applied in this particular scenario (hi-res USB music playback) that might mitigate or solve this issue?
Hi all
I would like to report that I have exactly the same problem with different setups connected to an asynchronous USB DAC (XMOS WaveIO), whatever the music format.
I have posted some logs here http://logs.volumio.org/volumio/mD6JT8I.html
Good luck for debugging it really is annoying.
BR
Although I'm not a hundred percent sure, I think I might be experiencing the same problem with a Creative X-Fi Surround 5.1 pro USB DAC. When playing back something in Kodi (post-7.0.3 LibreELEC builds with a kernel newer than 4.4.13, and also on the latest Raspbian) that has an audio track with more than 4 channels, e.g. 5.1, I start to hear a crackling noise whenever there is silence. I was not able to reproduce the issue using VLC on the same system. After reverting to an older kernel (not sure about the exact version, but 4.4.13 seems to be better) the issue remains, but the noise becomes so quiet that I am barely able to hear it. lsusb -v and samples can be found here: https://1drv.ms/f/s!Ask92-8JR3iXhAp25gE0TisGy5Qw The first sample is recorded on LibreELEC 8.2.2, the second on LibreELEC 7.0.3. In the second sample the clicks are barely audible but they are worse IRL.
I am fighting with this issue for a long time already.
Had it with a LFS system with initd and found the issue not present using the same hardware with raspbian. The main difference seemed that raspbian used systemd and cgroups which I didn't have and could not test.
I switched over to a buildroot system with systemd and cgroups but the problem remains. I looked for any special configuration of cgroups in raspbian but could not find anything.
Kernel configs on both of the systems mentioned were always the default targets, which I understand Raspbian uses as well. Are Raspbian kernels patched in any way?
@DBMandrake your audio glitches sound exactly like what I encounter on my PC using a basic USB audio DAC plugged into a USB 2 port - random clusters of glitched audio (usually about 30 seconds of clear audio, then it starts glitching). So this issue is not specific to the Raspberry Pi or ARM, and the conclusion by @musicwonder that it is ARM-specific because it worked on one x86 platform isn't right.
Pre-emptive kernels or ones running a faster kernel timer do not help. I even have a BIOS option for Isochronous support and flipping that doesn't change anything. Using usbmon I can see there is a very slightly different message passed only at the exact time of the crackle, but I was not able to figure out what the messages meant, other than it LOOKS like a status flag showing a negative number rather than zero (in crackle-free periods).
What helped on my PC was plugging my DAC into the USB3 bus, rather than USB2. Unfortunately I haven't diagnosed it further, once I figured out a workaround I gave up. Unfortunately on the Pi you only have 1 root port. You could try an older Raspbian with a 3.x kernel to see if it is any different? (I have been using 4.4, 4.13, 4.15 on my PC).
@gvdw
@DBMandrake your audio glitches sound exactly like what I encounter on my PC using a basic USB audio DAC plugged into a USB 2 port - random clusters of glitched audio (usually about 30 seconds of clear audio, then it starts glitching). So this issue is not specific to the Raspberry Pi or ARM, and the conclusion by @musicwonder that it is ARM-specific because it worked on one x86 platform isn't right.
As I mentioned, not all arm devices are affected. I am running Volumio on a Cubox-i without any glitches at all using the same sound interface.
Having said that, I've recently realised that my current version of Volumio on Cubox-i is using a much older 3.14 kernel. One of the Volumio devs is working on a Cubox-i build based on a 4.x kernel and, surprise surprise, is seeing problems with glitches and dropouts with USB audio.
Which makes me wonder, is this a more general problem with USB Audio in 4.x kernels and the only reason my Cubox-i is glitch free is because it is still running 3.14 ?
It would be interesting to test a Pi on an old 3.14 kernel however this is becoming difficult as many of the newer Pi models were released after the 3.x kernel series was no longer being used so a 3.x kernel could only be booted on old Pi hardware. Might be worth someone trying though.
Pre-emptive kernels or ones running a faster kernel timer do not help. I even have a BIOS option for Isochronous support and flipping that doesn't change anything. Using usbmon I can see there is a very slightly different message passed only at the exact time of the crackle, but I was not able to figure out what the messages meant, other than it LOOKS like a status flag showing a negative number rather than zero (in crackle-free periods).
What Linux distro are you running ? Is there any way you could try compiling an old 3.14 kernel for it to see if you still have glitches/dropouts ? That may be difficult with a modern distro especially if it is running systemd, but also might be worth a try.
Hi, I have tried several MPD setups, including Volumio, Rune and Moode, on Pi and BBB, and at some point these glitches got worse with the same USB device. So I agree that from some point on the OS version was no longer okay, but I have no record of the precise moment or version when it happened. BR
What Linux distro are you running ?
@DBMandrake I'm on Mint 18.3, I tried a 3.19 kernel and a 3.14 and both still caused me dropouts via USB2.
For anyone who's interested in digging into USB transfers see here for the usbmon output - look for the "-63" on the middle line it seems to be a status flag in the callback from the DAC indicating something went wrong.
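The "-63" can be decoded: URB status values in usbmon are negative Linux errno codes, and 63 is ENOSR on Linux. As far as I can tell from the kernel's USB error-code documentation, -ENOSR on an OUT transfer means the host controller could not fetch data from memory fast enough to keep up with the bus, which would fit an underrun on the host side. A small lookup sketch (the table is hand-written from the generic Linux errno numbers and only covers a few codes relevant to isochronous transfers; the descriptions are paraphrased, not quoted):

```python
# Negative errno values as they appear in the usbmon status field.
# Numbers are the generic Linux errno values (an assumption for other OSes).
USB_URB_STATUS = {
    0: "OK: transfer completed",
    -18: "EXDEV: isochronous transfer only partially completed",
    -63: "ENOSR: OUT transfer, host could not read memory fast enough",
    -70: "ECOMM: IN transfer, host could not write memory fast enough",
}


def describe_urb_status(status: int) -> str:
    """Return a short description for a usbmon URB status value."""
    return USB_URB_STATUS.get(status, f"unknown status {status}")


print(describe_urb_status(-63))
```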
I have exactly the same problem. Except, gstreamer is responsible for the pops (this pipe gst-launch-1.0 --gst-debug=3 playbin uri=file:///test/vid.mp4) and not node. I also checked it with strace, but did not find many cacheflushes. CPU is also not really used. I suspect it's the USB bus.
I have worked out on my PC that it only crackles when I have software playthrough enabled (meaning 1x isochronous transfer in each direction, to monitor a recording). With just the recording transfer and no playback going, I get no dropouts.
Since the Pi has Ethernet and USB sharing the same controller it may be a problem of limited resources
Ok, can confirm the problem is related to the USB (driver). I bought a cheap new USB audio interface which has a bInterval of 10 instead of 1 and it works flawlessly now. This really seems to be an issue. On Windows, I can configure a "USB buffer" with my old interface. Is this somehow also possible on Linux? Can I somehow increase this bInterval without writing an own driver? :)
Is there an upstream bug that I can track?
I've not found one yet.
@pelwell, You mentioned that "There is some hope that the glitching is due to a specific change in the handling of softirqs - Linus is aware and they are working on a fix." so I assume there should be an upstream bug report? If not, please create one
Let me rephrase that: In the few minutes I spent trying to find the Linux mailing list conversation, I failed to find it. Perhaps you have more time, and will have more luck finding it. If you do, please report back here.
I'm not able to find one either, and that's why I was asking. I think it's better to create a new bug report, otherwise this bug will never get fixed.
I've refound a thread on the subject: https://patchwork.kernel.org/patch/10150457/
The next step is to find softirq patches in newer kernels, in case it has already been fixed. But this bug is big enough that it is not going to be forgotten.
I'm not sure if it will ever get fixed. As far as I have tested, the issue is still present in 4.19.17 (I update via rpi-update using the next branch).
@volumio Just tested what you tried --- with real time kernel and set cpu affinity and priority to 55. Still a lot of pops, especially with a DSD512 test file. On my low end PC things run very smoothly.
just curious, how does Volumio Primo with a tinkerboard solve this issue? Or does it just not appear when using i2s?
Yes, with I2S there is no problem for me, though I cannot use my WaveIO USB gateway.
I've refound a thread on the subject: https://patchwork.kernel.org/patch/10150457/
That thread is about this commit so it's been in newer kernels for a while.
@alastair-dm I tried a version with the commit you mentioned but the problem is not solved.
Sorry, I should have been clearer. Based on that commit it appears to me that the patch has been in the Raspbian kernel for a while, so anything above 4.14.64 should include it. I think Volumio tracks the Raspbian kernel, so should be including the patch too, but the audio glitches persist under certain circumstances. I haven't built a kernel from source to be completely certain though.
The commit is also useful in pointing out the timeline for softirq changes, and the specific commits involved. If we can be clear whether the initial change in 4.9 started the problem we'll have a solid basis for an upstream bug report.
@alastair-dm I have a DSD512 file and a DAC that supports DSD512 and can reproduce the issue very easily --- when it occurs the glitch is very evident. If there's an easy way I can roll back to previous kernel images, I can do a binary search and find when the problem first occurred.
@Wang-Yue if I knew an easy way I'd be testing it myself ;-) If you can find old kernel packages that would probably be easiest, but I suspect you'll need to check out source from git and compile your own.
@alastair-dm compiled binaries could be found in https://github.com/Hexxeh/rpi-firmware (which is a mirror of the official compiled kernel)
essentially you just need to do a binary search on the commit history and copy the kernel7.img to /boot and modules files to /lib.
the script https://github.com/Hexxeh/rpi-update may make the process easier but I didn't look into it closely.
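The binary search itself is simple once there is an ordered list of firmware commits and a manual "does it glitch?" check for each one tried. A minimal Python sketch of the procedure, assuming the oldest build in the list is known good and the newest is known bad (the build names and the test function are placeholders, not real results):

```python
def first_bad(builds, is_bad):
    """Return the first build for which is_bad() is true.

    builds must be ordered oldest to newest, with builds[0] known good
    and builds[-1] known bad. is_bad(build) stands for the manual step:
    install that kernel, reboot, play the test file, listen for glitches.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return builds[bad]


# Placeholder data: pretend the regression arrived with build "k5".
builds = [f"k{i}" for i in range(8)]
culprit = first_bad(builds, lambda b: int(b[1:]) >= 5)
print(culprit)
```

With N candidate builds this needs only about log2(N) install-and-listen cycles, which matters when each test means swapping kernel7.img and the modules and rebooting.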
I don’t think rpi-update is compatible with volumio.
Hi,
I'm testing a Behringer U-Phoria UMC204HD USB sound interface with my Pi 2B running the latest version of Volumio and I'm experiencing random glitches/dropouts in the sound which after some investigation I suspect may be related to USB packet loss or timing - issues that have been looked at in the past for the Pi.
The interface supports up to 192kHz 24bit with 4 output channels, however I'm seeing the same symptoms down at 44kHz 16 bit - a pop or momentary dropout in the audio at random, roughly every 3 to 10 seconds, whether playing audio via Volumio or directly using aplay from the command line. The sample rate makes very little difference to the frequency of occurrence of the pops and clicks.
I've already experimented with the various dwc options discussed elsewhere and nothing seems to make much difference - the only one that made some difference was dwc_otg.speed=1 to disable hi-speed mode, which did significantly reduce the frequency of the glitches but did not eliminate them, and of course that is not a viable solution since it affects Ethernet speed etc...
These are the current cmdline options and kernel version used by Volumio that I am testing with:
splash quiet plymouth.ignore-serial-consoles dwc_otg.lpm_enable=0 dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3 console=serial0,115200 kgdboc=serial0,115200 console=tty1 imgpart=/dev/mmcblk0p2 imgfile=/volumio_current.sqsh elevator=noop rootwait smsc95xx.turbo_mode=N bootdelay=5 logo.nologo vt.global_cursor_default=0 loglevel=0
Linux version 4.9.36-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) #1015 SMP Thu Jul 6 16:14:20 BST 2017
For a comparison I installed the Cubox version of Volumio on a dual core Cubox-i that I have which is a comparable speed to a Pi 3, (so a bit faster than my Pi 2) and the interface works perfectly with both Volumio and aplay with no dropouts whatsoever, so I think this rules out a problem with the interface being supported properly in Linux on arm devices.
Here is the output from lsusb:
lsusb.txt
If there is anything else I can do to test or help debug this please let me know.