Closed msperl closed 8 years ago
@msperl I'll take a look this evening. Thank you.
One thought. Period length can be >65532.... So what value do you suggest for ".period_bytes_max" in bcm2835-i2s.c?
I have no experience with I2S, so you are on your own. Start with the values you have been using before, then test with higher values to see whether the code works properly and does not produce the glitches that @HiassofT was talking about. At least the standard upstream settings did not show anything strange: I used I2S sound and an SPI framebuffer with mplayer to test that things work well together, which they did. There was one oddity: mplayer's sound finished earlier than the video, so there might be a bug lurking somewhere, but the sound reproduction was glitch-free (at least when I listened I could not hear anything abnormal).
Maybe @HiassofT wants to make the period_bytes_max a module parameter that you can set to avoid recompiling the module with different settings...
@msperl thanks a lot for the patches, I'll do some tests with my Cirrus audio card and your upstream code.
BTW: I noticed that in the 4.4.0-rc6-dmaengine-split tree you have some additional clk and spi patches, but not your I2S (test-) patch. Is that intentional?
@clivem use period_bytes_max set to 256k for testing, you should be able to set lower values in your sound apps.
By default arecord and aplay try to use a 500ms buffer split into 4 periods. Use "-v" to view the actual values and "--buffer-size", "--period-size" options to change them. The unit of these parameters is "frames". One frame is bytes_per_sample*channels bytes (i.e. 1 frame is 4 bytes for 16-bit stereo, 8 bytes for 32-bit stereo).
For example if you want to use a 40ms buffer split into 2 20ms periods when playing a 44.1kHz stereo WAV use this:
aplay -D hw:0,0 -v --buffer-size=1764 --period-size=882 file.wav
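The frame arithmetic behind that command line can be sketched like this. It is only an illustration: the helper names `frame_bytes` and `ms_to_frames` are made up for this comment and are not part of ALSA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size of one ALSA frame in bytes. */
static inline uint32_t frame_bytes(uint32_t bytes_per_sample, uint32_t channels)
{
	return bytes_per_sample * channels;
}

/* Hypothetical helper: frames needed to cover `ms` milliseconds at `rate_hz`. */
static inline uint32_t ms_to_frames(uint32_t rate_hz, uint32_t ms)
{
	assert(rate_hz > 0);
	/* 64-bit intermediate so high rates and long buffers cannot overflow */
	return (uint32_t)(((uint64_t)rate_hz * ms) / 1000);
}
```

With these, `ms_to_frames(44100, 40)` gives the 1764 passed to `--buffer-size` and `ms_to_frames(44100, 20)` gives the 882 passed to `--period-size`.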
Most of the time small buffers/periods are preferable, to reduce latency. One exception is recording audio: then you might want to set the buffer as large as possible to reduce the risk of dropouts (e.g. when your system is busy for some reason). Playing back audio from the network or from sources with possibly varying bandwidth can also benefit from large buffers.
In those corner cases there might be a small benefit from having period_bytes_max set to 256k instead of 64k-4: arecord/aplay will then use a buffer up to 512k split into 4 periods up to 128k instead of a ~256k buffer split into 4 ~64k periods by default for high bitrate audio. You have to use at least 88kHz @ 24 or 32bit or 176kHz @ 16bit stereo files to run into these limits.
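A rough way to check which formats run into the 64k-4 period limit, assuming the default 500ms buffer split into 4 periods described above. The function names are hypothetical, and real aplay/arecord negotiate sizes with the driver, so treat this as a back-of-the-envelope model only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a 500 ms buffer (the assumed aplay/arecord default). */
static inline uint64_t default_buffer_bytes(uint32_t rate_hz,
					    uint32_t bytes_per_sample,
					    uint32_t channels)
{
	assert(rate_hz > 0 && bytes_per_sample > 0 && channels > 0);
	return (uint64_t)rate_hz * bytes_per_sample * channels / 2;
}

/* Nonzero if one of the 4 default periods would exceed period_bytes_max. */
static inline int hits_period_limit(uint32_t rate_hz,
				    uint32_t bytes_per_sample,
				    uint32_t channels,
				    uint32_t period_bytes_max)
{
	return default_buffer_bytes(rate_hz, bytes_per_sample, channels) / 4
		> period_bytes_max;
}
```

For example, 88.2kHz stereo with 4-byte samples needs a 352800-byte buffer (88200 bytes per period), which is above 65532, while 96kHz 16-bit stereo (48000 bytes per period) stays below it.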
As for periods: we need at least 2; 4 (or more) are usually preferable, as they also reduce the risk of over/underruns (the application gets woken up more often).
@msperl Thanks! 2836 variant is playing music. Just spinning a 2835 kernel.....
@HiassofT 64k is probably already overkill for the max period. I think even in my worst-case scenario, using a USB audio device and outputting 384k/32-bit data, I'm only using an ALSA buffer of 16384 and a period of 4096 bytes for glitch-free audio! ;)
Anyway, once 2835 variant kernel has finished building, I'll roll this out across all my "music players".
The "workaround" patch is actually a module parameter where you have to pass the register address. To get that address you first need to patch the clock framework to print it.
That is a hack that I will not even start to share; I just wanted to confirm that the DMA portion works. Clock support in the driver itself is a bit further off. My main focus (and my main interest) is still the DMA portion, so that comes first and the clock will have to wait, though I can look into it. As said, the point is that using the clock brings a few more limitations with interrupts disabled that will need to be addressed, similar to the issue that made us start using dmabuf to avoid those warnings...
Note that in the end you just have to pick up the source of the module - no need to patch really for a quick test...
@msperl ..... Just noticed this ..... msperl_dma_backtrace.txt
Can you tell me how you reached this (a command line, maybe), so that I can try to replicate the issue? I guess it may be related to multi-processor, but strangely in CMA_ALLOC... We may need to use different GFP flags... Maybe something like:
    /* allocate the CB chain */
    d = bcm2835_dma_create_cb_chain(chan, direction, true,
                                    info, extra,
                                    frames, src, dst, buf_len,
                                    period_len, GFP_ATOMIC);
(that is line 712)
The original code uses a mix of GFP_NOWAIT, GFP_KERNEL and GFP_ATOMIC:
    d = kzalloc(sizeof(*d), GFP_NOWAIT);
    if (!d)
        return NULL;
    d->c = c;
    d->dir = direction;
    d->frames = buf_len / period_len;
    d->cb_list = kcalloc(d->frames, sizeof(*d->cb_list), GFP_KERNEL);
    if (!d->cb_list) {
        kfree(d);
        return NULL;
    }
    /* Allocate memory for control blocks */
    for (i = 0; i < d->frames; i++) {
        struct bcm2835_cb_entry *cb_entry = &d->cb_list[i];
        cb_entry->cb = dma_pool_zalloc(c->cb_pool, GFP_ATOMIC,
                                       &cb_entry->paddr);
So this is a mess in itself. I took GFP_KERNEL as the "common" setting, because its use says: we do not care if we are delayed for one allocation, so we really should not care about waiting for any of them. This may not be optimal, so the switch to GFP_ATOMIC may solve the issue when using multiple processors...
@msperl Sorry Martin, will have to get back to this later this evening. Still working and I need to sort out a client.....
NB. Didn't notice anything untoward with the BCM2708 kernel build. That's running a WM8804 HAT, outputting TOSLINK to a digital amp.
NP - please just give it a try with GFP_ATOMIC and tell me how you are able to (re)produce it from the command-line.
Reading this with interest but not sure how I can help. One thing to note Toslink is normally 96k max so maybe not stretching the DMA throughput etc to the limit.
G
(In reply by email to clivem's comment of 6 Jan 2016, 17:21: https://github.com/raspberrypi/linux/issues/1231#issuecomment-169394399)
@iqaudio I have a good stash of original Toshiba TX/RX parts, before they became hard to come by! ;) Don't know what the Berry people are using on their Digi board, but the Geekroo chaps are definitely using 192k capable TOSLINK transmitters. Do you make a digital transport hat, or just the DAC boards?
DAC and AMPs but have had wm8804 Optical board in the wings for a while - may release soon. Do you need samples @clivem
@msperl I have one hand tied behind my back, trying to do things without being in the same physical location as the devices..... But I think you hit the nail on the head! GFP_ATOMIC on that alloc, resolves it. I couldn't reliably reproduce. But haven't been able to reproduce at all since .....
msperl_dma_0016_GFP_ATOMIC.txt
I'll do some real testing once I return home.
Thanks a lot for your testing... Hopefully you do not experience more issues!
@iqaudio You are already represented with 2 devices in my "I2S test farm".... One of your first generation DACs on a B and your most recent (B+ and later) PCM512x design, mounted on a 2B. I will update them to the latest msperl dma code tonight. A HiFiBerry Digi+ and Geekroo Digi+ are the "test" units for digital transports, but I can make space for another. ;) I did try and prise a Zero DAC sample out of you before Christmas by email, but never followed-up with the "threatened" phone call. Sorry! Never enough hours in the day......
A few quick tests (playback at various rates, recording at 44/16, simultaneous playback and recording at 44/16) with my Cirrus card on the upstream kernel went fine!
In case anyone would like to test here's the tree I used (defconfig and DT for RPi B rev2 included): https://github.com/HiassofT/rpi-linux/tree/upstream-dmatest-cirrus
I used GFP_ATOMIC allocation for cyclic DMA but I'm wondering why the allocation flags were changed in this patch. Upstream uses GFP_NOWAIT for the descriptor, GFP_KERNEL for the cb_list and GFP_ATOMIC for the control blocks. ping @msperl
GFP_KERNEL looks like a small bug in the dmapool patch, I guess this should be GFP_NOWAIT as well, but I'm not sure if GFP_ATOMIC is really needed for the control blocks or if we can use GFP_NOWAIT here as well.
Why GFP_KERNEL?
Because the original patch used all 3 variants and GFP_KERNEL is the most permissive of all. GFP_ATOMIC is the least permissive.
Ideally a single method should use only one type, not 3 distinct ones! I assumed that everything was working fine with the original patch (as nobody ever reported issues), which meant that the most permissive variant was sufficient, so I generalized.
But it seems we are falling into the same trap as with freeing coherent DMA memory with interrupts disabled, where we had to move to a DMA pool to work around those ALSA semantics.
Moving to GFP_ATOMIC will allow this method to work in any kernel context (even in interrupts), so we should be fine. Any other option may fail only in rare circumstances, which makes such errors hard to catch. The only alternative would be to understand the ALSA code well enough to see whether GFP_NOWAIT would be sufficient, but that is again a risk and I want to avoid those.
So I will update the patch set to use GFP_ATOMIC and add some comments on why it is used. This may result in feedback from people more knowledgeable and may require another version, but I guess that should be fine...
Anyway: as so far the experience is positive with this patch-set I will send it off to upstream for review.
@msperl I'm pretty sure that it isn't reproducible with GFP_ATOMIC, having found a way to reproduce it at every boot with GFP_KERNEL. (Unfortunately, not an easy cmd line way. It involves starting a service from systemd that opens and starts outputting to the sound device, while boot process is still spinning-up other services. ie. CPU is fairly loaded and there is work on all 4 cores.)
Perhaps in a day or two, as well as having sent the code for review upstream, @pelwell and @popcornmix will take a pull request, for those 15 patches, (or 16 with GFP_ATOMIC), from your tree, into downstream rpi-4.4.y tree. IIRC, OpenELEC is using the rpi-4.4.y tree, which would likely see the code get a lot more testing than it would do otherwise.
@msperl thanks for the explanation, using GFP_ATOMIC looks fine to me.
On second thought, using GFP_NOWAIT might be a bad idea, as it has a higher chance of failing (GFP_KERNEL can block, GFP_ATOMIC has the emergency pool as a fallback).
@HiassofT - here is the "ugly" patch to make sound work in upstream while using the clock framework:
diff --git a/drivers/clk/bcm/clk-bcm2835.c b/drivers/clk/bcm/clk-bcm2835.c
index 015e687..5d45457 100644
--- a/drivers/clk/bcm/clk-bcm2835.c
+++ b/drivers/clk/bcm/clk-bcm2835.c
@@ -1515,6 +1527,7 @@ static int bcm2835_clk_probe(struct platform_device *pdev)
cprman->regs = devm_ioremap_resource(dev, res);
if (IS_ERR(cprman->regs))
return PTR_ERR(cprman->regs);
+ pr_info("CLK: %pK %zx\n", cprman->regs, res->start);
cprman->osc_name = of_clk_get_parent_name(dev->of_node, 0);
if (!cprman->osc_name)
diff --git a/sound/soc/bcm/bcm2835-i2s.c b/sound/soc/bcm/bcm2835-i2s.c
index 8c435be..eb2dc8f 100644
--- a/sound/soc/bcm/bcm2835-i2s.c
+++ b/sound/soc/bcm/bcm2835-i2s.c
@@ -784,6 +784,9 @@ static const struct snd_soc_component_driver bcm2835_i2s_component = {
.name = "bcm2835-i2s-comp",
};
+int clk_reg;
+module_param(clk_reg, int, 0);
+
static int bcm2835_i2s_probe(struct platform_device *pdev)
{
struct bcm2835_i2s_dev *dev;
@@ -792,14 +795,24 @@ static int bcm2835_i2s_probe(struct platform_device *pdev)
struct regmap *regmap[2];
struct resource *mem[2];
+ if (!clk_reg)
+ return -EINVAL;
+
/* Request both ioareas */
for (i = 0; i <= 1; i++) {
void __iomem *base;
-
+ if (i) {
+ mem[i]=kzalloc(sizeof(*mem[i]),0);
+ mem[i]->start = 0x7e101098 - BCM2835_VCMMU_SHIFT;
+ base = clk_reg * 0x1000 + 0x98;
+ } else {
mem[i] = platform_get_resource(pdev, IORESOURCE_MEM, i);
base = devm_ioremap_resource(&pdev->dev, mem[i]);
if (IS_ERR(base))
return PTR_ERR(base);
+ }
+
+ pr_info("XXX: %i %pK %zx\n",i, base, mem[i]->start);
regmap[i] = devm_regmap_init_mmio(&pdev->dev, base,
&bcm2835_regmap_config[i]);
diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
index aef64de..cae531e 100644
--- a/arch/arm/boot/dts/bcm2835.dtsi
+++ b/arch/arm/boot/dts/bcm2835.dtsi
@@ -120,8 +121,8 @@
i2s: i2s@7e203000 {
compatible = "brcm,bcm2835-i2s";
- reg = <0x7e203000 0x20>,
- <0x7e101098 0x02>;
+ reg = <0x7e203000 0x24>;
+ clocks = <&clocks BCM2835_CLOCK_PCM>;
dmas = <&dma 2>,
<&dma 3>;
(do not blame me for ugliness or copy/paste issues producing spaces instead of tabs!)
Here is how I use it (assuming the modules have been loaded automatically):
root@raspcm:~# dmesg | grep -E "\\] (CLK|XXX):"
[ 2.738875] CLK: dc854000 20101000
root@raspcm:~# rmmod snd_soc_bcm2835_i2s; modprobe snd_soc_bcm2835_i2s clk_reg=$[0xdc854]
root@raspcm:~# dmesg | grep -E "\\] (CLK|XXX):"
[ 2.738875] CLK: dc854000 20101000
[ 1015.599867] XXX: 0 dc85e000 20203000
[ 1015.607516] XXX: 1 dc854098 20101098
After this I2S works, but it is definitely ugly...
In the meantime I am working on upstreaming those patches and when I get that done I can look into the clock issues.
I guess we will need those clock patches to support PCM, that are in my branch already:
@pelwell : your quote:
That's correct, but the driver can't cope with the fact that channels 12, 13 & 14 share an interrupt, so the list of usable DMA channels is 0, 2, 4, 5, 8, 9, 10, 11 (notice that brcm,dma-channel-mask is 0x0f35 and that there are only 12 entries in the interrupts list - channels 0-11).
Looking into the videocore header files shared by broadcom (http://www.broadcom.com/docs/support/videocore/Brcm_Android_ICS_Graphics_Stack.tar.gz) what I can find in ./brcm_usrlib/dag/vmcsx/vcinclude/hardware_vc4.h is the following:
#define INTERRUPT_DMA9 (INTERRUPT_HW_OFFSET + 25 )
#define INTERRUPT_DMA10 (INTERRUPT_HW_OFFSET + 26 )
#define INTERRUPT_DMA11_12_13_14 (INTERRUPT_HW_OFFSET + 27 )
#define INTERRUPT_DMA_ALL (INTERRUPT_HW_OFFSET + 28 )
So it seems as if even DMA channel 11 is somewhat suspect with regard to interrupts. Imagine if one of those channels was owned not by the ARM but by the firmware: we would get interrupts for those firmware-owned DMA channels on the ARM without any means to do anything about it.
Fortunately this is not the case, so as of now it is safe to assume that we can enable channel 11 as well and assume it is "dedicated"...
On a side note, the above source also indicates a few other things. DMA channel 7 has 2 variants:
./brcm_usrlib/dag/vmcsx/vcinclude/bcm2708_chip/axi_dma7.h
./brcm_usrlib/dag/vmcsx/vcinclude/bcm2708_chip/axi_dma_lite7.h
The same applies to channel 8.
It seems as if ./brcm_usrlib/dag/vmcsx/vcinclude/bcm2708_chip/register_map.h includes "axi_dma_lite7.h", which would indicate it is a LITE channel.
So I wonder what is right? Lite channels start at 7 or at 8?
Channel 7 is assigned to the firmware, but we should still try to get things "right" (in case something ever changes)...
Is there any way to get confirmation?
You are correct - my comment should have said that channels 12, 13 & 14 share an interrupt with channel 11.
I believe that channel 7 is a Lite channel, something I've just confirmed by observing that what should have been the STRIDE register for that controller returns 'DMA7' when read.
@pelwell: thanks for the observations!
Here is an initial patch that may allow the use of channels 11 to 14 by using shared IRQs...
I am not sure how I can really test it - I do not have that many dma channels in use...
diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index e4ca980..8d7096f 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -127,6 +127,10 @@ struct bcm2835_desc {
#define BCM2835_DMA_CHAN(n) ((n) << 8) /* Base address */
#define BCM2835_DMA_CHANIO(base, n) ((base) + BCM2835_DMA_CHAN(n))
+/* these DMA channels 11 to 14 share a common interrupt */
+#define BCM2835_DMA_SHARED_IRQ_MASK (BIT(11) | BIT(12) | BIT(13) | BIT(14))
+#define BCM2835_DMA_SHARED_IRQ_USE 11
+
static inline struct bcm2835_dmadev *to_bcm2835_dma_dev(struct dma_device *d)
{
return container_of(d, struct bcm2835_dmadev, ddev);
@@ -215,6 +219,14 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
struct bcm2835_desc *d;
unsigned long flags;
+ /* check the shared interrupt */
+ if (BIT(c->ch) & BCM2835_DMA_SHARED_IRQ_MASK) {
+ /* check if the iterrupt is enabled */
+ flags = readl(c->chan_base + BCM2835_DMA_CS);
+ if (!(flags & BCM2835_DMA_INT))
+ return IRQ_NONE;
+ }
+
spin_lock_irqsave(&c->vc.lock, flags);
/* Acknowledge interrupt */
@@ -239,6 +251,7 @@ static int bcm2835_dma_alloc_chan_resources(struct dma_chan *chan)
{
struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
struct device *dev = c->vc.chan.device->dev;
+ unsigned long flags = 0;
dev_dbg(dev, "Allocating DMA channel %d\n", c->ch);
@@ -249,8 +262,11 @@ static int bcm2835_dma_alloc_chan_resources(struct dma_chan *chan)
return -ENOMEM;
}
+ if (BIT(c->ch) & BCM2835_DMA_SHARED_IRQ_MASK)
+ flags = IRQF_SHARED;
+
return request_irq(c->irq_number,
- bcm2835_dma_callback, 0, "DMA IRQ", c);
+ bcm2835_dma_callback, flags, "DMA IRQ", c);
}
static void bcm2835_dma_free_chan_resources(struct dma_chan *chan)
diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
index aef64de..7ce12ff 100644
--- a/arch/arm/boot/dts/bcm2835.dtsi
+++ b/arch/arm/boot/dts/bcm2835.dtsi
@@ -43,8 +44,18 @@
<1 24>,
<1 25>,
<1 26>,
+ /* dma channels 11 to 14 share irq 27 */
<1 27>,
- <1 28>;
+ <1 27>,
+ <1 27>,
+ <1 27>;
+ /* no support for DMA 15 */
+ /*
+ * interrupt 28 is the irq that
+ * triggers on any dma channel sending
+ * an interrupt - even if owned by
+ * the firmware, so it is not used
+ */
#dma-cells = <1>;
brcm,dma-channel-mask = <0x7f35>;
That's a neat patch - just let the interrupt framework do the work. You could test it by artificially clearing some of the lower bits in the channel mask to force the use of the higher channels.
BTW, there's a typo in this comment:
/* check if the iterrupt is enabled */
and it looks like you didn't end up needing BCM2835_DMA_SHARED_IRQ_USE after all.
@pelwell: thanks for the feedback - I will incorporate those...
As for testing - I already started with the "trick" with channel-mask, but there is one drawback (for upstream) - there is no slave_sg support yet, so I can not test I2S and SPI all on channel 11 to 14.
On the positive side: this gives 11 working DMA channels!
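The channel count can be double-checked from the two brcm,dma-channel-mask values quoted in this thread (0x0f35 before, 0x7f35 with the shared-IRQ patch). A minimal sketch with made-up helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Count the DMA channels enabled in a brcm,dma-channel-mask value. */
static inline unsigned int dma_channel_count(uint32_t mask)
{
	unsigned int n = 0;

	/* clear the lowest set bit on every iteration */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Nonzero if channel `ch` is enabled in the mask. */
static inline int dma_channel_usable(uint32_t mask, unsigned int ch)
{
	assert(ch < 32);
	return (mask >> ch) & 1u;
}
```

0x0f35 enables 8 channels (0, 2, 4, 5, 8, 9, 10, 11); 0x7f35 adds 12, 13 and 14 for a total of 11, with channels 7 and 15 still excluded.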
To avoid changes to the device-tree (about which upstream typically is very picky) a slightly different approach had to be taken.
I was now running mplayer with I2S audio and SPI FB video and things were working fine. Interrupts look like this:
           CPU0
 27:      32711  ARMCTRL-level  35 Edge      timer
 33:    5282207  ARMCTRL-level  41 Edge      20980000.usb, dwc2_hsotg:usb1
 51:      16804  ARMCTRL-level  59 Edge      DMA IRQ, DMA IRQ, DMA IRQ, DMA IRQ
 73:          0  ARMCTRL-level  81 Edge      20200000.gpio:bank0
 74:          0  ARMCTRL-level  82 Edge      20200000.gpio:bank1
 78:          0  ARMCTRL-level  86 Edge      20204000.spi
 81:        294  ARMCTRL-level  89 Edge      uart-pl011
 86:     259525  ARMCTRL-level  94 Edge      mmc0
Err:          0
The new patchset is available for upstream at: https://github.com/msperl/linux-upstream/commits/4.4.0-rc8-dmaengine-bcm2835
The patch for shared interrupts is: https://github.com/msperl/linux-upstream/commit/3f2645386906245e2d6aa70c09155e9d9d50a297
The patchset has been sent to the corresponding lists for review - let us see what the feedback is. @HiassofT you have been explicitly included as a recipient in the emails (I do not know if you are subscribed to the rpi list or not)
@msperl I noticed you had added a comment to the code submitted to LKML in patch 6/8, "note that we need to use GFP_ATOMIC, as the ALSA i2s dmaengine implementation calls prep_dma_cyclic with interrupts disabled".
I've continued testing with a particular focus on I2S audio and haven't come across any issues. I've had music playing back on 12 devices using I2S output since last night. (Upsampling to 192k when possible to put the most number of bits through the dma impl.) I have a few more "players" to migrate to 4.4 kernel build with your DMA code, but all looks good to me from an I2S audio user perspective.
Thanks for testing - what would help is if you can send some feedback to those patches in the form of:
Tested-By: Clive M... <your@email>
maybe with a comment on the setups you have tested - I guess a comment on the cover letter is sufficient.
I will try to create a new patchset for 4.4 downstream in the near future.
I will also start looking into using the clk framework for the bcm2835-i2s driver itself now...
Note that I got a version of bcm2835-i2s that is using only the new clock framework (with the clock-PWM patches by @repk, which are supposed to be getting into 4.5).
There is one regression: the clock framework does not (yet) support the MASH functionality, so there may be slightly more jitter than necessary...
See the last 3 patches in the above mentioned branch: https://github.com/msperl/linux-upstream/commits/4.4.0-rc8-dmaengine-bcm2835 (when the push is finished)
@msperl sorry for the delay, I don't have too much spare time ATM
I just did some more tests with the 4.4.0-rc8-dmaengine-bcm2835 branch plus the Cirrus driver and everything looks fine!
One of these tests was playing a 192kHz 32bit WAV using 256k period / 512k buffer, 15360 bytes period / 30720 buffer (10ms period time) and 153600 period / 307200 buffer (100ms period time) and monitoring /proc/interrupts. The interrupt count was as expected.
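For anyone repeating this, the expected numbers can be worked out as below. This sketch assumes one DMA interrupt per completed period, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes consumed per second by a PCM stream. */
static inline uint32_t bytes_per_second(uint32_t rate_hz,
					uint32_t bytes_per_sample,
					uint32_t channels)
{
	return rate_hz * bytes_per_sample * channels;
}

/* Period size in bytes for a period time given in milliseconds. */
static inline uint32_t period_bytes_for_ms(uint32_t bps, uint32_t ms)
{
	return (uint32_t)((uint64_t)bps * ms / 1000);
}

/* Completed periods (assumed: one interrupt each) in `seconds` of playback. */
static inline uint64_t expected_period_irqs(uint32_t bps,
					    uint32_t period_bytes,
					    uint32_t seconds)
{
	assert(period_bytes > 0);
	return (uint64_t)bps * seconds / period_bytes;
}
```

192kHz 32-bit stereo is 1536000 bytes/s, so a 10ms period is 15360 bytes and a 100ms period is 153600 bytes. Over 60 seconds that would be 6000 and 600 interrupts respectively, and 351 with a 262144-byte period.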
I also traced the wakeup position (aplay --test-position) and it was fine and stable as well.
For these tests I had the clk framework disabled in DT so I could use bcm2835-i2s without clock hacks.
Next thing I'd like to test is with the clk framework enabled. With the Cirrus card the clk code will be almost a no-op though because it uses external clocking. I'll see if I can get my iqaudio card working, this one uses clocking from bcm2835-i2s.
"Note that I got a version of bcm2835-i2s that is using only the new clock framework (with the clock-PWM patches by @repk, which are supposed to be getting into 4.5)."
@msperl did you push these changes yet? Just did a git fetch and saw no new commits or branches.
Good to hear that all tests worked...
For some reason git complained about updates on github and asked for merges...
So here a new branch: https://github.com/msperl/linux-upstream/commits/4.4.0-rc8-dmaengine-bcm2835+i2s_clk
@msperl Martin, thanks for your work. I'll re-spin my test downstream kernels based on your latest 4.4.0-rc8-dmaengine-bcm2835+i2s_clk. I'll also fire a message at LKML later this evening with a Tested-By, and document what I have actually tested.
note that the i2s_clk will not apply downstream, as downstream does not use the new clock framework, so all those patches do not apply downstream until we use the new clock framework downstream...
Clive, can you PM me please :-)
G
(In reply by email to clivem's comment of 6 Jan 2016, 19:11: https://github.com/raspberrypi/linux/issues/1231#issuecomment-169423821)
@msperl got my iqaudio dacplus card working with upstream+dma+clk
The 44.1kHz rate family works fine, but when using rates from the 48kHz family the pitch is too high. eg test with:
speaker-test -c 2 -r 48000 -F S16_LE -f 440 -t sine
(that should output a 440Hz sine wave).
At 192kHz S32_LE I get no audio output at all, and when stopping speaker-test or aplay with ctrl-c I get a
bcm2835-dma 20007000.dma: DMA transfer could not be terminated
176.4kHz S32_LE works fine, as does 192kHz S16_LE. Maybe an overflow somewhere?
BTW: here's my test branch with defconfig and DT files https://github.com/HiassofT/rpi-linux/tree/upstream-dma-clk-test
@msperl Yes, understood! Would probably be better if you have the time to "consolidate" any changes back into that rpi-4.4.y-new-dmaengine branch....... I'd like to see this merged into downstream rpi-4.4.y....... I suspect a direct pull request from your branch might expedite that. ;)
Just to be sure another test: 192kHz/32bit works fine with the iqaudio card on the downstream 4.4 rc7 kernel
Second that..... I already fairly comprehensively tested iqaudio DAC+ with rpi-4.4.y-new-dmaengine kernel at all (>=CD quality) sample rates and bit depths combinations.... 44k1,48k,88k2,96k,176k4,192k / S16_LE, S24_LE, S32_LE.
I would be interested in someone doing some independent tests on the upstream patch: https://github.com/msperl/linux-upstream/commits/4.4.0-rc8-dmaengine-bcm2835+i2s_clk
If there is no negative feedback, then I will send out the patchset for bcm2835-i2s driver as well.
These seem (to me) candidates for 4.5, so @popcornmix + @pelwell : this may be the chance to move to the new clock framework for downstream as well with 4.5 - this i2s issue (as far as I understand) is the biggest hurdle in taking those steps... (but note: I did not check for any other changes to this specific driver that have been applied downstream - these I would also recommend to upstream).
As for the "mash" clock solution - I guess this will have to wait for a while - I will not be able to get something ready before I go on vacation (but I had a look already and it did not look that complicated). Also I want to see some progress on the existing patch-sets first before continuing here...
Downstream is always keen to switch to using upstream components if we can do that without breaking things. Anything you do that makes that possible is welcome.
OK - I have sent off the patchset to upstream the clock patches for I2S as well. Let us now wait and see what happens and when it goes in.
The 48kHz frequency mismatch when using the clock framework is still puzzling me. It should be spot on, as it can be cleanly derived from the 19.2MHz base clock - but actually it's far off.
I traced the clock setting in the I2S hwparam code, bclk_ratio with the iqaudio card is 32 and the clock frequency should be 1.536MHz (19.2MHz / 12.5). But with my scope I measured 1.6MHz (19.2MHz / 12).
Is the fractional part of the divisor maybe lost somewhere?
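To illustrate the suspicion, here is a small model of the divider arithmetic. It assumes the divider is a fixed-point value with 12 fractional bits; the function names are invented for this sketch and are not taken from clk-bcm2835.c.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_FRAC_BITS 12 /* assumption: 12 fractional bits */

/* Fixed-point divider needed to derive rate_hz from parent_hz (rounded down). */
static inline uint32_t div_for_rate(uint64_t parent_hz, uint64_t rate_hz)
{
	assert(rate_hz > 0);
	return (uint32_t)((parent_hz << DIV_FRAC_BITS) / rate_hz);
}

/* Output rate produced by a given fixed-point divider. */
static inline uint64_t rate_for_div(uint64_t parent_hz, uint32_t div)
{
	assert(div > 0);
	return (parent_hz << DIV_FRAC_BITS) / div;
}

/* The same divider with its fractional bits dropped. */
static inline uint32_t div_integer_only(uint32_t div)
{
	return div & ~((1u << DIV_FRAC_BITS) - 1);
}
```

For 19.2MHz and a 1.536MHz target the divider is 0xC800 (12.5). Dropping the fraction leaves 0xC000 (12), which yields exactly the 1.6MHz seen on the scope. That ratio (1.6 / 1.536, about 1.042) would also turn a 440Hz test tone into roughly 458Hz, which fits the "pitch too high" observation.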
@msperl could you check this with your setup?
That is possibly a bug in the clock driver, but it may also be the missing "mash" support. If you can tell me how I can reproduce it with a hifiberry (from the command line), then I will try to investigate it.
I guess we are the first consumer of this new API, so some bugs are to be expected (unfortunately)
Use the speaker-test command I posted a few comments back. With -r 44100 output is fine, with -r 48000 the pitch is too high. Or play a 48kHz audio file.
I'm currently suspecting the clock divisor roundup commit in your tree might be the culprit, the unused_frac_mask >>1 line looks fishy. Can't test right now, I'm on the road.
I have to admit I have not seen your comments about the clock issue earlier...
So I added a bit of debugging for the clocks:
diff --git a/sound/soc/bcm/bcm2835-i2s.c b/sound/soc/bcm/bcm2835-i2s.c
index 794ada5..bea3ba0 100644
--- a/sound/soc/bcm/bcm2835-i2s.c
+++ b/sound/soc/bcm/bcm2835-i2s.c
@@ -277,6 +277,14 @@ static int bcm2835_i2s_hw_params(struct snd_pcm_substream *
/* set target clock rate*/
clk_set_rate(dev->clk, sampling_rate * bclk_ratio);
+ /* dump the rate and parent */
+ pr_info("Clock rate requested: %d (= %d Hz * %d)\n",
+ sampling_rate * bclk_ratio,
+ sampling_rate, bclk_ratio);
+ pr_info(" set: %pCr Hz\n", dev->clk);
+ pr_info(" parent: %pC (%pCr Hz)\n",
+ clk_get_parent(dev->clk), clk_get_parent(dev->clk));
+
/* Setup the frame format */
format = BCM2835_I2S_CHEN;
And this is what I get as output when running first 44.1k and then 48k:
[18002.333636] Clock rate requested: 2822400 (= 44100 Hz * 64)
[18002.343906] set: 2822398 Hz
[18002.352618] parent: pllc_per (999199987 Hz)
[18010.279344] Clock rate requested: 3072000 (= 48000 Hz * 64)
[18010.289332] set: 3072000 Hz
[18010.297745] parent: osc (19200000 Hz)
So those "calculated" frequencies are (close to) what we requested. Maybe the issue lies somewhere in the calculation and recalculation of the values. I will instrument the clock driver now...
After instrumenting clk-bcm2835:
diff --git a/drivers/clk/bcm/clk-bcm2835.c b/drivers/clk/bcm/clk-bcm2835.c
index 5d45457..7743326 100644
--- a/drivers/clk/bcm/clk-bcm2835.c
+++ b/drivers/clk/bcm/clk-bcm2835.c
@@ -1287,6 +1287,9 @@ static int bcm2835_clock_set_rate(struct clk_hw *hw,
const struct bcm2835_clock_data *data = clock->data;
u32 div = bcm2835_clock_choose_div(hw, rate, parent_rate, false);
+ pr_info("set_rate rate = %lu, parent_rate = %lu - div = %8x\n",
+ rate, parent_rate,div);
+
cprman_write(cprman, data->div_reg, div);
return 0;
I get this:
[ 62.858205] set_rate rate = 2822398, parent_rate = 999199987 - div = 162067
[ 62.873508] Clock rate requested: 2822400 (= 44100 Hz * 64)
[ 62.883257] set: 2822398 Hz
[ 62.891619] parent: pllc_per (999199987 Hz)
[ 100.695595] set_rate rate = 3072000, parent_rate = 19200000 - div = 6400
[ 100.706912] Clock rate requested: 3072000 (= 48000 Hz * 64)
[ 100.716660] set: 3072000 Hz
[ 100.724995] parent: osc (19200000 Hz)
So for:
So I wonder where the issue comes from, because clock dividers are working as expected!
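As a cross-check, the div values from the log (printed in hex) can be decoded by hand. This sketch assumes 12 fractional bits in the divider, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPRMAN_FRAC_BITS 12 /* assumption: 12 fractional bits */

/* Integer part of a logged divider value. */
static inline uint32_t div_int_part(uint32_t div)
{
	return div >> CPRMAN_FRAC_BITS;
}

/* Fractional part of a logged divider value, in 1/4096 steps. */
static inline uint32_t div_frac_part(uint32_t div)
{
	return div & ((1u << CPRMAN_FRAC_BITS) - 1);
}

/* Rate that results from dividing parent_hz by the logged divider. */
static inline uint64_t div_to_rate(uint64_t parent_hz, uint32_t div)
{
	assert(div > 0);
	return (parent_hz << CPRMAN_FRAC_BITS) / div;
}
```

0x6400 decodes to 6 + 1024/4096 = 6.25, and 19200000 / 6.25 is exactly 3072000. 0x162067 decodes to 354 + 103/4096, which gives 2822398 from the 999199987 Hz parent. Both match the "set:" lines, so on paper the dividers are consistent with the requested rates.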
I fear I may have forgotten to translate some special cases code in the original code, which results in the wrong frequency getting requested in the first place...
Note also that the original driver was using PLLD running at 500MHz not PLLC - so there may be a stability issue with PLLC running at 1GHz. This could mean that actually the 44.1k sample is off key and not the 48k sample.
One other thing: the Peripheral Documentation on page 105 talks about a divider of 1024 for the average output frequency, not 4096 as we would expect for 12 bits. This may be an erratum, or it may point to something else we need to understand.
I will check tomorrow with an oscilloscope what the real frequencies are.
OK, so I have measured 10 440Hz sine waves
I also measured the length of 10 digital clock cycles:
So this means that PLLC is not that stable - need to ask Eric about this...
I am trying to upstream drivers/dma/bcm2835-dma.c - especially the slave-portion.
One of the things that turned up is that upstream wants changes to the code, which I can make.
The question is: how would those changes filter back to this branched code, and how can I help make that work smoothly?