Closed stephenstarkie closed 6 years ago
It looks like you are using SPI2 (it would have been nice of you to mention that). How have you enabled that interface?
The AUX SPI interfaces are inferior to SPI0, with smaller FIFOs and no DMA - is there a reason why you aren't using it?
One of the changes in 4.9 was the addition of an AUX IRQ driver, but it seems unlikely that would cause a catastrophic failure like this. However, that change could be reveted with a DT overlay if necessary. Do you have a logic analyser to see what is happening on the bus?
Apologies, I didn't realise it would make a difference (still relatively new to this); I have added something to the title to make it clearer. I have enabled spi2 using;
dtoverlay=spi2-1cs
in \boot\config.txt
The chip select line is on GPIO pin 43 so no parameters on the overlay.
The AUX SPI interfaces are inferior to SPI0, with smaller FIFOs and no DMA - is there a reason why you aren't using it?
I must say I don't know why we aren't using SPI0 (a choice made by an electronics engineer; who is on holiday at the moment, unfortunately!); though we do have a touch screen on it.
However, that change could be reveted with a DT overlay if necessary.
OK, that might be worth trying, how would I go about doing that? I have read up on overlays and how to compile them but am not very familiar with writing them. Or is there a parameter that I can use for this?
Do you have a logic analyser to see what is happening on the bus?
I will see what we can do.
An overlay like this might be enough:
/dts-v1/;
/plugin/;
/ {
compatible = "brcm,bcm2835", "brcm,bcm2708", "brcm,bcm2709";
fragment@0 {
target = <&aux>;
__overlay__ {
interrupts;
};
};
fragment@1 {
target = <&spi1>;
__overlay__ {
interrupt-parent = <&intc>;
interrupts = <1 29>;
};
};
fragment@2 {
target = <&spi2>;
__overlay__ {
interrupt-parent = <&intc>;
interrupts = <1 29>;
};
};
fragment@3 {
target = <&uart1>;
__overlay__ {
interrupt-parent = <&intc>;
interrupts = <1 29>;
};
};
};
Save it as aux-spi-revert-overlay.dts
, compile and install with:
dtc -@ -I dts -O dtb -o aux-spi-revert.dtbo arch/arm/boot/dts/overlays/aux-spi-revert-overlay.dts
sudo cp aux-spi-revert.dtbo /boot/overlays
Don't worry about the warnings from dtc.
That ought to disable the new AUX interrupt controller driver and make the individual AUX block drivers share the AUX interrupt directly.
You can check if the overlay applies correctly using sudo vcdbg log msg
. Running:
pi@raspberrypi:~$ grep 59: /proc/interrupts
should return:
59: 583 0 0 0 ARMCTRL-level 61 Edge serial
if it works (ignore the second number which varies), and:
59: 3569 0 0 0 ARMCTRL-level 61 Edge bcm2835-auxirq
if it doesn't.
I tried what you suggested about interrupts; sudo vcdbg log msg
shows that the new overlay is loaded, but when I run grep 59: /proc/interrupts
I get nothing.
cat /proc/interrupts
produces;
CPU0
17: 20 ARMCTRL-level 1 Edge 2000b880.mailbox
18: 2 ARMCTRL-level 2 Edge VCHIQ doorbell
27: 3892622 ARMCTRL-level 35 Edge timer
40: 0 ARMCTRL-level 48 Edge bcm2708_fb dma
42: 0 ARMCTRL-level 50 Edge DMA IRQ
44: 0 ARMCTRL-level 52 Edge DMA IRQ
45: 0 ARMCTRL-level 53 Edge DMA IRQ
48: 0 ARMCTRL-level 56 Edge DMA IRQ
53: 120 ARMCTRL-level 61 Edge 202150c0.spi
56: 15577561 ARMCTRL-level 64 Edge dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1
73: 0 ARMCTRL-level 81 Edge 20200000.gpio:bank0
74: 0 ARMCTRL-level 82 Edge 20200000.gpio:bank1
77: 1327 ARMCTRL-level 85 Edge 20205000.i2c, 20804000.i2c
78: 6 ARMCTRL-level 86 Edge 20204000.spi
81: 713 ARMCTRL-level 89 Edge uart-pl011
86: 269926 ARMCTRL-level 94 Edge mmc0
195: 0 pinctrl-bcm2835 35 Edge ads7846
FIQ: usb_fiq
Err: 0
I have also looked at the signals using a logic analyser. There seems to be a problem with the camera FPGA sending back what should be a 1 but is being interpreted as a 128 or 0. However; I am seeing odd things coming from the FPGA, so before I add plots to this thread I would like to understand that a bit better as well. I will add plots once I understand them.
OK - the assigned interrupt number (the first column) varies - it's the 4th column that matters. Try grep " 61 Edge" /proc/interrupts
to see the difference.
So; yes it appears the overlay is being applied and stopping the interrupts. Without the overlay I get;
53: 0 ARMCTRL-level 61 Edge bcm2835-auxirq
With the overlay, I get;
53: 0 ARMCTRL-level 61 Edge 202150c0.spi
but both seem to give the same in terms of image quality coming back from the camera over SPI 2.
We tried a sanity check and moved everything to SPI0 on a development board with the same camera PCB. Initially this showed the problem as well. The logic analyser showed that we should be using rising edges on MOSI and falling edges on MISO, so I tried mode 1 on SPI0 and that fixed it. So, it appears there was a problem with our code (sorry about that!) However, we gleefully moved back to SPI2 with mode 1 and found the same problems (which stopped short our celebration!).
The following is a plot from trying to send a b0 to the FPGA (using python with mode 0) on 4.4, SPI 2.0 which should return the version number; 1, and does consistently (with any other mode the plots look very wrong, as expected);
However, this is the plot doing the same on 4.9 (mode 0 or 1), which returns 0 or 128 (generally 128 when the FPGA leaves the MISO line high after transmission and 0 when it leaves it low (I'll talk to the FPGA code author about that, or learn some verilog and fix it!);
So, it looks like there is something up in kernel 4.9, but slightly different from first thought. Of course, my assumption is that its my code
Interesting. I'm not an SPI expert, but my interpretation of those traces is as follows:
polarity/CPOL=0
.phase/CPHA=0
.phase/CPHA=1
.I don't know about other SPI controllers (and the SPI standard), but the controller in the Pi SoCs has a common CPHA for MOSI and MISO, so your trace is going to be problematic. Sampling (or changing) on the wrong edge would account for the 0x00/0x80 results. Are you using the same CPOL/CPHA for host and device?
And an observation - the CE/CS signal is being deasserted between the address write and the data read. Have you have set the cs_change flag to 1? This is rarely what you want - an older version of SPI-Py did this and caused problems with an RFID reader. See https://github.com/raspberrypi/linux/issues/1547#issuecomment-230170202 for an explanation.
None of this really accounts for the kernel-related differences, but it is questionable.
Thanks for all the help so far. Andy (electronics) is back (I think you might know him!) and is going to pick up the FPGA code; so we could probably change it to work with the same CPOL/CPHA on MOSI and MISO. I also changed the code to clock out and back chunks of data which means we don't de-assert CE/CS between write and read. I'll report the findings here.
@stephenstarkie Do you have any results? Is this issue now able to be closed?
We are still trying to sort this one out. We think changing the FPGA code will sort it and we think that probably the raspberry pi SPI behaviour is correct. However there are a couple of things we are still checking. I hope we can bottom out and close this in the next day or so
Thanks for the update, please close or update once you are ready.
@stephenstarkie Any progress on this?
In the end we moved to SPI0, because we also moved the screen controller (new screen) to i2c, so our problem is resolved
We have a camera module on our own custom PCB connected via SPI (and i2c) to Pi CM 1's. Previously, we were getting good images from the camera; . Since updating the kernel ( 4.4.13+ to 4.9.35+ ) we now seem to get noisy images; I am running the same C++ code side by side on two identical systems (apart from kernel update which worked before the update). I have seen mention of other issues with SPI and have tried the suggestions for zeroing the spi struct but it doesn't seem to help here. I have also tried python code to get the images and have the same problem. Also; we have an FPGA on the PCB from which I can get the version number using;
Running this on the 4.4 kernel always produces 1 (the correct version), while on the updated kernel it sometimes returns 0 and sometimes 128.