raspberrypi / pico-sdk

SPI Master and Slave mode communication issues between 2 Pico's #1116

Open chrisckc opened 1 year ago

chrisckc commented 1 year ago

During the transfer of data between two Picos wired together as SPI master and slave devices, I start to see data corruption after a few hundred or so data transfer operations. Once the data corruption starts, it continues until the Picos are reset or power cycled.

I was originally using Arduino-Pico so I switched to pure SDK to confirm the issue still existed. I created a test harness based on the master-slave example provided here in the Pico examples Repo: https://github.com/raspberrypi/pico-examples/tree/master/spi/spi_master_slave

The test harness is here: https://github.com/chrisckc/TestHarness-SPI-Pico-SDK.git

I modified the master-slave example to improve the serial output and added a separate single-byte SPI data transfer before the 256-byte buffer is transferred, to match the scenario I was originally working on.

Further, to match my scenario, I also changed the SPI configuration to use Mode 1, with the spi1 instance on the master and the spi0 instance on the slave, specifying different pins. The circuit is wired as per the master-slave example readme, but using the alternative SPI pins. One of the reasons for using Mode 1 is that the CS line is held low for the duration of multi-byte transfers rather than being toggled after each byte sent, as happens in Mode 0. This appears to be a quirk of the PL022 used by the Pico rather than a difference between Mode 0 and Mode 1 specified in the SPI standard. Holding CS low is more efficient as there are no gaps in the clock signal.
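For readers less familiar with the mode numbers, here is a minimal sketch of the conventional mapping from an SPI mode number to clock polarity (CPOL) and clock phase (CPHA). The type and function names are hypothetical and only for illustration; they are not part of the Pico SDK or of the test harness.

```c
#include <stdbool.h>

/* Clock polarity and phase for one of the four conventional SPI modes.
   Hypothetical helper type, not an SDK type. */
typedef struct {
    bool cpol; /* idle level of the clock: false = low, true = high */
    bool cpha; /* false = sample on the first clock edge, true = on the second */
} spi_mode_bits_t;

/* Conventional numbering: mode = (CPOL << 1) | CPHA.
   Only the two low bits of `mode` are used. */
spi_mode_bits_t spi_mode_to_bits(unsigned mode) {
    spi_mode_bits_t bits;
    bits.cpol = ((mode >> 1) & 1u) != 0;
    bits.cpha = (mode & 1u) != 0;
    return bits;
}
```

With the Pico SDK, these two flags would correspond to the cpol and cpha arguments of spi_set_format, so Mode 1 is cpol = 0, cpha = 1.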

I also increased the send rate from 1 per second to 10 per second.

To view the serial output from my test harness code properly, you will need a proper terminal emulator, i.e. one that supports ANSI control characters. I use iTerm2 on macOS with a command to launch screen against the USB tty.

What the test harness code does:

The Master Pico sends a single value to the Slave Pico; in this case I have chosen 0xAA (170 decimal) because it is easy to read on an oscilloscope. Immediately after sending the single-byte value, it sends the same 256-byte buffer as used in the original master-slave example.

In response to this, the Slave Pico, as it is receiving the single byte value of 0xAA, sends back the same value 0xAA to the Master Pico as a simultaneous transfer as per the way SPI works. Immediately after this transfer, as in the original example, the Master Pico sends the 256 byte buffer while the Slave Pico simultaneously sends back its own 256 byte buffer in return (a reversed copy of the buffer used by the master).
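The two buffers and the check on the returned data can be sketched in plain C as below. This is only an illustration of the data pattern described above, with hypothetical function names; it is not the code from the test harness and it does not touch the SPI hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 256

/* Master pattern: 0x00, 0x01, ..., 0xFF. */
void fill_master_buffer(uint8_t *buf) {
    for (size_t i = 0; i < BUF_LEN; ++i) {
        buf[i] = (uint8_t)i;
    }
}

/* Slave pattern: the master pattern reversed, 0xFF, 0xFE, ..., 0x00. */
void fill_slave_buffer(uint8_t *buf) {
    for (size_t i = 0; i < BUF_LEN; ++i) {
        buf[i] = (uint8_t)(BUF_LEN - 1 - i);
    }
}

/* Number of positions where the received data differs from what was expected. */
size_t count_mismatches(const uint8_t *rx, const uint8_t *expected, size_t len) {
    size_t mismatches = 0;
    for (size_t i = 0; i < len; ++i) {
        if (rx[i] != expected[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

On the master side, the received buffer would be compared against the slave pattern; a shift of even one byte makes every position mismatch, which is why the fault is so visible in the serial output.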

The issue:

Adding this separate single byte transfer before the 256 byte buffer is sent has caused it to break after it has been running for a short time.

It works fine for the first 100 to 200 transfers and the expected response data is seen by the Master Pico, but then at some random number of transfers later, usually after around one hundred or so, the Master Pico starts to report incorrect data being sent back from the Slave Pico. Instead of reporting 0xAA being sent back, it starts to report 0x00 or 0x01. The start of the 256-byte buffer response is also corrupted and out of sync: the first byte shows as 0x00, followed by 0xAA and then the expected buffer starting from 0xFF and ending at 0x02 instead of 0x00, due to the apparent shifting of the response.

This error condition continues until the Pico's are reset.

Looking at the oscilloscope, I can confirm that the Slave Pico stops sending back the correct data when the fault condition appears after the first 100 to 200 transfers.

When the fault appears, it can be seen on the scope that the CS line has stopped going high in between the first single-byte transfer and the second 256-byte buffer transfer. This could be the reason why the Slave Pico gets confused about when the 256-byte transfer starts, resulting in the data sent back to the Master Pico being out of sync. However, this does not explain why the first single-byte transfer is also not being handled correctly by the Slave Pico, as by this time the CS line has gone high after the end of the previous 256-byte transfer and the 100 ms delay. So despite still starting with the CS line high, the first single-byte transfer now starts to fail and continues to fail until the Pico is restarted.

This is what a correct transfer looks like. The blue trace is the SPI clock, the green trace is MOSI (master TX pin) and the purple trace is the response from the slave on MISO (master RX pin). The master sends 0xAA followed by the 256-byte buffer starting at 0x00, 0x01, 0x02, etc. The Slave Pico sends 0xAA in response followed by 0xFF, 0xFE, 0xFD, etc.

[Image: SPI-transfer-correct1]

Zoomed in: [Image: SPI-transfer-correct2]

This is what a failed transfer looks like. The Master Pico is still sending the correct bytes, but the 0xAA value is no longer being sent back by the Slave Pico and the next bytes from the Slave Pico are also incorrect (the 256-byte buffer starting at 0xFF, 0xFE, 0xFD, etc.).

[Image: SPI-transfer-error1]

Zoomed in: [Image: SPI-transfer-error2]

chrisckc commented 1 year ago

Note that the original master-slave example, modified to use the alternative SPI instances and pins, works fine; that example uses Mode 0.

Using my test harness based on the master-slave example, if I comment out the extra single-byte transfer in the master and slave source files, so it is doing the same as the master-slave example, it works fine using Mode 1. If I instead comment out the 256-byte buffer transfer, it also works fine just sending and responding with the 0xAA value.

The serial output for a successful transfer, same as the master-slave example but showing the additional single byte transfer response from the Slave Pico before the buffer response (returnValue value: 0xAA (170)):

SPI master example using SPI Mode: 1 SPI Clock: 1000000 Hz

SPI master says: The value 0xAA (170) followed immediately by the buffer printed below will be written to MOSI endlessly every 100 mS:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF

The value 0xAA (170) is expected to be returned followed by a reversed version of the above buffer

SPI master says: read page 62 from the MISO (RX Pin) line, returnValue value: 0xAA (170)
FF FE FD FC FB FA F9 F8 F7 F6 F5 F4 F3 F2 F1 F0
EF EE ED EC EB EA E9 E8 E7 E6 E5 E4 E3 E2 E1 E0
DF DE DD DC DB DA D9 D8 D7 D6 D5 D4 D3 D2 D1 D0
CF CE CD CC CB CA C9 C8 C7 C6 C5 C4 C3 C2 C1 C0
BF BE BD BC BB BA B9 B8 B7 B6 B5 B4 B3 B2 B1 B0
AF AE AD AC AB AA A9 A8 A7 A6 A5 A4 A3 A2 A1 A0
9F 9E 9D 9C 9B 9A 99 98 97 96 95 94 93 92 91 90
8F 8E 8D 8C 8B 8A 89 88 87 86 85 84 83 82 81 80
7F 7E 7D 7C 7B 7A 79 78 77 76 75 74 73 72 71 70
6F 6E 6D 6C 6B 6A 69 68 67 66 65 64 63 62 61 60
5F 5E 5D 5C 5B 5A 59 58 57 56 55 54 53 52 51 50
4F 4E 4D 4C 4B 4A 49 48 47 46 45 44 43 42 41 40
3F 3E 3D 3C 3B 3A 39 38 37 36 35 34 33 32 31 30
2F 2E 2D 2C 2B 2A 29 28 27 26 25 24 23 22 21 20
1F 1E 1D 1C 1B 1A 19 18 17 16 15 14 13 12 11 10
0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00

After a short time the fault appears and every transfer looks like this for some time, the 0xAA return value is now 0x00 and the buffer responses from the Slave Pico appear to be shifted:

SPI master example using SPI Mode: 1 SPI Clock: 1000000 Hz

SPI master says: The value 0xAA (170) followed immediately by the buffer printed below will be written to MOSI endlessly every 100 mS:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF

The value 0xAA (170) is expected to be returned followed by a reversed version of the above buffer

SPI master says: read page 141 from the MISO (RX Pin) line, returnValue value: 0x00 (000)
AA FF FE FD FC FB FA F9 F8 F7 F6 F5 F4 F3 F2 F1
F0 EF EE ED EC EB EA E9 E8 E7 E6 E5 E4 E3 E2 E1
E0 DF DE DD DC DB DA D9 D8 D7 D6 D5 D4 D3 D2 D1
D0 CF CE CD CC CB CA C9 C8 C7 C6 C5 C4 C3 C2 C1
C0 BF BE BD BC BB BA B9 B8 B7 B6 B5 B4 B3 B2 B1
B0 AF AE AD AC AB AA A9 A8 A7 A6 A5 A4 A3 A2 A1
A0 9F 9E 9D 9C 9B 9A 99 98 97 96 95 94 93 92 91
90 8F 8E 8D 8C 8B 8A 89 88 87 86 85 84 83 82 81
80 7F 7E 7D 7C 7B 7A 79 78 77 76 75 74 73 72 71
70 6F 6E 6D 6C 6B 6A 69 68 67 66 65 64 63 62 61
60 5F 5E 5D 5C 5B 5A 59 58 57 56 55 54 53 52 51
50 4F 4E 4D 4C 4B 4A 49 48 47 46 45 44 43 42 41
40 3F 3E 3D 3C 3B 3A 39 38 37 36 35 34 33 32 31
30 2F 2E 2D 2C 2B 2A 29 28 27 26 25 24 23 22 21
20 1F 1E 1D 1C 1B 1A 19 18 17 16 15 14 13 12 11
10 0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01

After some more time has passed the output changes to this; the return value is now 0x01 and the buffer response values have shifted further:

SPI master example using SPI Mode: 1 SPI Clock: 1000000 Hz

SPI master says: The value 0xAA (170) followed immediately by the buffer printed below will be written to MOSI endlessly every 100 mS:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF

The value 0xAA (170) is expected to be returned followed by a reversed version of the above buffer

SPI master says: read page 3232 from the MISO (RX Pin) line, returnValue value: 0x01 (001)
00 AA FF FE FD FC FB FA F9 F8 F7 F6 F5 F4 F3 F2
F1 F0 EF EE ED EC EB EA E9 E8 E7 E6 E5 E4 E3 E2
E1 E0 DF DE DD DC DB DA D9 D8 D7 D6 D5 D4 D3 D2
D1 D0 CF CE CD CC CB CA C9 C8 C7 C6 C5 C4 C3 C2
C1 C0 BF BE BD BC BB BA B9 B8 B7 B6 B5 B4 B3 B2
B1 B0 AF AE AD AC AB AA A9 A8 A7 A6 A5 A4 A3 A2
A1 A0 9F 9E 9D 9C 9B 9A 99 98 97 96 95 94 93 92
91 90 8F 8E 8D 8C 8B 8A 89 88 87 86 85 84 83 82
81 80 7F 7E 7D 7C 7B 7A 79 78 77 76 75 74 73 72
71 70 6F 6E 6D 6C 6B 6A 69 68 67 66 65 64 63 62
61 60 5F 5E 5D 5C 5B 5A 59 58 57 56 55 54 53 52
51 50 4F 4E 4D 4C 4B 4A 49 48 47 46 45 44 43 42
41 40 3F 3E 3D 3C 3B 3A 39 38 37 36 35 34 33 32
31 30 2F 2E 2D 2C 2B 2A 29 28 27 26 25 24 23 22
21 20 1F 1E 1D 1C 1B 1A 19 18 17 16 15 14 13 12
11 10 0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02

There appears to be a complete shifting of the responses from the Slave Pico which from observation only happens after a seemingly random number of transfers.
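One way to read the two faulty logs above, offered as an interpretation of the output rather than a confirmed diagnosis: both are consistent with the slave's 257-byte response stream (0xAA, then 0xFF down to 0x00) arriving one byte late and later two bytes late, with the bytes left over from one cycle appearing at the start of the next. The sketch below models only that steady-state lag in plain C; the names are hypothetical and nothing here explains why the lag appears.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 257 /* 1 single-byte transfer + the 256-byte buffer */

/* Byte the slave intends to send at position `pos` of a frame:
   0xAA first, then 0xFF counting down to 0x00. */
uint8_t slave_frame_byte(size_t pos) {
    return pos == 0 ? 0xAA : (uint8_t)(256 - pos);
}

/* Byte the master would observe at position `pos` of a steady-state frame
   if the slave's output stream lags by `lag` bytes (0 <= lag < FRAME_LEN).
   Position 0 is the single-byte return value, 1..256 are the buffer. */
uint8_t observed_byte(size_t pos, size_t lag) {
    return slave_frame_byte((pos + FRAME_LEN - lag) % FRAME_LEN);
}
```

With a lag of 1 the model gives a return value of 0x00 and a buffer running from 0xAA, 0xFF down to 0x01; with a lag of 2 it gives a return value of 0x01 and a buffer running from 0x00, 0xAA, 0xFF down to 0x02. Both match the logs above.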

daveythacher commented 1 year ago

Long shot: Try it with #1158.

cgm999 commented 1 year ago

I was able to reproduce the issue, and here are the workarounds I am using:

1: Disable/enable patch (based on @daveythacher and others on the RPi forums). Without it there is data corruption; I forget at which levels.

```c
#include "pico/stdlib.h"
#include "hardware/spi.h"

static inline void setup_slave_spi0(void) {
    spi_init(spi0, 1 * 1000000); // baud rate is not relevant for the slave
    hw_clear_bits(&spi_get_hw(spi0)->cr1, SPI_SSPCR1_SSE_BITS); // disable the SPI
    spi_set_format(spi0, 16, 0, 1, SPI_MSB_FIRST); // 16-bit frames, CPOL=0, CPHA=1
    spi_set_slave(spi0, true);
    hw_set_bits(&spi_get_hw(spi0)->cr1, SPI_SSPCR1_SSE_BITS); // re-enable the SPI
}

static inline void setup_master_spi0(int mhz) {
    spi_init(spi0, mhz * 1000 * 1000); // requested baud rate in Hz
    hw_clear_bits(&spi_get_hw(spi0)->cr1, SPI_SSPCR1_SSE_BITS); // disable the SPI
    spi_set_format(spi0, 16, 1, 1, SPI_MSB_FIRST); // 16-bit frames, CPOL=1, CPHA=1
    hw_set_bits(&spi_get_hw(spi0)->cr1, SPI_SSPCR1_SSE_BITS); // re-enable the SPI
}
```

2: I am manually switching CS/SS on and off, so in the master I am not using gpio_set_function(PICO_DEFAULT_SPI_CSN_PIN, GPIO_FUNC_SPI). Instead I set the CS pin as a GPIO output and toggle it around the transfer. This is related to the CPHA=1 format and the PL022.

3: Notice that in spi_set_format I have 0,1 on the slave and 1,1 on the master (vice versa works as well; what does not work is 1,1 and 1,1, which gives data corruption above, I think, 4 Mbit/s). I also use 16-bit transfers, however 8-bit works as well.

4: I am able to transfer data OK at up to 20 Mbit/s without overclocking, and at up to 31 Mbit/s with a 250 MHz overclock. Above these rates the slave is not able to send fast enough, although the slave still receives data OK from the master (at 30 Mbit/s without overclocking).

5: In the slave, I check spi_is_busy or spi_is_readable after spi_write16_read16_blocking (or spi_write_read_blocking in the 8-bit case) and, if either returns true, I call setup_slave_spi0 again. This covers the case where the write_read call starts in the middle of a master transfer (for example, when the slave gets reset).

6: I added a 2-byte transfer (or 1 byte for 8-bit) just before the bulk one (like the OP has). However, in the master I actually need to call write_read twice with len=1, the second time with CS=1 (off). There is a bogus 1 or 2 bytes of zeros sent by the slave every time. This part needs better logic.

I have not used PIO yet; I wanted to see what can be achieved with the hardware SPI (PL022) of the RP2040.

Hope this helps..

chrisckc commented 1 year ago

> Long shot: Try it with #1158.

Just tried those changes to the SDK, made no difference.

chrisckc commented 1 year ago

> 5: In slave, I check if spi_is_busy or spi_is_readable after spi_write16_read16_blocking (or spi_write_read_blocking in case of 8bit) and if either matches call again set_spi_slave_spi0. This is when the write_read call starts in the middle of master transfer.. (for example slave gets reset)

Thanks for the info, I tested spi_is_busy and spi_is_readable on the slave Pico, I found that it causes the received data to be corrupted starting from the very first block of data.

From my testing, spi_is_busy returns true until the master transmits data to the slave. I'm not sure what it is really meant for or how it determines that the SPI bus is busy, as the documentation doesn't say, nor what "busy" even means in the context of SPI in slave mode. Regardless, it behaves in the opposite manner to how it is named.

When spi_is_busy returns false, spi_is_readable always returns true.

I suspect that by checking those functions first, the delay introduced in reaching the spi_write_read_blocking function results in data being lost by the slave. Normally the slave would just sit waiting at spi_write_read_blocking until data arrived.

Consider a reset of the Slave Pico such that it reaches spi_write_read_blocking during a transfer from the master. It would never reach the expected number of bytes, the len parameter of int spi_write_read_blocking(spi_inst_t * spi, const uint8_t * src, uint8_t * dst, size_t len), so it would just block until the next transfer from the master and would then likely also return some incomplete data from the previous transfer, although I haven't actually tested this behaviour so can't really say. If spi_is_busy is meant to prevent this, from my testing it seems it is not a viable option.
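The reasoning above can be put as simple arithmetic. The sketch below is an untested model under stated assumptions: the slave simply misses the bytes clocked before it started listening, FIFO depth and the bogus bytes mentioned earlier are ignored, and the function name is hypothetical.

```c
#include <stddef.h>

/* Bytes that a blocking read of `len` bytes would have to take from the NEXT
   master frame when the slave starts listening after `missed` bytes of the
   current frame (of `frame_len` bytes) have already gone by.
   Assumes missed <= frame_len and len <= frame_len. */
size_t bytes_from_next_frame(size_t frame_len, size_t missed, size_t len) {
    size_t available = frame_len - missed; /* bytes still to come in this frame */
    return len > available ? len - available : 0;
}
```

For example, if the slave joins 100 bytes into a 257-byte frame and asks for 257 bytes, the call can only return after taking 100 bytes from the following frame, so the slave stays out of step with the master until something resynchronises it.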

With the original code, the slave Pico always receives the expected data from the master correctly, it is the response back to the master (which happens synchronously, full duplex style, during the data reception) that goes wrong after around one hundred or so transfer cycles.

chrisckc commented 1 year ago

Just updated the original repo with some data verification and the tests mentioned above. https://github.com/chrisckc/TestHarness-SPI-Pico-SDK

The status check needs to be enabled first in spi_slave.c:

#define DEBUG_SERIAL_OUTPUT_SCROLLING (false) // If not scrolling the terminal position is reset using escape sequences, proper terminal emulator required
// Setting this to true breaks it, received data is corrupted
#define CHECK_SPI_STATUS false // Defines if we want to check the status of the SPI bus using spi_is_busy and spi_is_readable

Change CHECK_SPI_STATUS to true and enable scrolling to see the error better.