raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

A pulse in the ACK signal stops I2C communication on Pi 5 with kernel 6.6.20 #6056

Open neuralassembly opened 6 months ago

neuralassembly commented 6 months ago

Describe the bug

When I use an AQM0802-based LCD on a Pi 5 with kernel 6.6.20, it often stops responding to commands and then shows nothing. To make the LCD work again, I have to turn the LCD off and on.

When I use a Pi 4 with kernel 6.1.0 or 6.6.20, or a Pi 5 with kernel 6.1.0, this problem does not occur.

This problem might be related to issues #5784 and #5988. Pull request #6050 did not affect it.

To see what is happening, I observed the SCL and SDA signals with an oscilloscope while sending the setup commands to the LCD. The results are shown below.

[Image: i2c_graph_all, SCL/SDA oscilloscope traces for Figures (A) to (C)]

Figures (A) and (B) show successful transfers on a Pi 5 (6.1.0) and a Pi 4 (6.6.20), respectively. Figure (A), from the Pi 5 (6.1.0), shows two noisy pulses at the edges of the ACK, but they do not affect the I2C communication.

On the other hand, Figure (C) shows a failed transfer on a Pi 5 (6.6.20). Around t=0.0007, a pulse is observed during the ACK, and it stops the I2C communication.

Figure (C) around t=0.0007 is enlarged below.

[Image: i2c_graph_enlarged, Figure (C) around t=0.0007]

When this failure occurs, dmesg shows the following error.

[ 161.891681] i2c_designware 1f00074000.i2c: i2c_dw_handle_tx_abort: lost arbitration

After that, I2C communications show the following errors.

[ 189.544399] i2c_designware 1f00074000.i2c: controller timed out

In this situation, "i2cdetect -y 1" shows the following result.

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         08 09 0a 0b 0c 0d 0e 0f
10: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30: -- -- -- -- -- -- -- -- 38 39 3a 3b 3c 3d 3e 3f
40: 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70: 70 71 72 73 74 75 76 77
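In this state i2cdetect reports dozens of addresses that no device on the bus actually uses, whereas on a healthy bus only the LCD at 0x3e should answer. To tell the two states apart in a test script, the output can be parsed as in the following Python sketch. The function names are hypothetical and it relies only on the fact that i2cdetect prints a responding address as its own hex value, "--" for no response and "UU" for an address claimed by a kernel driver.

```python
from typing import Iterable, List


def responding_addresses(scan: str) -> List[int]:
    """Return the addresses that i2cdetect printed as present."""
    found = []
    for line in scan.splitlines():
        _, sep, cells = line.partition(":")
        if not sep:
            continue  # the column-header line has no "NN:" row label
        for cell in cells.split():
            if cell in ("--", "UU"):  # "--": no answer, "UU": in use by a driver
                continue
            found.append(int(cell, 16))
    return found


def scan_looks_healthy(scan: str, expected: Iterable[int] = (0x3E,)) -> bool:
    """True if exactly the expected devices answered (0x3e is the LCD here)."""
    return sorted(responding_addresses(scan)) == sorted(expected)
```

Feeding it the output above returns every phantom address, so scan_looks_healthy() is False; a scan showing only 3e returns True.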

To recover from this failure, I have to turn off and on the LCD.

This failure occurs randomly, but the failure probability seems to depend on the command. In the figure above, the command "0x6c" has a high probability of failing.

When I decrease the I2C frequency (e.g. i2c_baudrate=50000), the failure probability also decreases, but I could not reduce it to zero.

Steps to reproduce the behaviour

Connect an AQM0802-based LCD to a Pi 5 and send the setup commands, for example:

bus.write_i2c_block_data(0x3e, 0x00, [0x38, 0x39, 0x14, 0x70, 0x56, 0x6c])
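A self-contained version of this step might look like the sketch below. It assumes the smbus2 package (the report does not say which Python I2C binding was used) and repeats the same block write until the kernel returns an error, which makes failure rates easier to compare across kernels or baud rates. The function names and the 0.2 s pause between attempts are illustrative choices, not part of the original report.

```python
import time

LCD_ADDR = 0x3E      # LCD address used in the report
CONTROL_BYTE = 0x00  # control byte used in the report (command write)
INIT_SEQUENCE = [0x38, 0x39, 0x14, 0x70, 0x56, 0x6C]


def open_bus(bus_number=1):
    """Open /dev/i2c-<bus_number>; assumes smbus2 is installed."""
    from smbus2 import SMBus  # imported lazily so the loop can be tested without hardware
    return SMBus(bus_number)


def writes_until_failure(bus, max_trials, delay_s=0.2):
    """Repeat the setup write until the driver raises OSError.

    Returns (number_of_successful_writes, exception_or_None).
    """
    for i in range(max_trials):
        try:
            bus.write_i2c_block_data(LCD_ADDR, CONTROL_BYTE, INIT_SEQUENCE)
        except OSError as exc:  # e.g. EIO or ETIMEDOUT from the i2c-dev ioctl
            return i, exc
        time.sleep(delay_s)  # illustrative pacing between attempts
    return max_trials, None
```

On a Pi this would be run as `with open_bus(1) as bus: print(writes_until_failure(bus, 1000))`. Stopping at the first error is deliberate: in the failure described here the bus may stay locked until the LCD is power-cycled, so later attempts would not be independent.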

Device(s)

Raspberry Pi 5

System

Raspberry Pi 5 with kernel 6.6.20 (Raspberry Pi OS 2024-03-15)

Logs

No response

Additional context

No response

pelwell commented 6 months ago

Those waveforms don't look like they were taken when using the changes in #6050 (the clock cycles look too even) - were they? The PR is now merged so it would be helpful if you could use the current rpi-update kernel, just so we know that old issues aren't contributing.

neuralassembly commented 6 months ago

I also tried "sudo rpi-update pulls/6050", and I obtained similar data with kernel 6.6.21.

pelwell commented 6 months ago

Yes, you did say that, but I think your screenshots were taken without it.

neuralassembly commented 6 months ago

OK. This is a graph for a Pi 5 with kernel 6.6.21:

[Image: i2c_graph_pi5_6_6_21]

pelwell commented 6 months ago

The loss of arbitration error is triggered by any change in SDA while SCL is high during any transfer. While any transfer is in progress, SDA is meant to be stable while SCL is high; SDA going low while SCL is high is a START signal, while SDA going high while SCL is high is a STOP signal.
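The rule above can be checked mechanically on a captured trace. This minimal Python sketch assumes the oscilloscope or logic-analyser capture has already been thresholded into equally spaced 0/1 samples for SCL and SDA (the function name is hypothetical). It reports every SDA edge that happens while SCL stays high, labelled as the START or STOP the bus would see.

```python
from typing import List, Sequence, Tuple


def bus_conditions(scl: Sequence[int], sda: Sequence[int]) -> List[Tuple[int, str]]:
    """Find START/STOP conditions in sampled SCL/SDA (1 = high, 0 = low).

    SDA may only change while SCL is low; an SDA edge while SCL is high
    is a START (SDA falling) or a STOP (SDA rising).
    """
    events = []
    for i in range(1, len(scl)):
        scl_high = scl[i - 1] == 1 and scl[i] == 1
        if scl_high and sda[i - 1] != sda[i]:
            events.append((i, "START" if sda[i] == 0 else "STOP"))
    return events
```

For example, `bus_conditions([1, 1, 1, 0, 0, 1, 1, 0], [1, 0, 0, 0, 0, 0, 1, 1])` returns `[(1, "START"), (6, "STOP")]`: the second event is SDA rising in the middle of a clock pulse, the pattern seen in the failing ACK, which a controller that is still mid-transfer reports as lost arbitration.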

In the failure cases above you can see that SDA goes high while SCL is high, caused by the ACK ending early and allowing SDA to be pulled high. It's not clear why the device would end the ACK prematurely while SCL is still high. It does appear that SDA goes high marginally earlier (i.e. closer to the falling edge of SCL), but the metric for that change ("tVD;ACK") is specified as a maximum, not a minimum.
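One way to put a number on "marginally earlier" is to measure, for each ACK bit, when the target releases SDA relative to the falling edge of SCL. The sketch below is a hypothetical helper, assuming a capture window that starts while SCL is high for the ACK clock with SDA held low; sample times can be in any consistent unit. A positive result is the normal case, a negative one means SDA rose while SCL was still high.

```python
from typing import Optional, Sequence


def ack_release_margin(times: Sequence[float], scl: Sequence[int],
                       sda: Sequence[int]) -> Optional[float]:
    """Return t(SDA rises) - t(SCL falls) for one ACK bit, or None.

    Assumes the window starts during the high phase of the ACK clock
    while the target is pulling SDA low.
    """
    t_scl_fall = t_sda_rise = None
    for i in range(1, len(times)):
        if t_scl_fall is None and scl[i - 1] == 1 and scl[i] == 0:
            t_scl_fall = times[i]
        if t_sda_rise is None and sda[i - 1] == 0 and sda[i] == 1:
            t_sda_rise = times[i]
    if t_scl_fall is None or t_sda_rise is None:
        return None  # edge not captured in this window
    return t_sda_rise - t_scl_fall
```

Collecting this value over many ACKs on each kernel would show whether the release point drifts towards, and past, the SCL falling edge.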

Following the appearance of #6057, I've ordered a VEML7700 light sensor in the hope that it shows the problem; having i2cdetect fail is catastrophic.

pelwell commented 5 months ago

PR #6071 should prevent the bus lockup, but I suspect you will still end up with a number of lost arbitration messages in the kernel log.
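For background, the I2C-bus specification (NXP UM10204) describes a generic "bus clear" for a target that is holding SDA low: the controller sends up to nine clock pulses, within which the target should release SDA, after which the bus can be returned to idle with a STOP. The sketch below simulates that procedure with caller-supplied callbacks. It is not the code in #6071, and read_sda/pulse_scl are hypothetical hooks standing in for real pin access.

```python
from typing import Callable


def bus_clear(read_sda: Callable[[], int], pulse_scl: Callable[[], None],
              max_pulses: int = 9) -> bool:
    """Clock SCL until a stuck target releases SDA.

    Returns True once SDA reads high (the caller would then send a STOP),
    or False if SDA is still low after max_pulses clocks, leaving only a
    reset or power cycle of the target.
    """
    for _ in range(max_pulses):
        if read_sda():
            return True
        pulse_scl()  # each pulse lets the target clock out one more bit
    return bool(read_sda())
```

The False branch corresponds to the situation in this report where only turning the LCD off and on again clears the bus.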

Have you got a link to somewhere selling the display?

neuralassembly commented 5 months ago

Thank you for creating the pull request #6071 .

After applying it, I observed three phenomena when I2C communication fails.

1. After I2C communication fails, i2cdetect works.

When the I2C communication fails during the initialization of the LCD, as shown in the graphs above, dmesg shows the following messages.

[   24.451363] i2c_designware 1f00074000.i2c: i2c_dw_handle_tx_abort: lost arbitration
[   24.662731] i2c_designware 1f00074000.i2c: i2c_dw_handle_tx_abort: SDA stuck at low

In this case, the next initialization attempt is possible (no LCD off/on is required), and "i2cdetect -y 1" shows the correct result, as shown below.

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 3e --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

2. After I2C communication fails, i2cdetect also fails

When the I2C communication fails after a successful initialization (e.g., when writing some characters to the LCD), dmesg shows only one warning.

[  371.274435] i2c_designware 1f00074000.i2c: i2c_dw_handle_tx_abort: lost arbitration

After that, "i2cdetect -y 1" fails as shown below.

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         08 09 0a 0b 0c 0d 0e 0f
10: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30: -- -- -- -- -- -- -- -- 38 39 3a 3b 3c 3d 3e 3f
40: 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70: 70 71 72 73 74 75 76 77

Then dmesg shows the following warnings.

[  384.801086] i2c_designware 1f00074000.i2c: controller timed out
(repeated 24 times)

In this case, LCD off/on is required for the next trial.

3. After I2C communication fails, i2cdetect also fails (but sometimes it can recover)

After case 2, if I run "i2cdetect -y 1" several times, its output sometimes recovers from the failing state, as shown below (but this does not happen often).

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         08 09 0a 0b 0c 0d 0e 0f
10: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 3e --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

After that, dmesg shows the following warnings.

[ 1562.259230] i2c_designware 1f00074000.i2c: controller timed out
(... many times)
[ 1563.052858] i2c_designware 1f00074000.i2c: i2c_dw_handle_tx_abort: SDA stuck at low

In this case, the next attempt to communicate with the LCD is possible. So the "SDA stuck at low" message added by #6071 seems to indicate that I2C communication is possible again.
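The three cases can be turned into a simple log check. The Python sketch below uses hypothetical helper names and only matches the message text quoted in this thread. It reports the most recent i2c_designware event and treats a trailing "SDA stuck at low" as a sign that the bus is usable again, which is a heuristic drawn from the cases above, not a guarantee from the driver.

```python
from typing import Optional

# Substrings of the i2c_designware messages quoted in this thread.
MESSAGES = {
    "lost arbitration": "lost_arbitration",
    "SDA stuck at low": "sda_stuck_low",
    "controller timed out": "timeout",
}


def last_i2c_event(dmesg_text: str) -> Optional[str]:
    """Return the kind of the most recent i2c_designware message, or None."""
    last = None
    for line in dmesg_text.splitlines():
        if "i2c_designware" not in line:
            continue
        for needle, kind in MESSAGES.items():
            if needle in line:
                last = kind
    return last


def probably_recovered(dmesg_text: str) -> bool:
    """Heuristic: cases 1 and 3 ended with "SDA stuck at low", case 2 did not."""
    return last_i2c_event(dmesg_text) == "sda_stuck_low"
```

The input would typically be the output of dmesg, captured for example with subprocess.run(["dmesg"], capture_output=True, text=True).stdout (reading the kernel log may require root on some systems).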

As for the LCD (which uses an AQM0802 and a PCA9515), I think it is available only in Japan. If you inform me your address, I can send it to you. https://akizukidenshi.com/catalog/g/g111753

I also tried a VEML7700, but it shows no problems in normal use (equivalent to "i2cget -y 1 0x10 0 w"?). The author of #6057 wrote that they "tried with multiple i2c devices", so I am interested in which device he used.
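For reference, "i2cget -y 1 0x10 0 w" performs an SMBus read-word transaction from register 0x00 of the device at address 0x10 on bus 1. A sketch of the same read from Python, assuming the smbus2 package (the helper names are hypothetical):

```python
VEML7700_ADDR = 0x10  # address used in the i2cget command above


def read_word(bus, addr=VEML7700_ADDR, register=0x00) -> int:
    """SMBus read-word, the same transaction as `i2cget -y 1 <addr> <register> w`."""
    return bus.read_word_data(addr, register)


def open_bus(bus_number=1):
    """Open /dev/i2c-<bus_number>; assumes smbus2 is installed."""
    from smbus2 import SMBus  # imported lazily so read_word can be tested without hardware
    return SMBus(bus_number)
```

On a Pi: `with open_bus(1) as bus: print(hex(read_word(bus)))`. A read-word is a short transaction (address, register, repeated START, two data bytes), so it exercises the bus quite differently from the multi-byte block write that triggers the LCD failure.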

pelwell commented 5 months ago

If you inform me your address, I can send it to you.

That's very kind of you - email me (phil@raspberrypi.com) and we can work something out.

pelwell commented 5 months ago

I can reproduce the problem using that particular LCD (thank you, @neuralassembly), and hope to understand what is going wrong over the next few days.

pelwell commented 5 months ago

You might like to take a look at #6091, which brings the waveforms closer to Pi 4 and makes I2C (so far) reliable on the LCD display up to 100kHz, and on an MCP23017 up to 1MHz.

neuralassembly commented 5 months ago

Great work! I built the kernel, and I found that it works fine.