oxidecomputer / hubris

A lightweight, memory-protected, message-passing kernel for deeply embedded systems.
Mozilla Public License 2.0

I2C is generating 600-750ns glitches, and it should feel bad about this. #1824

Open cbiffle opened 2 months ago

cbiffle commented 2 months ago

This is another episode in the hunt for the Gimlet Disk Wumpus. See also #1821, #1822, and #1823.

tl;dr: we're glitching the data lines on I2C again, and since this violates setup/hold, devices can respond to this in arbitrarily annoying ways.

Example trace:

[Image: 2024-07-12-144459_3472x1547_scrot]

This is just after we stop the bus reset process (which we started for no obvious reason, see #1823) and resume normal service. We are generating ~680ns negative glitches on both lines, which means that when we reconfigure the pins, we're taking them through low-state push-pull at least briefly on their way back to peripheral-controlled open-drain.

This event is not unique; I see a couple of glitches per minute on average in otherwise working Gimlet traces. (I think they are all related to bus resets.)

These glitches are lengthy enough to bypass the I2C standard glitch filter (50ns), but short enough that they violate the I2C spec's setup and hold requirements. (They're also illegal in the protocol state machine, but, so are a lot of things.) This could potentially trigger metastability or misbehavior of devices on the I2C bus.
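To make the "too long to filter, too short to be legal" window concrete, here is a minimal sketch that classifies a low pulse by width. It is not Hubris code, and all names in it are hypothetical. It uses the 50ns filter figure quoted above and, as a stand-in for the timing requirements, the minimum SCL low period from the I2C spec (4.7µs in Standard-mode, 1.3µs in Fast-mode). Which mode the affected bus runs in is an assumption here.

```rust
/// Widest spike the glitch filter is expected to suppress, in nanoseconds
/// (the 50ns figure quoted above).
const SPIKE_FILTER_MAX_NS: u32 = 50;

/// Bus speed modes considered in this sketch. Which one the affected bus
/// actually uses is an assumption, not something stated in this issue.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BusMode {
    Standard,
    Fast,
}

impl BusMode {
    /// Minimum SCL low period (t_LOW) per the I2C spec, in nanoseconds.
    fn min_scl_low_ns(self) -> u32 {
        match self {
            BusMode::Standard => 4_700,
            BusMode::Fast => 1_300,
        }
    }
}

/// How a device is likely to see a low pulse of a given width.
#[derive(Debug, PartialEq)]
enum PulseClass {
    /// Short enough that the glitch filter should suppress it.
    Filtered,
    /// Passes the filter but is shorter than any legal low period.
    Runt,
    /// At least as long as a legal low period.
    Legal,
}

/// Classify a low pulse by width alone (hypothetical helper).
fn classify_low_pulse(width_ns: u32, mode: BusMode) -> PulseClass {
    if width_ns <= SPIKE_FILTER_MAX_NS {
        PulseClass::Filtered
    } else if width_ns < mode.min_scl_low_ns() {
        PulseClass::Runt
    } else {
        PulseClass::Legal
    }
}
```

Under these assumptions, every width in the observed 600-750ns range lands in `Runt` for both modes: devices see the pulse, but it is shorter than a legal low period.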

Until February 2023 we had similar behavior that was causing bus lockups (discussed in #1126), so I would assume that these glitches could cause bus lockups too.

cbiffle commented 2 months ago

Interestingly, these glitches do not occur after the initial pin wiggle that the driver issues at machine startup. They occur only when it attempts to recover from a condition by generating additional wiggles.

They're the same code path, so, that's interesting.