NACK for success isn't robust (and other I2C protocol subtleties)

Introduction

For the "update frame" command, this bootloader sort of inherits a design flaw from its AVR112 predecessor: it assumes that a target receiver should NACK the last byte of a transmission from the controller on success. We should probably fix this for the sake of robustness. I have a design in mind that I will describe later on.

As far as I can tell, the low-level I2C ACK protocol is suitable for flow control only. Additionally, there is no way to distinguish an unresponsive target from one that actively chooses to NACK a transmission. Therefore, the controller shouldn't rely on detecting a NACK on the last byte of a command as an indication of success.

Details

The bootloader calls abort_twi() on errors, which resets the hardware state machine to "unaddressed target" mode, which will probably release the bus lines until the next START. This is equivalent to issuing a NACK for every byte for the rest of the transaction. The low-level TWI library that we use can't distinguish an early NACK from a NACK at the end of a controller transmission, so even that theoretical distinction is unavailable to us for handshaking. (Note: I haven't extensively tested these scenarios yet, but I have seen bugs that are consistent with this explanation.)

Solution

A possible solution is to add a new, differently-numbered command for "update frame" that reverses the sense of ACK vs NACK for success. This should also maybe get a bootloader protocol version bump, though we could also continue implementing the old command while still sharing code. I'm willing to work to implement this.

Another possibility is to also add another new command to query the status of a flash page write.

Additional background

The I2C specification says that a controller should NACK the last byte when receiving from a target. One explanation (which isn't explicitly stated in the specification) is that it serves an important flow control function: the target needs to know to release the bus and stop transmitting at the falling edge of SCL at the end of the ACK bit period (the one where the controller has NACKed). Otherwise, in a protocol where the controller can read arbitrarily many bytes from the target, the target might continue transmitting, driving the SDA line during the next SCL period. This would interfere with the controller sending a STOP or repeated START condition.

Also, a NACK sent by a controller in receive mode is semantically different from a NACK from a target in receive mode: a target always knows when the controller is present, because the controller is driving clock pulses on the SCL line. The controller has no such assurances about a target NACK, except for maybe when the target happens to be stretching the clock.

keyboardio / attiny_i2c_bootloader