oxidecomputer / hubris

A lightweight, memory-protected, message-passing kernel for deeply embedded systems.
Mozilla Public License 2.0
3.01k stars 173 forks source link

spd: page selection and mux emulation are unfaithful #323

Open wesolows opened 2 years ago

wesolows commented 2 years ago

The SPD proxy emulates the page-selected state maintained by real SPD devices. However, it does so globally; there is a single variable tracking the selected page for all devices on all banks. Given the muxed bank design necessitated by limited SPD addressing, the collection of virtual SPD slaves that can respond to a page-select command (sent to addresses 0b011_011xr, where x is the page to select) should be limited to the bank(s) currently accessible through the mux. That is, the selected page must be per-bank, just as voffs is per-slave. By making this global, sequences of commands like the following will result in accesses to the wrong page:

Select bank 0 Select page 0 Read... Select page 1 Select bank 1 Select page 0 Read... Select bank 0 Read... <--- Gets page 0, rightly expects page 1

I do not have any evidence that the AMD master currently uses such sequences, but it would have every right to expect this to behave properly, and its behaviour may change in future without warning.

A similar but related issue is that the emulated mux does not allow simultaneous selection of multiple downstream buses, although in a real LTC4306 (or PCA9545A), the downstream bus selection is a bitmask. Here we're a bit safer because AMD do not specify the use of a specific mux in this application, so it is conceivable that someone might choose one that allows only a single downstream bus to be selected at one time, and selecting more than one bank to do page selection would be a very fine optimisation that would make sense only after much more obvious optimisations, e.g., doing multibyte reads of the SPD data itself. That said, it would not be at all surprising if the authors of this/future firmware were sloppy or made assumptions that don't always hold, so it is probably also a good idea to allow this in the mux emulation. Certainly any master would be confused by setting register 3 to 0xc0 and having us set the selected segment to None and thus disconnecting all downstream buses.

Again, I have no current evidence that this is causing any problem with actual AMD firmware.

Edited: Serial Presence Detect, not Scalable Data Port.

cbiffle commented 2 years ago

The page select thing seems like it needs fixed.

The muxing-to-multiple-buses thing is electrically impossible on the H7, as far as I understand it -- we can mux SDA/SCL to multiple sites simultaneously but they aren't connected together at the pins, and would instead fight each other inside the chip where the signals are not open drain. (We learned this the hard way.) So, if AMD needs this, we may need a discrete mux chip.

wesolows commented 2 years ago

This isn't as bad as I may have made it sound.

From JEDEC 1791.12a (the EE1004-v/TSE2004av spec) table 2, there are two collections of commands that act on individual SPDs (read and write memory, read and write temperature sensors) and numerous that act on all devices on the selected bus(es), specifically:

Write Protection commands SWPn, CWP, and RPSn, and the EE Page Address commands SPAn
and RPA, do not use the Select Address or Logical Serial Address, therefore all devices on the
SMBus will act on these commands simultaneously. Since it is impossible to determine which
device is responding to RPSn or RPA commands, for example, these functions are primarily used
for external device programmers rather than in-system applications.

With this sensible guidance in mind, it makes little sense to support AMD firmware selecting multiple downstream buses when accessing memory or temperature sensors. We don't want to allow the AMD firmware (or anyone else) to set or clear write protection, nor have we implemented the high-voltage mechanism necessary to change this on physical DIMMs. So the only broadcast operations we would expect to see are the two set page operations (SPA0 and SPA1), read page (RPA), and perhaps the read protection status (RPS0 through RPS3) commands. None of these commands need ever be passed through to the physical downstream buses. As noted above, page addresses are already virtualised and their faithful emulation does not require passing them through. If we ever find that we need to support virtual write protection, it should be possible to do so statically: all SPDs are always write-protected. Responding properly to RPSn, then, requires no SPD access at all. I don't have any evidence AMD ever issues RPSn.

So I believe we can safely rule out AMD firmware ever needing to issue a command to our SPD emulator that would need to be passed along to more than one downstream bus simultaneously. The only thing that might ever want to do that would be setting the EE page address, which is already fully virtualised. Thus I don't believe a discrete mux is needed, and the question about virtual mux emulation fidelity is really limited to the collection of virtual SPDs to which a SPA0, SPA1, or RPA command applies.