Closed a-will closed 8 months ago
Nice. Thanks for the issue. Definitely worth it in Fast read mode.
I thought about this. Before moving forward this idea, would you mind let me know it is possible to control the # of dummy cycles on the host system?
I checked a few flash devices' datasheets. The granularity of the dummy cycle is not one cycle. Some devices only support byte granularity.
If host system allows 9 dummy cycles, for the Fast Read, 5 cycles for the Fast Read Dual, 3 cycles for the Fast Read Quad, SPI_DEVICE would support this feature by allowing longer dummy cycles and mux for the pipelined read data as suggested.
Currently, SPI_DEVICE only supports up to 8 dummy cycles.
I thought about this. Before moving forward this idea, would you mind let me know it is possible to control the # of dummy cycles on the host system?
I checked a few flash devices' datasheets. The granularity of the dummy cycle is not one cycle. Some devices only support byte granularity.
If host system allows 9 dummy cycles, for the Fast Read, 5 cycles for the Fast Read Dual, 3 cycles for the Fast Read Quad, SPI_DEVICE would support this feature by allowing longer dummy cycles and mux for the pipelined read data as suggested.
Currently, SPI_DEVICE only supports up to 8 dummy cycles.
From a compatibility perspective, we might want to prioritize quad SPI and make it 2 pipeline registers. As you mentioned, a lot of generic SPI controllers can only handle a granularity of 8 bits, so the dummy cycles would need to add up to a multiple of 8 bits somehow.
Assuming it is 2 pipeline registers, for dual SPI and fast read, the feature's utility would be contingent on either a more flexible SPI host controller or a SPI flash device that allows tuning dummy cycles to compensate (e.g. various Micron or Macronix parts).
Triaged for spi_device
. Assigning to M2.5 with https://github.com/lowRISC/opentitan/labels/Priority%3AP2 because we should confirm with the product team that current pass-through rates suffice. If they don't, we need to prioritize this. If they do, we can defer this to https://github.com/lowRISC/opentitan/labels/Type%3AFutureRelease.
Even if the RTL changes may be simple here, I suspect the DV and coverage work will be not trivial.
estimate 24 remaining 2023-03-23 24
Yes, and to update on the status: the downstream teams are discussing this and their feedback is pending
Please note the response in https://docs.google.com/spreadsheets/d/12gMTUi42Hu78eACrE0cShAsIPchuwOodhu1IkNfZJoE/edit#gid=998209441 QO-0010 Relevant to this issue; 1. Enable SDC constraints for passthrough (open source) @hcallahan-lowrisc / @andreaskurth - I'm assuming this means we should increase the priority of this issue for now?
Hold off from starting changes related to this issue until we have timing information on the current instantiation of the design, as well as DD and DV estimates.
Discussed during triaging meeting. The plan is to postpone any RTL optimizations post M2.5. Leaving the issue open to track the addition of SDC constraints so that we can complete the timing analysis.
Updating the estimate to 4 to track the SDC / synthesis iterations.
remaining 2023-04-13 4
@moidx - who will be working on the SDC constraints? Can you please assign the appropriate person? TIA
Assigning to @msfschaffner. I believe this should not be blocking for M2.5.1 as there won't be time to make any RTL changes.
@msfschaffner feel free to lower priority of switch to backlog. Thanks.
@a-will and I will be doing that in the coming week or two. Let's track for M2.5.1 for now, but keep it as a P2 (we can defer later down the road if needed).
Per effort estimation exercise - the RTL part of this is deferred. Capturing that addition of SDC constraints was meant to be tracked here.
Targeting for M2.5.1-RC1. This is trending 2023-05-18.
@msfschaffner - probably need to check if this is on schedule?
FYI: #18589 addresses the SDC component. However, the SDC part probably ought to have gone into a separate tracking issue, as the original idea might still be useful for a future release. ;)
We can move this issue to the M2.5 Backlog, since the SDC updates are tracked as part of #18274 now.
Add an optional pipeline stage or two to the return path for passthrough reads, and allow read commands to choose whether to use that path or the combinatorial one. Default to the combinatorial path for the first cycle and prepare for the mux selection change. In other words, for the data path it's just...
..with
is_pipelined_read
coming fromcmd_info
tables.For wider read opcodes (dual read, quad read), dummy cycles are part of the protocol, and during those dummy cycles, the host ignores the signal from the device. The SFDP advertises the number for each command, and firmware can take the value advertised by the SPI flash for dual read and quad read, add the pipeline latency to it, and advertise the new value in its SFDP. Then, the host behaves as though that were the true read latency all along and sends the needed additional SCK cycles.
This effectively cuts in half the round-trip path. The forward delay from the host to OT's passthrough I/Os is absorbed by the pipeline stage (since it is sampling read data), and timing for the command is reduced to an analysis between OT's passthrough/spi_host interface and the SPI flash (with the original host still contributing some potential skew, of course). The second hop of the return path is just the output of a pipeline stage back to the host, so that interface looks like the host is communicating directly with OT.
The expectation is that this will greatly increase the maximum read clock rate for dual read and quad read, while minimally slowing down normal read and possibly fast read (due to one extra logic level from the mux). The thinking here is that devices needing higher throughput will use faster interfaces, and focusing on speeding those commands up will yield the greatest benefit.
This may also be possible for fast read (1-1-1), but 8 dummy cycles may be the expectation there. Normal read must continue to use the combinatorial path. Also, the SPI flash device sees extra SCK cycles, but we simply drop the data the host didn't ask for.
CC @tjaychen @eunchan (Just logging the idea here, since we've randomly had discussions and can forget)