[spi_device] SPI pass-through does not support mode 3

weicaiyang commented 1 year ago

TLDR; SCK doesn't return to idle for filtered commands in mode 3

SPI mode 3: CPOL=1, CPHA=1, DataOut=SCLK-falling-edge, DataIn=SCLK-rising-edge

I found we didn't set phase/polarity correctly in the device side of the SPI agent. When I tried to fix it, this issue came up.

In this waveform, the first sck is driven by host. The 2nd sck is what downstream sees. The en_latch is set too late and downstream misses the first falling edge of sck. The en_latch is affected by the previous command, depending on whether is filtered. Screen Shot 2022-11-14 at 2 59 46 PM

It will be good to test the spi_device with a real flash model on the downstream port, which ensures our design compliant with the SPI flash protocol. Maybe we can do this in V3.

Command to open the waveform session:

cd /mnt/disks/filestores/opentitan-shared/users/weicai/scratch/new1/spi_device-sim-vcs/0.spi_device_flash_all/latest; verdi -ssr Verdi.ses &

weicaiyang commented 1 year ago

To produce this

change this line to sck_polarity_phase inside {'b11};)
run the spi_device_flash_all with seed = 4109906088. ~60% seeds fail.

eunchan commented 1 year ago

Thanks. This is a bug. And I think we cannot fix this easily without sacrificing the timing.

The clock gating cell blocks the clock when the clock is 0 to not create a short pulse. So the logic drops the clock (if filtered) when 8th low of SCK comes. then it remains as it is. The filter information is reset by CSb de-assertion.

However, as CPHA, CPOL are 1, the off SCK is 1, not 0. So, clock enable signal affects a the next SCK low period. That is the waveform shown above.

In order to solve this, we need to invert the SCK then apply filter (gating), then re-invert. I honestly am not sure how much this will affect to the timing.

weicaiyang commented 1 year ago

Thanks. This is a bug. And I think we cannot fix this easily without sacrificing the timing.

The clock gating cell blocks the clock when the clock is 0 to not create a short pulse. So the logic drops the clock (if filtered) when 8th low of SCK comes. then it remains as it is. The filter information is reset by CSb de-assertion.

However, as CPHA, CPOL are 1, the off SCK is 1, not 0. So, clock enable signal affects a the next SCK low period. That is the waveform shown above.

In order to solve this, we need to invert the SCK then apply filter (gating), then re-invert. I honestly am not sure how much this will affect to the timing.

perhaps other designers can also chime in. @jeoongp @tjaychen

eunchan commented 1 year ago

I reviewed and prepared a fix for the issue. Commit is here.

However, I think still passthrough has more issues inside, how to handle the CPOL := 1 case. With the fix, still I see UVM_ERRORs. Am debugging now.

By the way, the fix introduces 2x inverter + 1 AND logic delay onto SCK path. If CPOL is 0, only AND gate delay is added.

jeoongp commented 1 year ago

I agree with @eunchan. For timing, I think 2x inverters + 1 AND logic delay might be OK. But I am not clear of errors under debugging. I hope it would not affect downstream inside.

tjaychen commented 1 year ago

synthesis / timing will tell us! i think we have constraints for passthrough, which is usually our toughest path :)

a-will commented 1 year ago

Does the SCK passthrough path need any of that? If we're passing it through, it shouldn't need anything except the one filtering gate, I'd think. Why bother inverting twice in a row for this combinatorial path that shouldn't change, regardless of CPOL and CPHA settings? (In other words, the passed through clock is logically separate from the internal sampling clock produced from the sck_i DIO.)

tjaychen commented 1 year ago

@a-will (@eunchan let me know if i misunderstood), i was just drawing this out, i think the issue are basically with the cpha/cpol combinations that capture on the negative edge of the clock.

The spi device filtering relies on observing the 8th data bit and then making the clock inactive. But our clock gating cell right now only gates to 0, if we're sampling on negative edge, the gating i think actually creates the sampling condition.

I think the double inversion (probably based on mode?) mostly gets around that problem. The other solution would be to check if we have a CG cell that gates high (i think some libraries have it) and the other would be to do the gating with primitive cells ourselves, which I don't think would be a horrible choice either. We would then however still need to statically select which gating path we're picking, which might create similar kinds of delay in the path.

tjaychen commented 1 year ago

actually talking to @a-will offline, he had a really interesting idea i never thought about before. What if instead of explicitly gating, we manipulate the internal pull-down/up and output enable?

So basically, we could pass the clock directly through all the time, along with a "disable" indication to spi_host. The spi_host, upon seeing the disable, could just "disable" the output enable drive of the clock pin.

Then as long as we statically set the internal pull-up/down to match the mode when initializing, we could get rid of the clock gating cell completely.

This of course adds a little complexity to block level DV, but it seems workable too, and would not rely on specific libraries like in my decription.

@eunchan what do you think?

eunchan commented 1 year ago

that's really good I think.

Also, I happen to find that our dedicated IO has invert mode there. We can use that for CPOL := 1 case rather than inverting inside SPI_DEVICE. ( Added: I meant SPI_DEV_CLK input pad)

CC: @msfschaffner

weicaiyang commented 1 year ago

actually talking to @a-will offline, he had a really interesting idea i never thought about before. What if instead of explicitly gating, we manipulate the internal pull-down/up and output enable?

So basically, we could pass the clock directly through all the time, along with a "disable" indication to spi_host. The spi_host, upon seeing the disable, could just "disable" the output enable drive of the clock pin.

Then as long as we statically set the internal pull-up/down to match the mode when initializing, we could get rid of the clock gating cell completely.

This of course adds a little complexity to block level DV, but it seems workable too, and would not rely on specific libraries like in my decription.

we have an output to spi_host - passthrough_en. Will this serve as the "disable" indicator? So, instead of gating SIO and clock, we just turn this off when spi_device wants to filter the transaction?

tjaychen commented 1 year ago

i think we probably can't use that one, since that one is used more for the sw / passthrough collision. I thought that one was dropped when we hit a busy case, and we need the software to take over spi_host. We might need a separate indication.

eunchan commented 1 year ago

i think we probably can't use that one, since that one is used more for the sw / passthrough collision. I thought that one was dropped when we hit a busy case, and we need the software to take over spi_host. We might need a separate indication.

spi_device_pkg::passthrough_req_t has sck_en which is output enable port of the PAD. we can use this.

eunchan commented 1 year ago

@tjaychen Do you want me to move on with Alex's idea? That way, all SPI_DEVICE needs to do is to drop the output enable when clock supposes to be in reset position (0 if CPOL :=0, 1 if CPOL := 1). We may need a latch for output enable though.

tjaychen commented 1 year ago

@eunchan that sounds good to me. I remember thinking we could have problems with glitches, but is that why you're adding a latch for output enable?

tjaychen commented 1 year ago

@eunchan i think for this has already been merged right?

eunchan commented 1 year ago

Yep. @weicaiyang could you please re-run the test and the issues are gone?

weicaiyang commented 1 year ago

@eunchan As discussed offline, the current fix doesn't work. In case you want to rerun, change 0 to 3 in this line and run test - spi_device_pass_cmd_filtering

eunchan commented 1 year ago

Thanks for checking ! as discussed, it looks like filter does not maintain its value until spi_in clock posedge. I will check and try to think what is a good & clean solution.

eunchan commented 1 year ago

I think current architecture cannot support mode 3 fully. I will summarize the issue in the doc and share here later. Anyhow, it is quite hard to support sck clock gating in mode 3 due to the data bit is half clock later..

eunchan commented 1 year ago

In essence, as data is shifted by half SCK, gating and enable SCK to downstream flash device is not possible for op-complete SPI Flash command.

It is not possible to gating the SCK then release CSb and enable SCK. CG to high may solve the latter (enable SCK) but it cannot meet the gating time as ‘CG to high’ enable affects only when SCK is high. That time already passed 8th SCK posedge.

eunchan commented 1 year ago

I overlooked this scenario and only considered Mode 0 when designing the passthrough filtering scheme. It is my fault.

Option at this stage is "Drop mode 3 support in SPI_DEVICE". Re-designing passthrough to filter commands in mode 3 requires overhaul of the passthrough logic (CG cannot be used at all). So maybe next version can do that but it is too late at this moment.

weicaiyang commented 1 year ago

Thanks @eunchan for clearly explaining this. Are we OK to not support mode3? @a-will @tjaychen

tjaychen commented 1 year ago

@eunchan / @weicaiyang sorry i'm not sure if i'm understanding right. But in mode 3, since it is negative edge launch and positive edge capture, wouldn't we want to gate low? So looking at the last negative edge (where bit 8 is launched), wouldn't we want to look at the value and then gate the clock to 0?

let me know what i misunderstood.

a-will commented 1 year ago

Is this another thing that could be aided by pad attributes? Can we make use of the inverters at both ends for mode 3? That would essentially convert mode 3 to mode 0 internally, right?

eunchan commented 1 year ago

Is this another thing that could be aided by pad attributes? Can we make use of the inverters at both ends for mode 3? That would essentially convert mode 3 to mode 0 internally, right?

It isn't unfortunately. Let me create a doc to explain @tjaychen @a-will

tjaychen commented 1 year ago

thanks @eunchan

eunchan commented 1 year ago

Doc is here: https://docs.google.com/document/d/1noHZ11DhFznM4cfUb55kFSR-IQvatkZDVrTgB9lIjVI/edit?usp=sharing

cindychip commented 1 year ago

https://docs.google.com/document/d/1noHZ11DhFznM4cfUb55kFSR-IQvatkZDVrTgB9lIjVI/edit?usp=sharing

Can anyone grant me the access to this file?

andreaskurth commented 1 year ago

Triaged for spi_device. Assigning to M2.5 with https://github.com/lowRISC/opentitan/labels/Priority%3AP1 until we have confirmed with the product team that supporting mode 3 is not required. After that, we can defer this to https://github.com/lowRISC/opentitan/labels/Type%3AFutureRelease.

andreaskurth commented 1 year ago

@cindychip has confirmed that Mode 3 is not required for M2.5 and has already updated the doc accordingly in PR #17240.

Thanks!

a-will commented 10 months ago

What's special about mode 3: SCK doesn't return to 0 at the end of the transaction, so prim_clock_gating's latch doesn't open up to changes. Consequently, a filtered command would not allow SCK to return to the required idle value of mode 3 (high, not low!).

moidx commented 10 months ago

Updated this issue to P1 so that it shows up in issue triage. I am following with use case stakeholders to determine if we can drop support for SPI-mode-3.

a-will commented 10 months ago

We may not be able to support mode 3 safely. It seems there is an insufficient number of edges to process the information, return to idle, and satisfy timing requirements for the keep-out zones around CSB edges at the downstream SPI flash.

It is only after the last falling edge that we know whether a command is filtered.
The very next edge (a rising edge) must be filtered until after CSB is de-asserted.
There are no remaining events by which to time the start of the rising edge of SCK after CSB is de-asserted.

a-will commented 9 months ago

I think we should be able to tolerate a mode 3 host, though, so long as the SPI flash device also doesn't care about the SPI_CLK idle value.

So, for both modes, we cut off the 8th posedge, and the clock gate's en_i will be held low until the next CSB falling edge. Then, the clock gate's output will remain low until the first posedge of SPI_CLK. For mode 0, that initial output matches the ordinary idle value, but for mode 3, idle usually is high. Because the clock gate has glitch protection that prevents its output from changing while the clock input is high, mode 3's idle value does not pass through.

If the SPI flash device can handle this mixing of idle values, then the host can be mode 0 or mode 3. It's a transitive sort of tolerance.

lowRISC / opentitan

[spi_device] SPI pass-through does not support mode 3 #16339