lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.6k stars 783 forks source link

[spi_device] EN4B/EX4B may fail if the next item comes very shortly #14940

Closed weicaiyang closed 2 years ago

weicaiyang commented 2 years ago

In this waveform, EX4B(E9) is received and cfg_addr_4b_en drops for a short period then asserts again without receiving a new command.

Screen Shot 2022-09-14 at 10 50 47 PM

Looks like it's because the next command comes too quickly, spi_csb_asserted_pulse_i is asserted again but spi_reg_cfg_addr_4b_en_sync hasn't been updated.

  always_ff @(posedge spi_clk_i or negedge sys_rst_ni) begin
    if (!sys_rst_ni) begin
      spi_cfg_addr_4b_en_o <= 1'b 0;
    end else if (spi_addr_4b_set_i) begin
      // This event occurs when EN4B command is received
      spi_cfg_addr_4b_en_o <= 1'b 1;
    end else if (spi_addr_4b_clr_i) begin
      // EX4B command raises the clear event
      spi_cfg_addr_4b_en_o <= 1'b 0;
    end else if (spi_csb_asserted_pulse_i
      && (spi_cfg_addr_4b_en_o != spi_reg_cfg_addr_4b_en_sync)) begin
      // Update
      spi_cfg_addr_4b_en_o <= spi_reg_cfg_addr_4b_en_sync;
    end
  end

Maybe we need to define the timing between 2 transactions to avoid this or try to fix it in the design.

Waves:

cd /mnt/disks/filestores/opentitan-shared/users/weicai/scratch/fix_km/spi_device-sim-vcs/failed/0.spi_device_flash_all.2868150057 verdi -ssr Verdi.ses &

cc: @a-will

a-will commented 2 years ago

Maybe we need to define the timing between 2 transactions to avoid this or try to fix it in the design.

Yup, you've got it. The minimum CSB inactive time is a necessary specification in the data sheet. Ours will need to cover the maximum propagation delay of changes like these.

tjaychen commented 2 years ago

it looks like right now the 4b address enable stuff is maintained in both the spi clock domain and the sys clock domain. so any updates from the spi side has to propagate to the sys clock side and come back in time in order for the value to be consistent (am i understanding this right?)

we could also consider just maintaining the control bit on spi clock domain only. On the sys clock side we just register a software intent to change and connect a read back path. That would remove most of the inactive time requirement in this case, otherwise if i'm understanding right, our inactive time is 2+ cycles of sys_clock, which could be close to 100ns.

This would only make sense though if we're not bound by the 2+ cycle somewhere else in the design.. (which I'm pretty certain we are, since we do a lot of CSb edge detection).

a-will commented 2 years ago

@tjaychen Whoops, I got some strategies mixed up. I agree that we ought to make these work like the busy bit. That means the important timing specs will be SCK-inactive-to-CSB-inactive and/or CSB-active-to-SCK-active probably. The former is for us to latch on CSB de-assertion.

All three will be things we need to get right, though :)

a-will commented 2 years ago

Yeah, the address mode would be another to have the source-of-truth in the spi_clk domain (as with any other state that isn't protected by the busy bit). Just like the busy bit, software would have to send a change into an async FIFO, and spi_clk can pick it up during the command byte.

Generally-speaking, hardware would handle the addressing mode autonomously. Software would only get involved in rare situations, such as when the upstream spi_host is in reset or when software wanted to emulate a "soft reset" command.

eunchan commented 2 years ago

Revising the address mode updating scheme same to the BUSY is okay to me. But, i am sure we shall spec out our minimum CSb de-assertion time in the datasheet. So the importance of the resolution is low in my opinion.

I am leaning toward to tag this as "Future Release". WDYT?

a-will commented 2 years ago

Revising the address mode updating scheme same to the BUSY is okay to me. But, i am sure we shall spec out our minimum CSb de-assertion time in the datasheet. So the importance of the resolution is low in my opinion.

I am leaning toward to tag this as "Future Release". WDYT?

Given that this device might be hooked up to some rather inflexible controllers for initial firmware load, it might behoove us to avoid having an overly large value for the spec, though. A random sampling of parts shows a minimum de-assertion time of 30-50 ns for non-read commands.

At least, if we want to push this off, we should probably check with our lead users and see if the value would prevent them from testing their use case.

tjaychen commented 2 years ago

could we find out what the minimum de-assertion time for us is first? Because I think we'll need to try to match this up with host behaviors. I have vague memories that being close to 100ns "could" become problematic in certain situations.

On Mon, Sep 19, 2022 at 9:23 AM Alex Williams @.***> wrote:

Revising the address mode updating scheme same to the BUSY is okay to me. But, i am sure we shall spec out our minimum CSb de-assertion time in the datasheet. So the importance of the resolution is low in my opinion.

I am leaning toward to tag this as "Future Release". WDYT?

Given that this device might be hooked up to some rather inflexible controllers for initial firmware load, it might behoove us to avoid having an overly large value for the spec, though. A random sampling of parts shows a minimum de-assertion time of 30-50 ns for non-read commands.

At least, if we want to push this off, we should probably check with our lead users and see if the value would prevent them from testing their use case.

— Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/14940#issuecomment-1251247306, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH2RST5JSQANPF7ZO6GX3TV7CHOPANCNFSM6AAAAAAQNBIFYE . You are receiving this because you were mentioned.Message ID: @.***>

a-will commented 2 years ago

The table looks like this, but this construction does make me itch a bit:

Component Delay
prim_edge_detector.g_sync 3 cycles of sys_clk (2 + 1 for phase)
hw2reg signal generation 1 cycle of sys_clk
hw2reg commit 1 cycle of sys_clk
sys_clk to spi_clk (combinatorial)

That's 5 cycles of sys_clk + the path delay from the sys_clk register to the first spi_clk register.

Then, we then need 2 spi_clk cycles to bring the value back into the spi_clk domain, plus another to latch it in at the sck_csb_asserted_pulse time. This is actually kind of a scary construction, as sck_csb_asserted_pulse is a pulse that just happens to be delayed possibly enough to come after the reg2hw addr4b_en synchronization is done. (Typically, spi_clk is not active when CSB is inactive, and CSB makes its change before any spi_clk cycles happen. sck_csb_asserted_pulse comes out of an edge detector with a synchronizer instantiated, though.)

It'd be good to at least have comments in there explaining that sck_csb_asserted_pulse needs to be after 2 spi_clk cycles to accommodate the flop2sync for reg2hw stuff coming in.

jeoongp commented 2 years ago

Sorry for late comment. I have some questions to clarify.

  1. Is busy-bit strategy a SW workaround by configuring CSR busy bit to enable the 4b address? In this case, do we need to spec out 50~100ns inactive time?
  2. Is a HW workaround to detect EX4B on SPI and autonomously handle it without busy bit?
a-will commented 2 years ago

Sorry for late comment. I have some questions to clarify.

  1. Is busy-bit strategy a SW workaround by configuring CSR busy bit to enable the 4b address? In this case, do we need to spec out 50~100ns inactive time?
  2. Is a HW workaround to detect EX4B on SPI and autonomously handle it without busy bit?

The strategy is to have the source of truth in the spi_clk domain. Changes due to flash commands update state on the spi_clk domain immediately, and the CSR read updates via a synchronizer. For software writes, registers in the sys_clk domain latch the intention to set/clear, and the spi_clk domain gets the update before the address phase begins.

Really I wouldn't call this a workaround: This is a command that we have set up for hardware to handle autonomously, but the code currently forces hardware changes to go to the software domain and back. That round-trip was probably an accident from leftovers or a misinterpretation of a feature request.

jeoongp commented 2 years ago

@a-will thanks and sorry. Let me clarify basic things first step-by-step.

  1. busy bit is written by the flash commands immediately on spi_clk or written by SW write on sys_clk and synchronized to spi_clk domain?
  2. Do we have a timing relation between flash commands/SW write and SPI data that indicates the address phase? In SPI_TPM, SW write has no relation with CSB timing. I would like to learn how this situation is different from SPI_TPM.
jeoongp commented 2 years ago

@a-will @tjaychen @eunchan @weicaiyang do you know where we have a spec that defines this timing relation between SW write and SPI address phase?

This timing condition can be compared with #15134.

a-will commented 2 years ago

@a-will thanks and sorry. Let me clarify basic things first step-by-step.

  1. busy bit is written by the flash commands immediately on spi_clk or written by SW write on sys_clk and synchronized to spi_clk domain?

It is written immediately on spi_clk. Write actions from SW are synchronized to spi_clk before writing to the flop in the spi_clk domain.

  1. Do we have a timing relation between flash commands/SW write and SPI data that indicates the address phase? In SPI_TPM, SW write has no relation with CSB timing. I would like to learn how this situation is different from SPI_TPM.

The SPI flash protocol always has an 8-bit command (except for eXecute-In-Place mode, which we don't support). Addresses follow the command, so we're guaranteed at least 8 clocks before the address mode must be known (though an implementation could conceivably not need it for another 24 clocks, since addresses are at least 3 bytes). That duration provides a time for synchronization across the domains, when coupled with a maximum spi_clk frequency.

weicaiyang commented 2 years ago

Do we have a timing relation between flash commands/SW write and SPI data that indicates the address phase? In SPI_TPM, SW write has no relation with CSB timing. I would like to learn how this situation is different from SPI_TPM.

In this issue, I actually wanted to know the relationship between 2 SPI commands. I always let SW update 3b/4b address while SPI is completed idle. This makes me think I need to consider HW and SW racing condition. There is a case that upstream SPI sends EN4B/EX4B to the upload fifo and set busy bit. Then SW updates 3b/4b before clear the busy bit. In the meanwhile, upstream SPI keeps polling the status via read state commands. This can verify that SW updating 3b/4b won't be affected by the read status command.

weicaiyang commented 2 years ago

can you let me know where we can find the timing relation between SW write for busy-bit and SPI address phase for SPI_FLASH?

IIUC, after SW clears the busy bit, upstream SPI needs to poll 1-2 read status to know the busy bit is cleared and then 8 cycles to sends opcode. After all these, it enters address phase. Not sure if there is any concern on this timing?

tjaychen commented 2 years ago

@jeoongp it might be good to have a look at one of the spi flash specs to get an idea of the general behavior. I can send you one that I usually reference.

@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.

jeoongp commented 2 years ago

@tjaychen @weicaiyang thanks for info. Once I receive the SPI flash spec, I will review it based on the info.

a-will commented 2 years ago

@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.

No, it's not. EN4B / EX4B must complete immediately (same as WREN / WRDI), and that is the motivation for moving their source-of-truth to the spi_clk domain.

jeoongp commented 2 years ago

Currently, SW busy bit in cmd_info looks like nothing to do with address commands in HW. @a-will do you suggest HW-based autonomous handling without SW busy-bit as long as the timing (8 spi cycles for cmd and 8 cycles for address) is OK? Or, do you suggest HW + SW busy-bit approach?

Sorry for repeated questions.. Please correct me if my understanding is incorrect.

a-will commented 2 years ago

Currently, SW busy bit in cmd_info looks like nothing to do with address commands in HW. @a-will do you suggest HW-based autonomous handling without SW busy-bit as long as the timing (8 spi cycles for cmd and 8 cycles for address) is OK? Or, do you suggest HW + SW busy-bit approach?

Sorry for repeated questions.. Please correct me if my understanding is incorrect.

I think we're talking about different things. In this thread, the busy bit handling is an example of a bit that has its source-of-truth in the spi_clk domain, where it is updated immediately in response to incoming flash commands, but it can also have updates coming from software. The design approach we took with it can be applied here.

It otherwise has no relationship to EN4B / EX4B.

weicaiyang commented 2 years ago

@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.

I was thinking we use command upload for EN4B/EX4B. Then, we need to set busy bit and let SW read the upload fifo and update 3b/4b. If it's not via upload command, EN4B/EX4B and WRDI/WREN won't set the busy bit as @a-will mentioned

tjaychen commented 2 years ago

hey all, if it helps, we can probably set a quick meeting to clear up any confusion.

a-will commented 2 years ago

@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.

I was thinking we use command upload for EN4B/EX4B. Then, we need to set busy bit and let SW read the upload fifo and update 3b/4b. If it's not via upload command, EN4B/EX4B and WRDI/WREN won't set the busy bit as @a-will mentioned

No, command upload should not be used for this. Again, EN4B / EX4B must complete before any subsequent command. There is no back pressure, no busy bit, nothing to hold off the next command so software has enough time to respond. This need is the reason EN4B, EX4B, WRDI, and WREN have their own, named command info slots.

There are chips in the wild that will immediately follow up with a read command, and spi_device must be ready to count out the correct number of address bits. :smile:

weicaiyang commented 2 years ago

@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.

I was thinking we use command upload for EN4B/EX4B. Then, we need to set busy bit and let SW read the upload fifo and update 3b/4b. If it's not via upload command, EN4B/EX4B and WRDI/WREN won't set the busy bit as @a-will mentioned

No, command upload should not be used for this. Again, EN4B / EX4B must complete before any subsequent command. There is no back pressure, no busy bit, nothing to hold off the next command so software has enough time to respond. This need is the reason EN4B, EX4B, WRDI, and WREN have their own, named command info slots.

I see. That makes sense. We shouldn't use upload command for EN4B, EX4B, WRDI, and WREN, otherwise, it violates the protocol.

Then, in term of SW updating 3b/4b address, it should be done at the very beginning before any SPI transaction starts, right? If so, we won't have race condition that SW updates 3b/4b while receiving a SPI transaction with address.

a-will commented 2 years ago

I see. That makes sense. We shouldn't use upload command for EN4B, EX4B, WRDI, and WREN, otherwise, it violates the protocol.

Then, in term of SW updating 3b/4b address, it should be done at the very beginning before any SPI transaction starts, right? If so, we won't have race condition that SW updates 3b/4b while receiving a SPI transaction with address.

Yup! As long as we can commit the SW updates before the address mode must be known, we will avoid what I'll call "synchronization issues". Practically-speaking, SW will only change the address mode while the upstream host is in reset, so we just need to get that value synchronized (and if the incoming command is EN4B / EX4B, drop the SW change and prioritize the hardware change).

weicaiyang commented 2 years ago

Thanks @a-will. The side discussion on the relationship of busy bit and EN4B/EX4B is clear to me now. @jeoongp please raise again if anything isn't clear to you.

Let's back to the original issue. (didn't find a good time slot that works for all of us today. let's see how the discussion goes. I may schedule a quick meeting tomorrow to solve this)

The table looks like this, but this construction does make me itch a bit:

Component Delay prim_edge_detector.g_sync 3 cycles of sys_clk (2 + 1 for phase) hw2reg signal generation 1 cycle of sys_clk hw2reg commit 1 cycle of sys_clk sys_clk to spi_clk (combinatorial) That's 5 cycles of sys_clk + the path delay from the sys_clk register to the first spi_clk register.

Then, we then need 2 spi_clk cycles to bring the value back into the spi_clk domain, plus another to latch it in at the sck_csb_asserted_pulse time. This is actually kind of a scary construction, as sck_csb_asserted_pulse is a pulse that just happens to be delayed possibly enough to come after the reg2hw addr4b_en synchronization is done. (Typically, spi_clk is not active when CSB is inactive, and CSB makes its change before any spi_clk cycles happen. sck_csb_asserted_pulse comes out of an edge detector with a synchronizer instantiated, though.)

It'd be good to at least have comments in there explaining that sck_csb_asserted_pulse needs to be after 2 spi_clk cycles to accommodate the flop2sync for reg2hw stuff coming in.

So, we need 5 sys_clk. sys_clk is 24mhz. it's 42x5=210ns. Is this delay too big? @tjaychen

tjaychen commented 2 years ago

yeah i think 200+ ns is too big. Somewhere in this thread I think @a-will mentioned other devices usually are in the low double digit range (30 - 40ns?) I have also seen hosts where they are hardwired to do 1~2 spi clocks between transactions. So this is most definitely going to be too much.

@a-will let me know if you disagree.

jeoongp commented 2 years ago

I am also clear now thanks to this clarification and help from @tjaychen ^__^

weicaiyang commented 2 years ago

Regardless of the command, @a-will found that CSB inactive time needs to be at least one sys_clk in this comment

So, no matter what approach we decide for this issue, we should document this timing requirement.

weicaiyang commented 2 years ago

fixed via https://github.com/lowRISC/opentitan/pull/15326