Closed weicaiyang closed 2 years ago
Maybe we need to define the timing between 2 transactions to avoid this or try to fix it in the design.
Yup, you've got it. The minimum CSB inactive time is a necessary specification in the data sheet. Ours will need to cover the maximum propagation delay of changes like these.
it looks like right now the 4b address enable stuff is maintained in both the spi clock
domain and the sys clock
domain.
so any updates from the spi side has to propagate to the sys clock
side and come back in time in order for the value to be consistent (am i understanding this right?)
we could also consider just maintaining the control bit on spi clock
domain only. On the sys clock
side we just register a software intent to change and connect a read back path. That would remove most of the inactive time requirement in this case, otherwise if i'm understanding right, our inactive time is 2+ cycles of sys_clock, which could be close to 100ns.
This would only make sense though if we're not bound by the 2+ cycle somewhere else in the design.. (which I'm pretty certain we are, since we do a lot of CSb edge detection).
@tjaychen Whoops, I got some strategies mixed up. I agree that we ought to make these work like the busy bit. That means the important timing specs will be SCK-inactive-to-CSB-inactive and/or CSB-active-to-SCK-active probably. The former is for us to latch on CSB de-assertion.
All three will be things we need to get right, though :)
Yeah, the address mode would be another to have the source-of-truth in the spi_clk domain (as with any other state that isn't protected by the busy bit). Just like the busy bit, software would have to send a change into an async FIFO, and spi_clk can pick it up during the command byte.
Generally-speaking, hardware would handle the addressing mode autonomously. Software would only get involved in rare situations, such as when the upstream spi_host is in reset or when software wanted to emulate a "soft reset" command.
Revising the address mode updating scheme same to the BUSY is okay to me. But, i am sure we shall spec out our minimum CSb de-assertion time in the datasheet. So the importance of the resolution is low in my opinion.
I am leaning toward to tag this as "Future Release". WDYT?
Revising the address mode updating scheme same to the BUSY is okay to me. But, i am sure we shall spec out our minimum CSb de-assertion time in the datasheet. So the importance of the resolution is low in my opinion.
I am leaning toward to tag this as "Future Release". WDYT?
Given that this device might be hooked up to some rather inflexible controllers for initial firmware load, it might behoove us to avoid having an overly large value for the spec, though. A random sampling of parts shows a minimum de-assertion time of 30-50 ns for non-read commands.
At least, if we want to push this off, we should probably check with our lead users and see if the value would prevent them from testing their use case.
could we find out what the minimum de-assertion time for us is first? Because I think we'll need to try to match this up with host behaviors. I have vague memories that being close to 100ns "could" become problematic in certain situations.
On Mon, Sep 19, 2022 at 9:23 AM Alex Williams @.***> wrote:
Revising the address mode updating scheme same to the BUSY is okay to me. But, i am sure we shall spec out our minimum CSb de-assertion time in the datasheet. So the importance of the resolution is low in my opinion.
I am leaning toward to tag this as "Future Release". WDYT?
Given that this device might be hooked up to some rather inflexible controllers for initial firmware load, it might behoove us to avoid having an overly large value for the spec, though. A random sampling of parts shows a minimum de-assertion time of 30-50 ns for non-read commands.
At least, if we want to push this off, we should probably check with our lead users and see if the value would prevent them from testing their use case.
— Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/14940#issuecomment-1251247306, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH2RST5JSQANPF7ZO6GX3TV7CHOPANCNFSM6AAAAAAQNBIFYE . You are receiving this because you were mentioned.Message ID: @.***>
The table looks like this, but this construction does make me itch a bit:
Component | Delay |
---|---|
prim_edge_detector.g_sync | 3 cycles of sys_clk (2 + 1 for phase) |
hw2reg signal generation | 1 cycle of sys_clk |
hw2reg commit | 1 cycle of sys_clk |
sys_clk to spi_clk | (combinatorial) |
That's 5 cycles of sys_clk + the path delay from the sys_clk register to the first spi_clk register.
Then, we then need 2 spi_clk cycles to bring the value back into the spi_clk domain, plus another to latch it in at the sck_csb_asserted_pulse
time. This is actually kind of a scary construction, as sck_csb_asserted_pulse
is a pulse that just happens to be delayed possibly enough to come after the reg2hw addr4b_en synchronization is done. (Typically, spi_clk is not active when CSB is inactive, and CSB makes its change before any spi_clk cycles happen. sck_csb_asserted_pulse
comes out of an edge detector with a synchronizer instantiated, though.)
It'd be good to at least have comments in there explaining that sck_csb_asserted_pulse
needs to be after 2 spi_clk cycles to accommodate the flop2sync for reg2hw stuff coming in.
Sorry for late comment. I have some questions to clarify.
Sorry for late comment. I have some questions to clarify.
- Is busy-bit strategy a SW workaround by configuring CSR busy bit to enable the 4b address? In this case, do we need to spec out 50~100ns inactive time?
- Is a HW workaround to detect EX4B on SPI and autonomously handle it without busy bit?
The strategy is to have the source of truth in the spi_clk domain. Changes due to flash commands update state on the spi_clk domain immediately, and the CSR read updates via a synchronizer. For software writes, registers in the sys_clk domain latch the intention to set/clear, and the spi_clk domain gets the update before the address phase begins.
Really I wouldn't call this a workaround: This is a command that we have set up for hardware to handle autonomously, but the code currently forces hardware changes to go to the software domain and back. That round-trip was probably an accident from leftovers or a misinterpretation of a feature request.
@a-will thanks and sorry. Let me clarify basic things first step-by-step.
@a-will @tjaychen @eunchan @weicaiyang do you know where we have a spec that defines this timing relation between SW write and SPI address phase?
This timing condition can be compared with #15134.
@a-will thanks and sorry. Let me clarify basic things first step-by-step.
- busy bit is written by the flash commands immediately on spi_clk or written by SW write on sys_clk and synchronized to spi_clk domain?
It is written immediately on spi_clk. Write actions from SW are synchronized to spi_clk before writing to the flop in the spi_clk domain.
- Do we have a timing relation between flash commands/SW write and SPI data that indicates the address phase? In SPI_TPM, SW write has no relation with CSB timing. I would like to learn how this situation is different from SPI_TPM.
The SPI flash protocol always has an 8-bit command (except for eXecute-In-Place mode, which we don't support). Addresses follow the command, so we're guaranteed at least 8 clocks before the address mode must be known (though an implementation could conceivably not need it for another 24 clocks, since addresses are at least 3 bytes). That duration provides a time for synchronization across the domains, when coupled with a maximum spi_clk frequency.
Do we have a timing relation between flash commands/SW write and SPI data that indicates the address phase? In SPI_TPM, SW write has no relation with CSB timing. I would like to learn how this situation is different from SPI_TPM.
In this issue, I actually wanted to know the relationship between 2 SPI commands. I always let SW update 3b/4b address while SPI is completed idle. This makes me think I need to consider HW and SW racing condition. There is a case that upstream SPI sends EN4B/EX4B to the upload fifo and set busy bit. Then SW updates 3b/4b before clear the busy bit. In the meanwhile, upstream SPI keeps polling the status via read state commands. This can verify that SW updating 3b/4b won't be affected by the read status command.
can you let me know where we can find the timing relation between SW write for busy-bit and SPI address phase for SPI_FLASH?
IIUC, after SW clears the busy bit, upstream SPI needs to poll 1-2 read status to know the busy bit is cleared and then 8 cycles to sends opcode. After all these, it enters address phase. Not sure if there is any concern on this timing?
@jeoongp it might be good to have a look at one of the spi flash specs to get an idea of the general behavior. I can send you one that I usually reference.
@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.
@tjaychen @weicaiyang thanks for info. Once I receive the SPI flash spec, I will review it based on the info.
@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.
No, it's not. EN4B / EX4B must complete immediately (same as WREN / WRDI), and that is the motivation for moving their source-of-truth to the spi_clk domain.
Currently, SW busy bit in cmd_info looks like nothing to do with address commands in HW. @a-will do you suggest HW-based autonomous handling without SW busy-bit as long as the timing (8 spi cycles for cmd and 8 cycles for address) is OK? Or, do you suggest HW + SW busy-bit approach?
Sorry for repeated questions.. Please correct me if my understanding is incorrect.
Currently, SW busy bit in cmd_info looks like nothing to do with address commands in HW. @a-will do you suggest HW-based autonomous handling without SW busy-bit as long as the timing (8 spi cycles for cmd and 8 cycles for address) is OK? Or, do you suggest HW + SW busy-bit approach?
Sorry for repeated questions.. Please correct me if my understanding is incorrect.
I think we're talking about different things. In this thread, the busy bit handling is an example of a bit that has its source-of-truth in the spi_clk domain, where it is updated immediately in response to incoming flash commands, but it can also have updates coming from software. The design approach we took with it can be applied here.
It otherwise has no relationship to EN4B / EX4B.
@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.
I was thinking we use command upload for EN4B/EX4B. Then, we need to set busy bit and let SW read the upload fifo and update 3b/4b. If it's not via upload command, EN4B/EX4B and WRDI/WREN won't set the busy bit as @a-will mentioned
hey all, if it helps, we can probably set a quick meeting to clear up any confusion.
@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.
I was thinking we use command upload for EN4B/EX4B. Then, we need to set busy bit and let SW read the upload fifo and update 3b/4b. If it's not via upload command, EN4B/EX4B and WRDI/WREN won't set the busy bit as @a-will mentioned
No, command upload should not be used for this. Again, EN4B / EX4B must complete before any subsequent command. There is no back pressure, no busy bit, nothing to hold off the next command so software has enough time to respond. This need is the reason EN4B, EX4B, WRDI, and WREN have their own, named command info slots.
There are chips in the wild that will immediately follow up with a read command, and spi_device must be ready to count out the correct number of address bits. :smile:
@weicaiyang is it correct to use read status polling for EN4B/EX4B? I don't think the BUSY bit is required to be set when we send those commands.
I was thinking we use command upload for EN4B/EX4B. Then, we need to set busy bit and let SW read the upload fifo and update 3b/4b. If it's not via upload command, EN4B/EX4B and WRDI/WREN won't set the busy bit as @a-will mentioned
No, command upload should not be used for this. Again, EN4B / EX4B must complete before any subsequent command. There is no back pressure, no busy bit, nothing to hold off the next command so software has enough time to respond. This need is the reason EN4B, EX4B, WRDI, and WREN have their own, named command info slots.
I see. That makes sense. We shouldn't use upload command for EN4B, EX4B, WRDI, and WREN, otherwise, it violates the protocol.
Then, in term of SW updating 3b/4b address, it should be done at the very beginning before any SPI transaction starts, right? If so, we won't have race condition that SW updates 3b/4b while receiving a SPI transaction with address.
I see. That makes sense. We shouldn't use upload command for EN4B, EX4B, WRDI, and WREN, otherwise, it violates the protocol.
Then, in term of SW updating 3b/4b address, it should be done at the very beginning before any SPI transaction starts, right? If so, we won't have race condition that SW updates 3b/4b while receiving a SPI transaction with address.
Yup! As long as we can commit the SW updates before the address mode must be known, we will avoid what I'll call "synchronization issues". Practically-speaking, SW will only change the address mode while the upstream host is in reset, so we just need to get that value synchronized (and if the incoming command is EN4B / EX4B, drop the SW change and prioritize the hardware change).
Thanks @a-will. The side discussion on the relationship of busy bit and EN4B/EX4B is clear to me now. @jeoongp please raise again if anything isn't clear to you.
Let's back to the original issue. (didn't find a good time slot that works for all of us today. let's see how the discussion goes. I may schedule a quick meeting tomorrow to solve this)
The table looks like this, but this construction does make me itch a bit:
Component Delay prim_edge_detector.g_sync 3 cycles of sys_clk (2 + 1 for phase) hw2reg signal generation 1 cycle of sys_clk hw2reg commit 1 cycle of sys_clk sys_clk to spi_clk (combinatorial) That's 5 cycles of sys_clk + the path delay from the sys_clk register to the first spi_clk register.
Then, we then need 2 spi_clk cycles to bring the value back into the spi_clk domain, plus another to latch it in at the
sck_csb_asserted_pulse
time. This is actually kind of a scary construction, assck_csb_asserted_pulse
is a pulse that just happens to be delayed possibly enough to come after the reg2hw addr4b_en synchronization is done. (Typically, spi_clk is not active when CSB is inactive, and CSB makes its change before any spi_clk cycles happen.sck_csb_asserted_pulse
comes out of an edge detector with a synchronizer instantiated, though.)It'd be good to at least have comments in there explaining that
sck_csb_asserted_pulse
needs to be after 2 spi_clk cycles to accommodate the flop2sync for reg2hw stuff coming in.
So, we need 5 sys_clk. sys_clk is 24mhz. it's 42x5=210ns. Is this delay too big? @tjaychen
yeah i think 200+ ns is too big. Somewhere in this thread I think @a-will mentioned other devices usually are in the low double digit range (30 - 40ns?) I have also seen hosts where they are hardwired to do 1~2 spi clocks between transactions. So this is most definitely going to be too much.
@a-will let me know if you disagree.
I am also clear now thanks to this clarification and help from @tjaychen ^__^
Regardless of the command, @a-will found that CSB inactive time needs to be at least one sys_clk in this comment
So, no matter what approach we decide for this issue, we should document this timing requirement.
In this waveform, EX4B(E9) is received and cfg_addr_4b_en drops for a short period then asserts again without receiving a new command.
Looks like it's because the next command comes too quickly,
spi_csb_asserted_pulse_i
is asserted again butspi_reg_cfg_addr_4b_en_sync
hasn't been updated.Maybe we need to define the timing between 2 transactions to avoid this or try to fix it in the design.
Waves:
cc: @a-will