[i2c] Consider FIFO size and investigate replacing flip-flops with SRAM

igk8 commented 3 years ago

In I2C, the number of FIFO entries may be as high as 128 and there are four FIFOs per instance. The top-level IP, top_earlgrey, contains multiple I2C instances. As a result, flip-flop based FIFOs may take too much area. One way to solve this is to replace flip-flops with a dual-port SRAM. Does this approach have any bigger picture implications/consequences one needs to be aware of?

tjaychen commented 3 years ago

so the cross-over point is going to differ based on the technology.. when you say 128 entries, is that 128 8-bit entries? Ie, its 1024-bits per FIFO and there are 4 such FIFOs per I2C?

msfschaffner commented 3 years ago

From looking at the source code https://github.com/lowRISC/opentitan/blob/60ca587bd1266d910a5633ae87fc95c23f66cbc7/hw/ip/i2c/rtl/i2c_core.sv#L291-L372 it looks like we've got four instances of prim_fifo_sync with the following sizes:

1x 13bit x 32 deep
2x 8bit x 32 deep
1x 10bit x 32 deep

For our target technology, these FIFO configurations are all below the DP SRAM crossover point. So stdcell memory is still the most area efficient choice.

tjaychen commented 3 years ago

Michael do you know the rough roll over point is for the node we're targeting?

On Thu, Feb 4, 2021 at 7:11 PM Michael Schaffner notifications@github.com wrote:

From looking at the source code

https://github.com/lowRISC/opentitan/blob/60ca587bd1266d910a5633ae87fc95c23f66cbc7/hw/ip/i2c/rtl/i2c_core.sv#L291-L372 it looks like we've got four instances of prim_fifo_sync with the following sizes:

1x 13bit x 32 deep 2x 8bit x 32 deep 1x 10bit x 32 deep

For our target technology, these FIFO configurations are all below the DP SRAM crossover point. So stdcell memory is still the most area efficient choice.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/5112#issuecomment-773754822, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH2RSTJNF2Z65IQVPBLWOLS5NOX5ANCNFSM4XD3MJKA .

igk8 commented 3 years ago

Presently, these FIFOs are 32 deep, but they likely need to be 128 deep, per earlier discussion with @mdhayter . This is 5k bits per I2C instance.

msfschaffner commented 3 years ago

Thanks @igk8.

@tjaychen, that would move us above the threshold and it would make sense to put down SRAM memories for this.

@igk8 one question: is there a chance these FIFO memories could be consolidated into fewer than 4 separate memories? It would increase the area efficiency of the SRAM macro if we could do that.

tjaychen commented 3 years ago

agreed with Michael. If you can, it would be best to consolidate these.

On Fri, Feb 5, 2021 at 9:46 AM Michael Schaffner notifications@github.com wrote:

Thanks @igk8 https://github.com/igk8.

@tjaychen https://github.com/tjaychen, that would move us above the threshold and it would make sense to put down SRAM memories for this.

@igk8 https://github.com/igk8 one question: is there a chance these FIFO memories could be consolidated into fewer than 4 separate memories? It would increase the area efficiency of the SRAM macro if we could do that.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/5112#issuecomment-774184173, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH2RSSSFIU7PTF2Z2RR4ALS5QVITANCNFSM4XD3MJKA .

imphil commented 3 years ago

Wouldn't it make sense to put the selection of the reg vs SRAM into the FIFO primitive? We can then have a technology-specific implementation which has the right thresholds built-in (and probably other constraints, such as width). This way we don't need any changes within the I2C block itself.

eunchan commented 3 years ago

+1 This was original intention of having prim wrapper. :) freely choose generic impl vs tech specific based on given parameter.

On Feb 5, 2021, at 10:45 AM, Philipp Wagner notifications@github.com wrote:

Wouldn't it make sense to put the selection of the reg vs SRAM into the FIFO primitive? We can then have a technology-specific implementation which has the right thresholds built-in (and probably other constraints, such as width). This way we don't need any changes within the I2C block itself.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/5112#issuecomment-774218376, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJDG3WWP5SNKSJ4MLSTYITS5Q4FJANCNFSM4XD3MJKA.

tjaychen commented 3 years ago

yeah that makes sense too. So i suppose we have three suggestions?

consolidate the storage (if possible), as much as possible
use a parameter on said storage to switch between register based or memory based.
make the parameter something that can be fed into i2c, that way the top level can control whether it should be FF backed or RAM backed.

If the storage is the prim_fifo, we can introduce a parameter within the prim module to be memory backed. Does this sound right ot you guys?

On Fri, Feb 5, 2021 at 10:45 AM Philipp Wagner notifications@github.com wrote:

Wouldn't it make sense to put the selection of the reg vs SRAM into the FIFO primitive? We can then have a technology-specific implementation which has the right thresholds built-in (and probably other constraints, such as width). This way we don't need any changes within the I2C block itself.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/5112#issuecomment-774218376, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH2RSQQ7WDPELNJSASZQNLS5Q4FJANCNFSM4XD3MJKA .

msfschaffner commented 3 years ago

Moving the selection of the memory type into the FIFO primitive would be useful, yes. But I am not entirely sure we should make the FF vs SRAM decision completely transparent to the designer. I.e., the selection of SRAM vs FF can have performance implications, and it is typically good to have a precise accounting of SRAM macros in the system for backend purposes. So adding a parameter to switch this sounds good to me.

msfschaffner commented 3 years ago

To @tjaychen's points: +1 for each ;).

tjaychen commented 3 years ago

i've updated my comments slightly based on @msfschaffner 's point.

martin-lueker commented 3 years ago

This all sounds great. Is the the idea that we would apply this to both sync and async FIFO's?

msfschaffner commented 3 years ago

I think that could be useful, yes. The DP SRAMs in our target tech are true dual clock SRAMs so this would be possible.

mdhayter commented 3 years ago

I don't remember the comment on 128 deep and not sure why I was thinking that. (I didn't have luck searching.) There was probably a particular protocol we were looking at. With the clock stretching we should be ok with smaller.

I could only find a comment where I suggested smaller "BTW if we think the block is too big, we could consider making the fifos smaller (the diagram says they are 128 entry today and I suspect 64 or 32 would be ok for most uses).: in: https://github.com/lowRISC/opentitan/pull/2746#discussion_r452564619

I have had fifos as small as 16 entry on I2C in the past.

igk8 commented 3 years ago

After going through my notes, I found that the 128-entry request was made in the original version of "One-Pager: I2C modifications" and wasn't debated afterward. If 64 entries are sufficient, would this put the FIFOs below the DP SRAM crossover point?

Consolidating FIFOs (e.g., into two: one TX and one RX) would stand in the way of implementing an interface that can serve as both host and target at the same time. Presently, I2C IP cannot do host and target at the same time, but that is the goal. This is the reason why we have four FIFOs.

msfschaffner commented 3 years ago

The crossover is right between 64 and 128 for the instances at hand (slightly closer to 64). So yes it puts us right under the crossover point.

I would however say that if 32 are sufficient, we should probably try to reduce them to 32 (this is an easy way to save a ton of flops).

igk8 commented 3 years ago

The FIFO depth is being increased from 32 to 64 in PR #5964

msfschaffner commented 3 years ago

@andreypronin @cfrantz just to double check: do we have any specific requirements for minimum I2C FIFO depths?

Increasing these from 32 to 64 or 128 can have quite some logic impact since we have several I2C instances in the top-level.

jesultra commented 3 years ago

I am Jes, from @andreypronin's team. I have seen 64-byte transaction size hardcoded at various places in Coreboot and possibly Depthcharge code, so having FIFO depths exceeding 64 bytes is not likely to carry much benefit, unless work is undertaken to make the aforementioned AP code able to utilize it, and even then it would be minimal. Going shorter that 64 bytes would cause some transactions to have to be "split", such that the clock is stretched in the middle, as the FIFO runs full/empty. Our software logic can be made to work, but there will be some slowdown in communication, as we would have an extra TockOS interrupt and kernel loop round trip. I do not have any numbers that I can provide. If flip-flops are very scarce, I could look into it more, but my preference would be 64-byte FIFOs, as that is what we have with Dauntless, and we know how works.

msfschaffner commented 3 years ago

Thanks @jesultra for the clarification.

In that case let's stick with 64 deep FIFOs for I2C. Since the FIFOs cannot be consolidated further due to the structure of the interface, there is no benefit in moving to SRAM in this technology, since the crossover point is above 64.

That being said, making the FIFO storage type parameterizable, and exposing this through parameters to the top level would be nice to have. That would allow other toplevels to switch the storage type, if needed.

igk8 commented 3 years ago

FIFO depth was made parameterizable in PR #7081; the default is 64. It can be set to a smaller value, if desired. FIFO depth larger than 64 will not work, because of the fixed register field sizes.

Closing this issue.

lowRISC / opentitan

[i2c] Consider FIFO size and investigate replacing flip-flops with SRAM #5112