lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.51k stars 745 forks source link

[Timing] FPGA Timing on top_earlgrey fabric #133

Closed tjaychen closed 5 years ago

tjaychen commented 5 years ago

@asb @GregAC @eunchan @imphil

Opening a new issue to investigate the latencies on top_earlgrey

tjaychen commented 5 years ago

I removed the register stages inside our fabric just to get a sense of what timing paths are bad.

The one that immediately popped up is a path from instr_rdata_id_o all the way back into the fabric. I'm trying to decipher it more right now. I'm not sure if this this is something addressed by https://github.com/lowRISC/ibex/pull/282 already. Still looking.

tjaychen commented 5 years ago

on the current top_earlgrey rtl, there is a path that roughly traces the following

instruction_rdata -> illegal instruction -> exc_pc_mux -> fetch_addr_n -> prefetch buffer -> address change on tlul

I'm guessing this path still exists in the new prefetch buffer, which probably means starting from fetch_addr_n to tlul we should not insert any stages for performance reasons.

It seems reasonable to break the connection between exc_pc_mux -> fetch_addr_n however, since I don't imagine exception is something we need to be super low latency.

Having said all that, I'm not really convinced this path will be a real problem in silicon. I'll attach a sample path shortly.

eunchan commented 5 years ago

Attach top_earlgrey_nexysvideo_timing_summary_routed.txt is the latest timing report after removing the pipeline in the xbar and also applying the patch in lowRISC/ibex#282

The critical path has -10ns negative slack. So, if we keep the design as it is, 33MHz is the main clock freq. The path is from instr_rdata to the LSU. Looks like there's bypass path inside prefetch buffer. If we can cut it off then the timing can be relaxed.

tjaychen commented 5 years ago

these look slightly different from my reports, although is is pretty similar overall.

I feel like we should just relax fpga timing and perhaps connect up the second clock to blocks like uart / timer. Looking at the depth here, my sense is that actual synthesis would do much better.

eunchan commented 5 years ago

I put the pipeline to the data port only but still sees timing violation around a few ns.

On Tue, Sep 10, 2019, 5:41 PM tjaychen notifications@github.com wrote:

these look slightly different from my reports, although is is pretty similar overall.

I feel like we should just relax fpga timing and perhaps connect up the second clock to blocks like uart / timer. Looking at the depth here, my sense is that actual synthesis would do much better.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3VFCG6Z5DA64BBHEL3QJA5BZA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6M4UDQ#issuecomment-530172430, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJDG3VS3NBJUGP7KTGVJ3DQJA5BZANCNFSM4IVMCVKA .

tjaychen commented 5 years ago

When you say data port, do you mean the core data port? Or the response path in the fabric..?

eunchan commented 5 years ago

CoreD port in the xbar.

On Tue, Sep 10, 2019, 7:23 PM tjaychen notifications@github.com wrote:

When you say data port, do you mean the core data port? Or the response path in the fabric..?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3SBM3XQ7A7W7GO5LY3QJBJDPA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6NBP6A#issuecomment-530192376, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJDG3VZGBVAT3H6VHJ4SP3QJBJDPANCNFSM4IVMCVKA .

tjaychen commented 5 years ago

are the remaining violations on the I path..? Or do you mean on the D path there are still a few ns negative slack?

On Tue, Sep 10, 2019 at 7:50 PM Eunchan Kim notifications@github.com wrote:

CoreD port in the xbar.

On Tue, Sep 10, 2019, 7:23 PM tjaychen notifications@github.com wrote:

When you say data port, do you mean the core data port? Or the response path in the fabric..?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3SBM3XQ7A7W7GO5LY3QJBJDPA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6NBP6A#issuecomment-530192376 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAJDG3VZGBVAT3H6VHJ4SP3QJBJDPANCNFSM4IVMCVKA

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSXEFZOOFC2YZLGMYPDQJBMG7A5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6NCVMI#issuecomment-530197169, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSTYK4UVU3KTCRQDPOTQJBMG7ANCNFSM4IVMCVKA .

vogelpi commented 5 years ago

Including also @GregAC to the discussion.

@tomroberts-lowrisc made two PR (lowRISC/ibex#282, lowRISC/ibex#276) in upstream Ibex recently to address timing issue in Ibex. Both have not yet been vendored in. These two should be considered for doing this analysis. However, there is currently a bug on the new I-side which should first be fixed before actually vendoring in (lowRISC/ibex#293).

What we also discussed in our office yesterday: Can we re-evaluate which connections on the crossbar are really needed? For example, does the core still need to execute from SRAM, isn't it only flash and ROM that we need?

Also, should we consider adding a separate un-pipelined crossbar just for the SRAM to bring down the latency on the core D-port?

GregAC commented 5 years ago

It seems reasonable to break the connection between exc_pc_mux -> fetch_addr_n however, since I don't imagine exception is something we need to be super low latency.

Having said all that, I'm not really convinced this path will be a real problem in silicon. I'll attach a sample path shortly.

Having a fairly long combinational path from the instruction data in to the instruction address out doesn't feel like a great thing to do. Even if it does work out for an opentitan system in general I feel we should avoid those kind of things in ibex.

(Edit: Realised that this isn't a top-level feedthrough, because the instruction_rdata in question is coming from the prefetch buffer not the top-level IO, point below mostly still stands though)

Your suggestion for how to break it seems reasonable. If there is some exception that we decide does require super low latency we can always unbreak it specifically for that in a more optimised way.

GregAC commented 5 years ago

Thinking about it I think there's a more general issue here based around branches. The comparison result for a conditional branch gets fed straight into the pc_set_i input on the fetch unit so will generate a similar path. The paper 'Slow and Steady Wins the Race? A Comparison of Ultra-Low-Power RISC-V Cores for Internet-of-Things Applications' (sorry can't find a simple PDF link) highlighted this path as the worst in zero-riscy. Could be worth investigating a fix (this may fall out of some other ideas I was looking at around avoid stalling cycles on branches).

Edit: Opened an ibex issue to discuss this further https://github.com/lowRISC/ibex/issues/305

vogelpi commented 5 years ago

I agree with you @GregAC . And I think it makes totally sense to investigate this for Ibex.

However, inside OpenTitan this path will eventually be broken up by the I-cache. I more and more think we should just add a separate un-pipelined crossbar for all the stuff we need low latency (SRAM, flash, ROM) and use the current pipelined crossbar for all the peripherals.

tjaychen commented 5 years ago

@GregAC @vogelpi @eunchan Let's assume for now that either the icache exists, or the flash is fast enough that it can be read in one cycle like SRAM.

I agree that latency from I port to ROM / SRAM / flash should really be minimized. Do we also agree that D to SRAM should also be minimized? My general feeling is yes, since I don't think taking extra latency trying to access the stack is a great thing. D to flash being low latency is of lower concern to me. Do you agree?

If we don't care about D access to SRAM, then I think Eunchan's experiment, where he just pipelines the D port comes very close to want. Otherwise I think we need to re-structure the fabric a bit.

Let me know what you guys think. Eunchan can you share the timing report from your latest experiment? Based on this thread and the other from Greg / Pirmin, the left over paths should in theory all be related to exception -> instr_addr_o.

eunchan commented 5 years ago

On Wed, Sep 11, 2019 at 04:56:39AM -0700, Pirmin Vogel wrote:

I agree with you @GregAC . And I think it makes totally sense to investigate this for Ibex.

However, inside OpenTitan this path will eventually be broken up by the I-cache. I more and more think we should just add 2 separate un-pipelined crossbars

  • one for the instruction memories (flash, ROM)
  • one for the SRAM and use the current pipelined crossbar for all the peripherals.

Eventually we will have. But in my opinion, it won't have much timing gain by splitting the crossbar into multiple ones. Based on the timing report, xbar consumes 5ns as of now. Most of the xbar time is consumed inside the FIFO. I think it is necessary to optimize the tlul_socket_1n, or m1 such as eliminating tlul_fifo_sync from the socket components if it doesn't need (And probably not need most of the time from Ibex to the memories)

eunchan commented 5 years ago

Let me know what you guys think. Eunchan can you share the timing report from your latest experiment? Based on this thread and the other from Greg / Pirmin, the left over paths should in theory all be related to exception -> instr_addr_o.

Let me re-run the FPGA synthesis again. It was overwritten... :(

tjaychen commented 5 years ago

@eunchan, yeah i agree. If we can keep memory access latencies to 1 cycle (and the memory always returns next cycle anyways), i don't really see the need for fifo's along the way. Maybe we can look into just simplifying like you said, if 'depth' is 0, just pass everything through. I recall doing similar things for a different design.

Regarding splitting the fabric.. if we assume for the moment that non-memory latencies cannot be kept to 1 cycle, i think we'll at least need to split the D port into peripheral and memory paths, so that only one of them gets pipeline stage inserted (i still thinking keeping D to sram single cycle is important). It also feels like since instruction fetches will never go to peripheral blocks, we can at least save a tiny bit of area by either having the fabric be sparse or separate.

vogelpi commented 5 years ago

I agree with @tjaychen , we need to have single-cycle latency also for the SRAM, especially D-to-SRAM. D-to-Flash/D-to-ROM does not need to be low-latency.

asb commented 5 years ago

I think the importance of D-to-Flash/D-to-ROM will depend somewhat on application benchmarking. If there are lots of static data accesses, obviously fast data access is useful. You can always copy to SRAM, but that clearly puts more pressure on that resource.

vogelpi commented 5 years ago

FYI: I created a PR #167 to vendor in the latest Ibex version which has the loops on the I- and D-side fixed.

tjaychen commented 5 years ago

thanks Pirmin! i'll give this a try

On Fri, Sep 13, 2019 at 6:54 AM Pirmin Vogel notifications@github.com wrote:

FYI: I created a PR #167 https://github.com/lowRISC/opentitan/pull/167 to vendor in the latest Ibex version which has the loops on the I- and D-side fixed.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSWKMHFYXTACMBDSYNDQJOLS5A5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6VC3LQ#issuecomment-531246510, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSWZJPERW5FWSXCXE3DQJOLS5ANCNFSM4IVMCVKA .

tjaychen commented 5 years ago

hi all, in #152 , we are adding an option to pass through the fifos completely. Even though this does improve fpga timing, it's still showing roughly -3(D)~-5ns (I) of WNS.

I think what we want is basically select these pipestages in for FPGA, and keep them out for simulation, so we get a good handle on the real latency. (I imagine that to do this, I would pass a parameter down through fusesoc for FPGA only).

If we want to do the above, I'll need to make some minor tweaks to the xbar template. Eunchan does this seem reasonable to you? Or do you prefer a different way?

eunchan commented 5 years ago

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

tjaychen commented 5 years ago

The cross bar effect is much smaller now. Now most of the timing impact is from some operation inside ibex through the fabric. So for example the exception path to fetch we talked about before is here. The alu path to data memory is also there

We could simplify the timing path, but in general I feel fpga's timing is not going to hit asic (that's been my experience at least ) so we'll need to find a way to relax things for fpga and allow silicon to be different

What do you think? We could also generate different xbar for silicon and fpga, and pick on the fly the one we want.

On Tue, Sep 17, 2019, 16:15 Eunchan Kim notifications@github.com wrote:

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSUUTP3FCPOQ22PCXY3QKFQJNA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66FTVA#issuecomment-532437460, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSX274ARS7F3QMNA333QKFQJNANCNFSM4IVMCVKA .

tjaychen commented 5 years ago

O I suppose the other option is to just drop fpga clock speeds.

That's personally what I prefer. Since I believe the fixed path can be kept at the same speed.

On Tue, Sep 17, 2019, 16:19 Timothy Chen timothytim@google.com wrote:

The cross bar effect is much smaller now. Now most of the timing impact is from some operation inside ibex through the fabric. So for example the exception path to fetch we talked about before is here. The alu path to data memory is also there

We could simplify the timing path, but in general I feel fpga's timing is not going to hit asic (that's been my experience at least ) so we'll need to find a way to relax things for fpga and allow silicon to be different

What do you think? We could also generate different xbar for silicon and fpga, and pick on the fly the one we want.

On Tue, Sep 17, 2019, 16:15 Eunchan Kim notifications@github.com wrote:

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSUUTP3FCPOQ22PCXY3QKFQJNA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66FTVA#issuecomment-532437460, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSX274ARS7F3QMNA333QKFQJNANCNFSM4IVMCVKA .

eunchan commented 5 years ago

On Sep 17, 2019, at 4:20 PM, tjaychen notifications@github.com wrote:

The cross bar effect is much smaller now. Now most of the timing impact is from some operation inside ibex through the fabric. So for example the exception path to fetch we talked about before is here. The alu path to data memory is also there

We could simplify the timing path, but in general I feel fpga's timing is not going to hit asic (that's been my experience at least ) so we'll need to find a way to relax things for fpga and allow silicon to be different

What do you think? We could also generate different xbar for silicon and fpga, and pick on the fly the one we want. Yeah. I think we could have different version of the top (or at least the crossbar) for FPGA. Do you want to assume this task? Maybe we might change topgen to generate multiple flavored version of top (or crossbars).

On Tue, Sep 17, 2019, 16:15 Eunchan Kim notifications@github.com wrote:

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSUUTP3FCPOQ22PCXY3QKFQJNA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66FTVA#issuecomment-532437460, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSX274ARS7F3QMNA333QKFQJNANCNFSM4IVMCVKA .

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3VTMI22WTG5WRYQHV3QKFQ2ZA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F3EQ#issuecomment-532438418, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJDG3XYOS6NYBCVIIMVW53QKFQ2ZANCNFSM4IVMCVKA.

tjaychen commented 5 years ago

yeah i can look at it. I might have to ask you for help since you are the tlgen master :) My current thinking though, is not so much generating different xbars, but exposing the parameters to the top, so that it can be configured by a core file.

Let me see how far I get with that, and I'll send it to you as a proof of concept.

On Tue, Sep 17, 2019 at 4:22 PM Eunchan Kim notifications@github.com wrote:

On Sep 17, 2019, at 4:20 PM, tjaychen notifications@github.com wrote:

The cross bar effect is much smaller now. Now most of the timing impact is from some operation inside ibex through the fabric. So for example the exception path to fetch we talked about before is here. The alu path to data memory is also there

We could simplify the timing path, but in general I feel fpga's timing is not going to hit asic (that's been my experience at least ) so we'll need to find a way to relax things for fpga and allow silicon to be different

What do you think? We could also generate different xbar for silicon and fpga, and pick on the fly the one we want. Yeah. I think we could have different version of the top (or at least the crossbar) for FPGA. Do you want to assume this task? Maybe we might change topgen to generate multiple flavored version of top (or crossbars).

On Tue, Sep 17, 2019, 16:15 Eunchan Kim notifications@github.com wrote:

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSUUTP3FCPOQ22PCXY3QKFQJNA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66FTVA#issuecomment-532437460 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAH2RSX274ARS7F3QMNA333QKFQJNANCNFSM4IVMCVKA

.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3VTMI22WTG5WRYQHV3QKFQ2ZA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F3EQ#issuecomment-532438418>, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAJDG3XYOS6NYBCVIIMVW53QKFQ2ZANCNFSM4IVMCVKA .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSXMA47C2KMSY7CMPJ3QKFRETA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F73A#issuecomment-532439020, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSR43YKYWCNS5IYA3DLQKFRETANCNFSM4IVMCVKA .

eunchan commented 5 years ago

On Sep 17, 2019, at 4:27 PM, tjaychen notifications@github.com wrote:

yeah i can look at it. I might have to ask you for help since you are the tlgen master :) My current thinking though, is not so much generating different xbars, but exposing the parameters to the top, so that it can be configured by a core file. Looks good idea to me. Maybe we need to discuss how many parameters we should expose to the crossbar top.

Let me see how far I get with that, and I'll send it to you as a proof of concept.

On Tue, Sep 17, 2019 at 4:22 PM Eunchan Kim notifications@github.com wrote:

On Sep 17, 2019, at 4:20 PM, tjaychen notifications@github.com wrote:

The cross bar effect is much smaller now. Now most of the timing impact is from some operation inside ibex through the fabric. So for example the exception path to fetch we talked about before is here. The alu path to data memory is also there

We could simplify the timing path, but in general I feel fpga's timing is not going to hit asic (that's been my experience at least ) so we'll need to find a way to relax things for fpga and allow silicon to be different

What do you think? We could also generate different xbar for silicon and fpga, and pick on the fly the one we want. Yeah. I think we could have different version of the top (or at least the crossbar) for FPGA. Do you want to assume this task? Maybe we might change topgen to generate multiple flavored version of top (or crossbars).

On Tue, Sep 17, 2019, 16:15 Eunchan Kim notifications@github.com wrote:

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSUUTP3FCPOQ22PCXY3QKFQJNA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66FTVA#issuecomment-532437460 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAH2RSX274ARS7F3QMNA333QKFQJNANCNFSM4IVMCVKA

.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3VTMI22WTG5WRYQHV3QKFQ2ZA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F3EQ#issuecomment-532438418>, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAJDG3XYOS6NYBCVIIMVW53QKFQ2ZANCNFSM4IVMCVKA .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSXMA47C2KMSY7CMPJ3QKFRETA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F73A#issuecomment-532439020, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSR43YKYWCNS5IYA3DLQKFRETANCNFSM4IVMCVKA .

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3XNDNJDQ7YP3MI4SUDQKFRWHA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66GHTY#issuecomment-532440015, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJDG3UQXG5AMMOJFGWEIVLQKFRWHANCNFSM4IVMCVKA.

tjaychen commented 5 years ago

let me push #152 in, and i'll work on another PR, don't want this PR getting too big.

On Tue, Sep 17, 2019 at 4:29 PM Eunchan Kim notifications@github.com wrote:

On Sep 17, 2019, at 4:27 PM, tjaychen notifications@github.com wrote:

yeah i can look at it. I might have to ask you for help since you are the tlgen master :) My current thinking though, is not so much generating different xbars, but exposing the parameters to the top, so that it can be configured by a core file. Looks good idea to me. Maybe we need to discuss how many parameters we should expose to the crossbar top.

Let me see how far I get with that, and I'll send it to you as a proof of concept.

On Tue, Sep 17, 2019 at 4:22 PM Eunchan Kim notifications@github.com wrote:

On Sep 17, 2019, at 4:20 PM, tjaychen notifications@github.com wrote:

The cross bar effect is much smaller now. Now most of the timing impact is from some operation inside ibex through the fabric. So for example the exception path to fetch we talked about before is here. The alu path to data memory is also there

We could simplify the timing path, but in general I feel fpga's timing is not going to hit asic (that's been my experience at least ) so we'll need to find a way to relax things for fpga and allow silicon to be different

What do you think? We could also generate different xbar for silicon and fpga, and pick on the fly the one we want. Yeah. I think we could have different version of the top (or at least the crossbar) for FPGA. Do you want to assume this task? Maybe we might change topgen to generate multiple flavored version of top (or crossbars).

On Tue, Sep 17, 2019, 16:15 Eunchan Kim notifications@github.com wrote:

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <

https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSUUTP3FCPOQ22PCXY3QKFQJNA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66FTVA#issuecomment-532437460

,

or mute the thread <

https://github.com/notifications/unsubscribe-auth/AAH2RSX274ARS7F3QMNA333QKFQJNANCNFSM4IVMCVKA

.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <

https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3VTMI22WTG5WRYQHV3QKFQ2ZA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F3EQ#issuecomment-532438418 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/AAJDG3XYOS6NYBCVIIMVW53QKFQ2ZANCNFSM4IVMCVKA

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSXMA47C2KMSY7CMPJ3QKFRETA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F73A#issuecomment-532439020 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAH2RSR43YKYWCNS5IYA3DLQKFRETANCNFSM4IVMCVKA

.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3XNDNJDQ7YP3MI4SUDQKFRWHA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66GHTY#issuecomment-532440015>, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAJDG3UQXG5AMMOJFGWEIVLQKFRWHANCNFSM4IVMCVKA .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSU4DTDJ7XTRU2OMH4TQKFR7DA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66GLUA#issuecomment-532440528, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSQ4MHUVBXXR2G7T4ATQKFR7DANCNFSM4IVMCVKA .

eunchan commented 5 years ago

SGTM

On Sep 17, 2019, at 4:31 PM, tjaychen notifications@github.com wrote:

let me push #152 in, and i'll work on another PR, don't want this PR getting too big.

On Tue, Sep 17, 2019 at 4:29 PM Eunchan Kim notifications@github.com wrote:

On Sep 17, 2019, at 4:27 PM, tjaychen notifications@github.com wrote:

yeah i can look at it. I might have to ask you for help since you are the tlgen master :) My current thinking though, is not so much generating different xbars, but exposing the parameters to the top, so that it can be configured by a core file. Looks good idea to me. Maybe we need to discuss how many parameters we should expose to the crossbar top.

Let me see how far I get with that, and I'll send it to you as a proof of concept.

On Tue, Sep 17, 2019 at 4:22 PM Eunchan Kim notifications@github.com wrote:

On Sep 17, 2019, at 4:20 PM, tjaychen notifications@github.com wrote:

The cross bar effect is much smaller now. Now most of the timing impact is from some operation inside ibex through the fabric. So for example the exception path to fetch we talked about before is here. The alu path to data memory is also there

We could simplify the timing path, but in general I feel fpga's timing is not going to hit asic (that's been my experience at least ) so we'll need to find a way to relax things for fpga and allow silicon to be different

What do you think? We could also generate different xbar for silicon and fpga, and pick on the fly the one we want. Yeah. I think we could have different version of the top (or at least the crossbar) for FPGA. Do you want to assume this task? Maybe we might change topgen to generate multiple flavored version of top (or crossbars).

On Tue, Sep 17, 2019, 16:15 Eunchan Kim notifications@github.com wrote:

Tim, how much the crossbar affects the timing? I saw it has 5~7ns delay in the crossbar in previous report but am not sure how much it is now. Maybe shouldn't we better to revise Ibex to break the time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <

https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSUUTP3FCPOQ22PCXY3QKFQJNA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66FTVA#issuecomment-532437460

,

or mute the thread <

https://github.com/notifications/unsubscribe-auth/AAH2RSX274ARS7F3QMNA333QKFQJNANCNFSM4IVMCVKA

.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <

https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3VTMI22WTG5WRYQHV3QKFQ2ZA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F3EQ#issuecomment-532438418 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/AAJDG3XYOS6NYBCVIIMVW53QKFQ2ZANCNFSM4IVMCVKA

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSXMA47C2KMSY7CMPJ3QKFRETA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66F73A#issuecomment-532439020 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAH2RSR43YKYWCNS5IYA3DLQKFRETANCNFSM4IVMCVKA

.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub < https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3XNDNJDQ7YP3MI4SUDQKFRWHA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66GHTY#issuecomment-532440015>, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAJDG3UQXG5AMMOJFGWEIVLQKFRWHANCNFSM4IVMCVKA .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSU4DTDJ7XTRU2OMH4TQKFR7DA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66GLUA#issuecomment-532440528, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSQ4MHUVBXXR2G7T4ATQKFR7DANCNFSM4IVMCVKA .

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAJDG3T5IRBNWLWPH2P5R3TQKFSDVA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD66GNWQ#issuecomment-532440794, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJDG3RCSE6EJYNEUDCLMGTQKFSDVANCNFSM4IVMCVKA.

vogelpi commented 5 years ago

I generated timing reports of Ibex both when instantiated inside OpenTitan or PULPissmo. The one from OpenTitan is a bit difficult to interprete as stuff inside the core is optimized a lot. For example, it is a bit weird that mie_q[irq_fast][14] shows up in the longest path when actually all fast interrupts are statically driven by 0.

Looking at the timing report from PULPissimo is a bit more insightful. The problematic path is hit by the illegal instruction detection for CSR instructions. This path can be broken by buffering the illegal_csr Signal. @GregAC will fix this upstream.

ibex_opentitan.txt ibex_pulpissimo.txt

tjaychen commented 5 years ago

thanks Pirmin and Greg!

On Wed, Sep 18, 2019 at 8:36 AM Pirmin Vogel notifications@github.com wrote:

I generated timing reports of Ibex both when instantiated inside OpenTitan or PULPissmo. The one from OpenTitan is a bit difficult to interprete as stuff inside the core is optimized a lot. For example, it is a bit weird that mie_q[irq_fast][14] shows up in the longest path when actually all fast interrupts are statically driven by 0.

Looking at the timing report from PULPissimo is a bit more insightful. The problematic path is hit by the illegal instruction detection for CSR instructions. This path can be broken by buffering the illegal_csr Signal. @GregAC https://github.com/GregAC will fix this upstream.

ibex_opentitan.txt https://github.com/lowRISC/opentitan/files/3627234/ibex_opentitan.txt ibex_pulpissimo.txt https://github.com/lowRISC/opentitan/files/3627235/ibex_pulpissimo.txt

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/133?email_source=notifications&email_token=AAH2RSV5KY4MGB5SYLX2JNTQKJDGPA5CNFSM4IVMCVKKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7APZDY#issuecomment-532741263, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSQKRT5KWFFC4H6R3P3QKJDGPANCNFSM4IVMCVKA .

tjaychen commented 5 years ago

i think all the items here are generally known. On the OT side we've removed FIFOs from places they are not needed, and are adding an option for FPGA to optionally place in pipe stages.

I will close this issue now and will open other ones as latency / timing problems pop up.

vogelpi commented 5 years ago

Thanks @tjaychen and @eunchan for your PRs disabling the FIFOs in simulation, and for reducing the FLASH latency. For the record, the PRs are here:

GregAC commented 5 years ago

I've been playing with coremark in a ibex only verilator sim (with single cycle i and d side memories). I'll try again running in earlgrey and see how it compares. Thanks for the PRs @tjaychen and @eunchan