Add pipelining support to LSU requester

kuznia-rdzeni / coreblocks

RISC-V out-of-order core for education and research purposes

https://kuznia-rdzeni.github.io/coreblocks/

BSD 3-Clause "New" or "Revised" License


Add pipelining support to LSU requester #695

Closed: lekcyjna123 closed this 1 month ago

lekcyjna123 commented 1 month ago

Here is a small refactor of the LSURequester. It now supports request pipelining thanks to the use of a FIFO. Additionally, the unit tests had to be updated, because after this change DummyLSU started to support reordering of misaligned instructions before the correct ones.

Based on #696
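
For readers unfamiliar with the idea, here is a rough software sketch of request pipelining with a FIFO. It is not the coreblocks code: `PipelinedRequester`, `PendingRequest`, `depth`, and the in-order response assumption are all hypothetical and chosen only for illustration. The FIFO holds the metadata of requests that were issued but not yet answered, so a new request can be sent before the previous response arrives.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingRequest:
    # Metadata that has to be kept until the matching response arrives.
    addr: int
    is_store: bool


class PipelinedRequester:
    """Toy behavioural model (assumption: responses return in request order)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.pending = deque()  # FIFO of requests issued but not yet answered

    def can_issue(self) -> bool:
        # A new request is allowed only while the FIFO has a free slot.
        return len(self.pending) < self.depth

    def issue(self, addr: int, is_store: bool = False) -> None:
        if not self.can_issue():
            raise RuntimeError("too many requests in flight")
        self.pending.append(PendingRequest(addr, is_store))

    def accept(self, data: int):
        # The oldest FIFO entry describes the response that just arrived.
        req = self.pending.popleft()
        return req.addr, (None if req.is_store else data)


req = PipelinedRequester(depth=2)
req.issue(0x100)
req.issue(0x104)  # issued before the first response has come back
first = req.accept(0xAA)
second = req.accept(0xBB)
```

With `depth=1` the same model degenerates to a non-pipelined requester: the second `issue` would have to wait for the first `accept`.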

github-actions[bot] commented 1 month ago

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.407 (0.000)	0.527 (0.000)	0.321 (0.000)	0.652 (0.000)	0.345 (0.000)	0.283 (0.000)	0.317 (0.000)	0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 21866 (-601)	▲ 5569 (+8)	▼ 770 (-32)	▲ 1012 (+8)	▼ 48 (-1)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 33465 (-213)	▲ 8811 (+8)	1932 (0)	▲ 1192 (+8)	▲ 42 (+2)

github-actions[bot] commented 1 month ago

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.407 (0.000)	0.527 (0.000)	0.321 (0.000)	0.652 (0.000)	0.345 (0.000)	0.283 (0.000)	0.317 (0.000)	0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▲ 23085 (+163)	▲ 5569 (+8)	▲ 802 (+32)	▲ 1012 (+8)	▼ 46 (-4)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 32440 (-1665)	▲ 8811 (+8)	▼ 1932 (-32)	▲ 1192 (+8)	▼ 41 (-1)

github-actions[bot] commented 1 month ago

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.407 (0.000)	0.527 (0.000)	0.321 (0.000)	0.652 (0.000)	0.345 (0.000)	0.283 (0.000)	0.317 (0.000)	0.405 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▲ 24084 (+1162)	▲ 5569 (+8)	▲ 802 (+32)	▲ 1012 (+8)	▼ 49 (-1)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 30477 (-3628)	▲ 8811 (+8)	1964 (0)	▲ 1192 (+8)	▼ 41 (-1)

tilk commented 1 month ago

No change in benchmarks, as Wishbone Classic doesn't support pipelining.

lekcyjna123 commented 1 month ago

> No change in benchmarks, as Wishbone Classic doesn't support pipelining.

Yes, I expected that, but I ran the benchmarks to make sure that there is no regression.
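
A back-of-the-envelope model of why the performance numbers stay the same: if the bus accepts only one outstanding transaction, as stated above for Wishbone Classic, the number of requests actually in flight is capped at one regardless of how deep the requester's FIFO is. The function name, the latency value, and the one-issue-per-cycle assumption below are hypothetical and serve only to illustrate the argument.

```python
def total_cycles(n_requests: int, latency: int, requester_depth: int, bus_limit: int) -> int:
    """Cycles needed to complete n_requests in a simplified model.

    Assumptions: every request takes `latency` cycles, at most one request
    is issued per cycle, and responses return in order.
    """
    # The tighter of the two limits decides how many requests overlap.
    max_in_flight = min(requester_depth, bus_limit)
    done_at = []  # completion cycle of each request, in issue order
    cycle = 0
    for i in range(n_requests):
        if i >= max_in_flight:
            # Wait until the request issued max_in_flight earlier has finished.
            cycle = max(cycle, done_at[i - max_in_flight])
        done_at.append(cycle + latency)
        cycle += 1  # the next request can be issued in the next cycle at the earliest
    return done_at[-1] if done_at else 0


# Bus limited to one outstanding transaction: requester depth makes no difference.
classic_before = total_cycles(4, latency=3, requester_depth=1, bus_limit=1)
classic_after = total_cycles(4, latency=3, requester_depth=4, bus_limit=1)
# A bus that allows overlapping transactions would benefit from the deeper FIFO.
pipelined_bus = total_cycles(4, latency=3, requester_depth=4, bus_limit=4)
```

In this model `classic_before` and `classic_after` are equal, which matches the unchanged benchmark scores; only a bus that allows overlapping transactions would show a gain.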