kuznia-rdzeni / coreblocks

RISC-V out-of-order core for education and research purposes
https://kuznia-rdzeni.github.io/coreblocks/
BSD 3-Clause "New" or "Revised" License

Change 2-FIFOs to Pipes #668

Open tilk opened 2 months ago

tilk commented 2 months ago

This PR changes some 2-FIFOs (mostly in the scheduler) into Pipes. Pipes have more combinational connections than FIFOs, but use fewer resources. The purpose is to validate the usefulness of Pipes for gluing the core together.

TODO: replace more 2-FIFOs with pipes and check benchmarks again.
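The trade-off described above can be sketched with a toy cycle-level model. This is a plain-Python illustration, not the coreblocks or Transactron implementation: `BufferModel`, `simulate` and the `forward_read` flag are hypothetical names, and the model assumes that a Pipe is a single register whose write readiness depends combinationally on a read in the same cycle, while a 2-FIFO has two slots and no such dependency.

```python
from collections import deque


class BufferModel:
    """Toy cycle-level model of a small buffer between a producer and a consumer."""

    def __init__(self, slots, forward_read):
        self.slots = slots                # number of storage registers (resource cost)
        self.forward_read = forward_read  # True: write readiness sees same-cycle reads
        self.items = deque()

    def read_ready(self):
        return len(self.items) > 0

    def write_ready(self, read_this_cycle):
        if len(self.items) < self.slots:
            return True
        # Full: only a combinational connection from the reader frees space in time.
        return self.forward_read and read_this_cycle


def simulate(buf, consumer_wants_read):
    """The producer offers an item every cycle. Returns (received items, accepted count)."""
    received = []
    accepted = 0
    for wants_read in consumer_wants_read:
        do_read = wants_read and buf.read_ready()
        do_write = buf.write_ready(do_read)
        if do_read:
            received.append(buf.items.popleft())
        if do_write:
            buf.items.append(accepted)
            accepted += 1
    return received, accepted


def two_fifo():
    # Assumed 2-FIFO behaviour: two slots, readiness from registered state only.
    return BufferModel(slots=2, forward_read=False)


def pipe():
    # Assumed Pipe behaviour: one slot, readiness also depends on the reader.
    return BufferModel(slots=1, forward_read=True)


def one_slot_no_forward():
    # Neither of the two: one slot without the combinational connection.
    return BufferModel(slots=1, forward_read=False)


if __name__ == "__main__":
    always_read = [True] * 10
    for name, make in [("2-FIFO", two_fifo), ("Pipe", pipe),
                       ("1 slot, no forwarding", one_slot_no_forward)]:
        received, accepted = simulate(make(), always_read)
        print(f"{name}: received {len(received)} items in {len(always_read)} cycles")
```

Under these assumptions the 2-FIFO and the Pipe both sustain one item per cycle, while the single slot without the combinational connection manages only one item every other cycle. In this model the Pipe reaches full throughput with half the storage, and the price is the extra combinational dependency between reader and writer.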

github-actions[bot] commented 2 months ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 🔺 0.406 (+0.002) | 🔺 0.463 (+0.006) | 🔺 0.312 (+0.001) | 🔺 0.646 (+0.002) | 🔺 0.345 (+0.000) | 🔺 0.258 (+0.002) | 🔺 0.307 (+0.003) | 🔺 0.416 (+0.018) |


Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 🔻 21017 (-1584) | 🔺 5980 (+420) | 🔻 798 (-4) | 🔻 536 (-468) | 🔺 51 (+3) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 🔻 31713 (-1938) | 🔺 9248 (+446) | 🔺 1970 (+32) | 🔻 716 (-468) | 🔺 43 (+1) |

github-actions[bot] commented 2 months ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 🔺 0.406 (+0.002) | 🔺 0.463 (+0.006) | 🔺 0.312 (+0.001) | 🔺 0.646 (+0.002) | 🔺 0.345 (+0.000) | 🔺 0.258 (+0.002) | 🔺 0.307 (+0.003) | 🔺 0.416 (+0.018) |


Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 🔺 23086 (+485) | 🔺 5980 (+420) | 🔻 798 (-4) | 🔻 536 (-468) | 🔺 50 (+1) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 🔺 34897 (+1246) | 🔺 9248 (+446) | 1938 (0) | 🔻 716 (-468) | 🔻 42 (-1) |

github-actions[bot] commented 2 months ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 🔺 0.406 (+0.002) | 🔺 0.463 (+0.006) | 🔺 0.312 (+0.001) | 🔺 0.647 (+0.004) | 🔺 0.345 (+0.000) | 🔺 0.260 (+0.004) | 🔺 0.307 (+0.003) | 🔺 0.420 (+0.022) |


Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 🔺 23752 (+1151) | 🔺 6041 (+481) | 🔻 798 (-4) | 🔻 472 (-532) | 🔺 49 (+0) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 🔺 35360 (+1709) | 🔺 9296 (+494) | 1938 (0) | 🔻 648 (-536) | 🔺 44 (+1) |

tilk commented 2 months ago

The device utilization metric makes some weird non-deterministic jumps. The double-triggered workflow produced completely different results.
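To make the jump concrete, here is a small sketch that compares the synthesis numbers from the first two benchmark comments. It assumes those two comments are the two runs of the double-triggered workflow (their performance results are identical); the dictionary keys and the `differing_metrics` helper are made up for this illustration.

```python
# Values copied from the first two benchmark comments in this thread.
# Assumption: they are the two runs of the double-triggered workflow.
runs = {
    "basic": (
        {"device_utilisation": 21017, "luts_dff": 5980, "luts_carry": 798,
         "luts_ram": 536, "fmax": 51},
        {"device_utilisation": 23086, "luts_dff": 5980, "luts_carry": 798,
         "luts_ram": 536, "fmax": 50},
    ),
    "full": (
        {"device_utilisation": 31713, "luts_dff": 9248, "luts_carry": 1970,
         "luts_ram": 716, "fmax": 43},
        {"device_utilisation": 34897, "luts_dff": 9248, "luts_carry": 1938,
         "luts_ram": 716, "fmax": 42},
    ),
}


def differing_metrics(first, second):
    """Return {metric: second - first} for every metric that changed between runs."""
    return {name: second[name] - first[name]
            for name in first if first[name] != second[name]}


if __name__ == "__main__":
    for config, (first, second) in runs.items():
        print(config, differing_metrics(first, second))
```

In both configurations the DFF and RAM counts agree between the two runs, while device utilisation differs by 2069 (basic) and 3184 (full), which is larger than the change either run reports against its baseline.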