kuznia-rdzeni / coreblocks

RISC-V out-of-order core for education and research purposes
https://kuznia-rdzeni.github.io/coreblocks/
BSD 3-Clause "New" or "Revised" License

Update Amaranth to just before RFC 45 #733

Open tilk opened 6 days ago

tilk commented 6 days ago

Amaranth update. It includes, among others:

I intend to do Amaranth version updates in parts, so that the pull requests will be manageable. This PR unfortunately turned out to be quite large because of an Amaranth bug - see history below.

Somehow, synthesis results are better, and the reasons for that are unknown. Is this a result of the Yosys upgrade or the Amaranth upgrade? Is this a bug? I don't know.

Disclaimer: a lot of the new type stubs are untested, and some of them will probably need to be corrected in the future.

piotro888 commented 5 days ago

There is a dramatic change in Fmax and utilisation (50 MHz to 60 MHz; 25k to 15k) on basic. Interesting... Is an improvement of that size plausible from a Yosys/Amaranth update, or could it be a synthesis bug?

And all benchmarks time out as well :(

tilk commented 5 days ago

> There is a dramatic change in Fmax and utilisation (50 MHz to 60 MHz; 25k to 15k) on basic. Interesting... Is an improvement of that size plausible from a Yosys/Amaranth update, or could it be a synthesis bug?
>
> And all benchmarks time out as well :(

Dang, more investigation is needed.

tilk commented 5 days ago

It looks like the CPU locks up on a CSR instruction.

Edit: And it looks like this happens only in Verilator.

Edit: And it's probably the same bug as the previous one, or a similar one. Amaranth somehow didn't emit this line to Verilog:

m.d.sync += instr.valid.eq(1)

Edit: found the problematic Amaranth commit: https://github.com/amaranth-lang/amaranth/commit/6f44438e585dd54a89c0112732710b389e25a71b
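
The thread does not say how the commit was located. One common way to narrow down a regression like this is bisection over the dependency's history, which is what `git bisect` automates. A minimal sketch of the idea, assuming a hypothetical `is_bad` check (for example, "the generated Verilog locks up in Verilator") and placeholder commit names:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumptions: commits are ordered oldest to newest, the first one is
    good, the last one is bad, and every commit after the first bad one
    is also bad within the searched range.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: placeholder names, with the regression at "c6".
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 6)
print(culprit)
```

The monotonicity assumption matters here: if a later upstream commit fixes the bug, the search range has to end before that fix.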

tilk commented 4 days ago

The issue was in Amaranth; it was silently fixed in this commit: https://github.com/amaranth-lang/amaranth/commit/2d59242bf731b258e2f155fece933ae66acbe03b

There goes my plan for updating Amaranth in small steps.

github-actions[bot] commented 2 days ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.417 (0.000) | 0.513 (0.000) | 0.337 (0.000) | 0.655 (0.000) | 0.361 (0.000) | 0.290 (0.000) | 0.326 (0.000) | 0.431 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 14175 (-10558) | 5889 (0) | 770 (0) | 972 (0) | ▲ 56 (+5) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 27183 (-11600) | ▼ 9141 (-1) | ▼ 1726 (-218) | 1152 (0) | ▲ 47 (+9) |

github-actions[bot] commented 1 day ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.417 (0.000) | 0.513 (0.000) | 0.337 (0.000) | 0.655 (0.000) | 0.361 (0.000) | 0.290 (0.000) | 0.326 (0.000) | 0.431 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 15012 (-9721) | 5889 (0) | 770 (0) | 972 (0) | ▲ 60 (+9) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 30360 (-8423) | ▼ 9141 (-1) | ▼ 1726 (-218) | 1152 (0) | ▲ 43 (+5) |

github-actions[bot] commented 1 day ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.417 (0.000) | 0.513 (0.000) | 0.337 (0.000) | 0.655 (0.000) | 0.361 (0.000) | 0.290 (0.000) | 0.326 (0.000) | 0.431 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 14194 (-10539) | 5889 (0) | 770 (0) | 972 (0) | ▲ 60 (+10) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 23439 (-15344) | ▼ 9141 (-1) | ▼ 1758 (-186) | 1152 (0) | ▲ 46 (+9) |

github-actions[bot] commented 1 day ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.417 (0.000) | 0.513 (0.000) | 0.337 (0.000) | 0.655 (0.000) | 0.361 (0.000) | 0.290 (0.000) | 0.326 (0.000) | 0.431 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 15388 (-9345) | ▼ 5888 (-1) | 770 (0) | ▲ 1068 (+96) | ▲ 59 (+8) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 30053 (-8730) | ▼ 9140 (-2) | ▼ 1726 (-218) | ▲ 1248 (+96) | ▲ 44 (+6) |

github-actions[bot] commented 1 day ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.417 (0.000) | 0.513 (0.000) | 0.337 (0.000) | 0.655 (0.000) | 0.361 (0.000) | 0.290 (0.000) | 0.326 (0.000) | 0.431 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 14876 (-9857) | ▼ 5888 (-1) | 770 (0) | ▲ 1068 (+96) | ▲ 55 (+4) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 25883 (-12900) | ▼ 9140 (-2) | ▼ 1758 (-186) | ▲ 1248 (+96) | ▲ 44 (+6) |

github-actions[bot] commented 1 day ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.417 (0.000) | 0.513 (0.000) | 0.337 (0.000) | 0.655 (0.000) | 0.361 (0.000) | 0.290 (0.000) | 0.326 (0.000) | 0.431 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 15103 (-9630) | 5889 (0) | 770 (0) | 972 (0) | ▲ 53 (+2) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 25536 (-13247) | ▼ 9141 (-1) | ▼ 1726 (-218) | 1152 (0) | ▲ 44 (+7) |

github-actions[bot] commented 1 day ago

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.417 (0.000) | 0.513 (0.000) | 0.337 (0.000) | 0.655 (0.000) | 0.361 (0.000) | 0.290 (0.000) | 0.326 (0.000) | 0.431 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 13812 (-10921) | 5889 (0) | 770 (0) | 972 (0) | ▲ 59 (+8) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 24503 (-14280) | ▼ 9141 (-1) | ▼ 1758 (-186) | 1152 (0) | ▲ 46 (+8) |
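
Each synthesis cell in the bot summaries has the form `value (delta)`, so the baseline is the value minus the delta. The sketch below recovers the baseline and the relative change for the basic device-utilisation figures from the seven runs above. The helper names are illustrative and are not part of the coreblocks CI scripts.

```python
import re

# Matches "value (delta)" cells as printed by the bot, e.g. "14175 (-10558)".
CELL = re.compile(r"(-?\d+)\s*\(([+-]?\d+)\)")


def parse_cell(cell: str) -> tuple[int, int]:
    """Return (value, delta) from a cell such as '▼ 14175 (-10558)'."""
    match = CELL.search(cell)
    if match is None:
        raise ValueError(f"not a 'value (delta)' cell: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def relative_change(cell: str) -> float:
    """Percent change against the baseline implied by value - delta."""
    value, delta = parse_cell(cell)
    baseline = value - delta
    return 100.0 * delta / baseline


# Device utilisation (basic), copied from the seven bot comments.
basic_utilisation = [
    "14175 (-10558)", "15012 (-9721)", "14194 (-10539)", "15388 (-9345)",
    "14876 (-9857)", "15103 (-9630)", "13812 (-10921)",
]

baselines = {v - d for v, d in map(parse_cell, basic_utilisation)}
changes = [relative_change(c) for c in basic_utilisation]
print(f"baselines: {sorted(baselines)}")
print(f"change: {min(changes):.1f}% to {max(changes):.1f}%")
```

All seven cells imply the same baseline of 24733, while the reduction varies between roughly 38% and 44% from run to run, so part of the spread comes from the synthesis runs themselves rather than from the code under test.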