uwsampl / lakeroad

FPGA synthesis tool powered by program synthesis
MIT License
33 stars 7 forks source link

Debugging Virtex DSP48E1 bug #445

Open gussmith23 opened 1 week ago

gussmith23 commented 1 week ago

Tracking issue for finding the bug in the DSP48E1 model.

Some things to try in debugging:

Other subtasks to finish this up:

gussmith23 commented 1 week ago

First, Vishal found most of the bugs it seems -- all that remains are bugs in the combinational case. That is, if we synthesize combinational designs, the result from Lakeroad does not function as expected. Interestingly, the differences can be subtle, too. For example, it took a number of iterations for random testing with Verilator to find a case that breaks when testing a 9bit mul-sub case.

gussmith23 commented 1 week ago

2024-06-26

Trying to binary search the bug.

Process:

Returning just the input A works.

Returning multiplication of A * B works.

A*B+C works. It seems like it's subtraction (A*B-C) that's not working. It's extra suspicious that it is an off-by-one error and that it doesn't always appear.

I thought I found the bug, but still getting bugs in eval. Weirder thing is that I'm pretty sure more stuff is breaking in the eval now, compared to when Vishal was running it. He told me only combinational stuff was breaking, but now, 2 stage stuff is breaking too.

At least one of the tests that's failing in the eval is passing when I make it an integration test in LR. Unsure why that would be, but it certainly feels like it could be a mismatched Lakeroad version or something.

Furthermore, the tests are passing when I run them on leviathan, within the lakeroad submodule of the lakeroad eval repo. weird.

Yeah, every test i'm adding to Lakeroad that's failing in the eval seems to be passing in Lakeroad itself. Something's weird.

List of tests not passing:

robustness_experiments:mult_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_2_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_2_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_signed_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_unsigned_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_signed_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_unsigned_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_0_stage_unsigned_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_signed_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_0_stage_unsigned_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_0_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_signed_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmuladd_0_stage_signed_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmuladd_0_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_unsigned_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_signed_12_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_1_stage_unsigned_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_unsigned_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_1_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submuladd_0_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submuladd_0_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_12_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_14_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:

Gonna run an eval run in Docker and see if it replicates there.

gussmith23 commented 1 week ago

2024-06-27

Seems like it's still running into errors in Docker. So I think there's two options at this point

The second one is easier to check, so i'll do it first, but at this point it really does feel more likely that it's the first.

Oh interesting. Before, I tried disabling all solvers but Bitwuzla, so that I could get the eval to match what was happening in the integratio ntests (because all of my integration tests were using Bitwuzla). But when I ran the eval with just Bitwuzla...well, what happened? I forget, but I think what happened was that I couldn't replicate the issues? Oh, no, according to this thread with Vishal, it was still failing when I just used Bitwuzla. Anyway, what just happened was that I made an integration test that uses STP instead of Bitwuzla, and it fails. So I finally have the case I was looking for, which is good. Now I can go back to the delta debugging I was doing earlier.

I just went through the diffs. Here's the thoughts i had.

Easy check of the second point -- just remove the delay in the Verilator model and see what happens.

interestingly, it's still an off by one error.

I'm almost certain the bug is in the carry logic. There's some weird stuff happing there. Specifically I think I just realized that qcarryin_o_mux0 isn't driven in the modified-for-racket version, which is definitely not right (it's driven in the original version). I might have deleted that code---wouldn't be surprised.

qcarryin_o_mux is also multiply driven. That has to cause issues. I'll look at the above missing driver issue first, but I'd say that the multiple drivers issue is also worth looking at!

Ok, there was definitely a bug with qcarryin_o_mux0 not being set - just a typo bug. That's resolved.

Still bugs left though. Running the eval and things are still failing. E.g. muladd_0_stage_signed_9_bit.

Again, still an off by one error. So I suspect it's more bugs in the carry.

There are still latches being inferred, too.

Latch inferred for signal `\DSP48E1.\qalumode_o_mux' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:920$434': $auto$proc_dlatch.cc:427:proc_dlatch$1106
Latch inferred for signal `\DSP48E1.\qopmode_o_mux' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:904$430': $auto$proc_dlatch.cc:427:proc_dlatch$1133
Latch inferred for signal `\DSP48E1.\cci_drc_msg' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:888$405': $auto$proc_dlatch.cc:427:proc_dlatch$1160
Latch inferred for signal `\DSP48E1.\cis_drc_msg' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:888$405': $auto$proc_dlatch.cc:427:proc_dlatch$1189
Latch inferred for signal `\DSP48E1.\qcarryinsel_o_mux' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:879$402': $auto$proc_dlatch.cc:427:proc_dlatch$1212
Latch inferred for signal `\DSP48E1.\the_mask' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:1385$702': $auto$proc_dlatch.cc:427:proc_dlatch$1069

Killed the latches.

I should really include somewhere that yosys is a very helpful debugging tool when doing this process.

Still running into bug.

Interestingly, the multiple drivers warning also disappeared? That doesn't feel right. I don't feel like I fixed that issue.

gussmith23 commented 1 week ago

2024-06-28

Realizing I did solve the multiple drivers problem when I fixed the typo bug.

Got another bug to replicate in integration tests: integration_tests/lakeroad/xilinx-virtex/virtex_mult_0_stage_signed_12_bit.sv

Interestingly, fails with STP but not Bitwuzla. So theyre just finding different solutions, I assume.

gussmith23 commented 1 week ago

2024-06-29

My debugging strategy now is to put $displays in the Verilator model to see where in the pipeline the value diverges from what i expect. Frankly I probably should be using a waveform viewer for this.

Unsurprisingly, qcarryin_o_mux is getting set to 1 at some point. That's definitely the problem.

Ok, I think the problem is coming from the fact taht we changed

              3'b111 : qcarryin_o_mux_tmp <= qp_o_mux[47];

to

              3'b111 : qcarryin_o_mux_tmp <= qp_o_reg1[47];

If I recall, we did this because of a combinational cycle that was caused by using qp_o_mux. We thought through it though, and i think this change should have been legal based on some constraints that should be in the design.

Found the bug -- the constraints in the arch description for carryinsel were wrong. They'd been copy-pasted and not updated.

gussmith23 commented 6 days ago

2024-07-01

Apparently not done with debugging yet. One of these tests is failing in CI.

At this point, might be most efficient to just look through the constraints for any more obvious bugs.

No obvious bugs.

Back on the hunt for bugs in the code itself. Specifically just adding $display statements into the simulation model.

For this singular example, it's correct up to qmult_o_mux. So bug is presumably still after that. I wonder if it's still carryin related.

z mux is C, x is the result of the mul, y is zero. so those all check out just fine. alu_o also seems to have the right answer.

It seems to be an issue with the P register. It seems like the racket import version thinks that the ALU output will make it to the output, but it doesn't. The correct output is presumably at alu_o, but doesn't make it through to PREG.

I wonder if vishal just didn't find all of the constraints on PREG? They're encoded in a very, very annoying way in the simulation file. I wonder if they're more compact in the pdf.

He got all of the ones I found when I just did a pass.

gussmith23 commented 6 days ago

2024-07-02

IT seems like the PREG is working fine---I think there might be a register in the middle somewhere that's enabled when it shouldn't be.

C is making it through to the ALU instantly. But the result of the multiply does not -- it seems to be only making it to the ALU

Wait nevermind -- it seems like an answer is coming out of the multiplier before the clock ticks, but it's not correct. then when the clock ticks, the multiplier (qmult_o_mux) signal gets the right answer (minus the post-add). But then because

omg, it was because i had different --pipeline-depths set for synthesis and simulation. This would have been fixed by #442. Wasted a whole day on this...