lnis-uofu / OpenFPGA

An Open-source FPGA IP Generator
https://openfpga.readthedocs.io/en/master/
MIT License
841 stars 162 forks

Incorrect bitstream generation #1267

Open hopwoodc opened 1 year ago

hopwoodc commented 1 year ago

Describe the bug

When at least one of the regression tests is run for longer than the default, it fails. It has been broken in two ways between the v1.2.0 release and now. I will briefly point out the commits where the first breakage occurred. Then, since I have studied it more, I will describe the second breakage in more detail.

I found this issue with fpga_verilog/lut_design/single_mode, though I assume it may be occurring in others as well. I manually modified and2_formal_random_top_tb.v to run the simulation longer. This was done at the line that says, "can be changed by the user for his/her need".

First way it was broken, from 62511f47 to 3a3877fd to ab53f88c

In 62511f47, the above task's formal benchmark succeeds even if simulation time is manually extended.

In 3a3877fd, VPR throws an error while running the task, which is then fixed in ab53f88c.

In ab53f88c, while VPR does not throw the error, the output of the FPGA is incorrect. I have not looked into the specifics of how it fails yet. The output is not a constant, which differentiates it from the other way it failed in later commits, described below.

Second way it was broken, from e2fc6fac to 7387fd3b2

Starting in 7387fd3b2, the above benchmark's FPGA output is stuck at zero. In a recent commit (62f68a38), I traced the routing of the signals in simulation, and found signal B from the IO pad to the first switchbox was not selected as it should have been.

Signal "B" was assigned to grid_io_1__2 subtile 0. According to the fabric independent bitstream, that pin should be routed through sb_02 mux_right_track_16. However, in the FPGA netlist, that connection doesn't exist. Instead, that pin is connected to sb_02 mux_bottom_track_1.
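A check like the one described above can be sketched in a few lines. This is only a rough illustration: it assumes the switch block netlist is plain-text Verilog in which mux instance names such as mux_right_track_16 appear verbatim, and the helper names `mux_names` and `missing_muxes` are hypothetical, not part of OpenFPGA.

```python
import re

# Matches instance names like mux_right_track_16 or mux_bottom_track_1.
# Assumption: names follow the mux_<side>_track_<n> pattern seen in this issue.
MUX_NAME = re.compile(r"\bmux_(?:top|right|bottom|left)_track_\d+\b")


def mux_names(netlist_text):
    """Return the set of routing mux names that appear in a netlist."""
    return set(MUX_NAME.findall(netlist_text))


def missing_muxes(expected, netlist_text):
    """Return the expected mux names that do not appear in the netlist."""
    return sorted(set(expected) - mux_names(netlist_text))
```

Feeding it the mux names listed in the bitstream for one switch block, together with that switch block's netlist text, would flag a mismatch like the one between mux_right_track_16 and mux_bottom_track_1.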

Which part of OpenFPGA is buggy?

I'm not sure, but will say I suspect the issue is in one or more of these: [ ] FPGA-Verilog [ ] FPGA-Bitstream [ ] VPR

To Reproduce

Steps to reproduce the behavior:

  1. Clone OpenFPGA repository and checkout commit id: 62f68a38
  2. Execute OpenFPGA task fpga_verilog/lut_design/single_mode
  3. Modify the run's and2_formal_random_top_tb.v to extend the run to #20 time units.
  4. Recompile and rerun the benchmark, observe the failure.
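Step 3 can be scripted instead of edited by hand. The sketch below assumes the generated testbench has a comment containing "can be changed by the user for his/her need" followed, on a later line, by a single `#<number>` delay. That layout is an assumption about the generated file, and `extend_sim_delay` is a hypothetical helper, not an OpenFPGA utility.

```python
import re

# Comment text quoted in this issue; compared in lower case.
MARKER = "can be changed by the user for his/her need"
# A Verilog delay such as "#10" or "# 2.5".
DELAY = re.compile(r"#\s*\d+(?:\.\d+)?")


def extend_sim_delay(tb_text, new_delay):
    """Replace the first delay found after the marker comment."""
    lines = tb_text.splitlines(keepends=True)
    marker_seen = False
    for idx, line in enumerate(lines):
        if not marker_seen:
            # Keep scanning until the marker comment is found.
            marker_seen = MARKER in line.lower()
            continue
        if DELAY.search(line):
            lines[idx] = DELAY.sub(f"#{new_delay}", line, count=1)
            return "".join(lines)
    raise ValueError("marker comment or delay not found in testbench")
```

Calling `extend_sim_delay(text, 20)` on the contents of and2_formal_random_top_tb.v would give the #20 setting used in step 3. The result still has to be written back to the file before recompiling.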

Expected behavior

I expect:

  1. The output of the FPGA to match the benchmark circuit
  2. The fabric independent bitstream to match the connections I see in the fabric

Environment (please complete the following information):

  • OS:
    • [ ] Alma Linux 8
  • Compiler:
    • [ ] gcc-8
  • Version:
    • [ ] Current master

tangxifan commented 1 year ago

@hopwoodc Thanks for tracing this bug through the version history. I did manage to reproduce the bug. I have also tried other testcases:

They pass even when the simulation period is increased to a longer duration. Therefore, I think the bug is specific to the selected architecture and benchmark. OpenFPGA has been used on a few silicon devices that we have taped out (including commercial ones), and we have not seen this bug on those devices. I will leave this issue open as a reminder. I believe this bug should be addressed.

I suggest trying a different architecture to see if the issue persists.

hopwoodc commented 1 year ago

@tangxifan Thanks for the response!

I tried changing the operating num_clocks for all simulation settings to 100, and ran all the regressions. This has not revealed any new failures outside what was already seen with fpga_verilog/lut_design/single_mode.
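A bulk change like this can be sketched as a small text rewrite. The tag name `operating` and the attribute name `num_clocks` are taken from the wording above and may not match the actual simulation setting files, so both are parameters. `set_tag_attr` is a hypothetical helper, not part of OpenFPGA.

```python
import re


def set_tag_attr(xml_text, attr, value, tag="operating"):
    """Set attr="value" on every <tag ...> element, leaving other text as-is."""
    tag_re = re.compile(rf"<{re.escape(tag)}\b[^>]*>")
    attr_re = re.compile(rf'(\b{re.escape(attr)}\s*=\s*")[^"]*(")')

    def patch(match):
        # Rewrite only the attribute value inside this one tag.
        return attr_re.sub(
            lambda a: f"{a.group(1)}{value}{a.group(2)}", match.group(0)
        )

    return tag_re.sub(patch, xml_text)
```

Applying `set_tag_attr(text, "num_clocks", 100)` to each simulation setting file and writing the result back would approximate the change described above. Tags that do not carry the attribute are left untouched.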

Given the above, I agree it's likely just an issue with that specific architecture and benchmark combo and not a broader bug. It should be addressed like you said, but it's not as critical as I was worried it was.