Closed gussmith23 closed 1 year ago
Starting to work on this. First step is to install Quartus and try to find the simulation files in there.
I think the free version of Quartus is pretty outdated. I don't think it'll have the latest Intel FPGAs.
I also applied to get access to full Quartus online on Intel's website. Pretty janky, kinda worried my request will just go right into the void.
I'm digging through Quartus files. I've found some formal verification files: intelFPGA_lite/22.1std/quartus/eda/fv_lib/verilog/lpm_mult.v intelFPGA_lite/22.1std/quartus/eda/fv_lib/verilog/altmult_add.v intelFPGA_lite/22.1std/quartus/eda/fv_lib/verilog/altmult_accum.v intelFPGA_lite/22.1std/quartus/eda/fv_lib/verilog/mult_block.v
These correspond to different blocks, e.g. for altmult_add.v: https://www.intel.com/content/www/us/en/docs/programmable/683490/20-3/altmult-add-multiply-adder-ip-core.html
The potentially painful thing about these implementations is that they use generate
s. We can handle generate
s, I'm pretty sure, but I think we pretty much have to remove them by hand, which is a bit annoying. We really should build an automated tool for this.
I'm removing generates by hand on altmult_accum. altmult_add seems like it might be "beefier", so we probably want to handle that eventually (though I'm really only going off of line count of the file). For that one, we'll probably need an automated solution.
altmult_accum conversion:
Just finished removing all the generate
s. We could definitely write a script or tool to do this. Most of them were of the form
generate if (cond) begin
<declare flipflop>
assign <signalname> = <output of ff>;
else
assign <signalname> = <other signal>;
end
endgenerate
The conversion is:
<declare flipflop>
above) out of the generateassign <signalname> = <cond> ? <true> : <false>;
One complication while converting params to ports: Some signal widths are determined by parameters. So we need to fix those input sizes, I guess. I'm unsure why they're variable -- the actual hardware unit does have max values for these. Perhaps we should just set them to their max possible value, if we can find that value?
TODO tomorrow: figure out how to set the width parameters. What is width_upper_data
?
Some stuff has to remain parameters. Anything that might go into determining a signal width must remain a parameter. Anything that is used as the LHS of a replicate. Other things like that. This is a limitation of Verilog -- I don't think there's any reason we can't convert basically everything to bitvector semantics (including expressions that depend on parameters). Actually, variable-width signals will probably still be a problem.
Another roadblock: need to convert the dffep primitive
into a module. It uses a table
, which isn't supported by Yosys. I'll have to rewrite by hand and hope it's correct.
Hm. It may actually be better to attack the intelFPGA_lite/22.1std/quartus/eda/sim_lib/altera_mf.v
file instead. It's written differently and Yosys may do better with it when trying to convert it to BTOR. By "better", I mean it doesn't seem to use any extra modules or anything.
Making some progress on this file (see uwsampl/lakeroad-private).
When I run:
read_verilog -sv altera_mf_modified.v
prep
proc
hierarchy -simcheck -check -top altmult_accum
flatten
clk2fflogic
write_btor
After my modifications so far, I hit the familiar error:
ERROR: Found topological loop while processing cell $auto$clk2fflogic.cc:236:execute$1081. Active cells:
$not$altera_mf_modified.v:2060$237
$auto$clk2fflogic.cc:236:execute$1081
$auto$clk2fflogic.cc:213:execute$1015
$auto$clk2fflogic.cc:225:execute$1019v
I'm not sure what I've done in the past when I've hit this error. I may have hit this on some Lattice DSP examples and given up. I'm also pretty sure I've gotten past it before.
This error comes from here: https://github.com/YosysHQ/yosys/blob/23826e5152483c39a617ee1c696c3b48a0c2b756/backends/btor/btor.cc#L248
I think it means: while trying to write out the btor value for an expression, we end up requiring the btor expression itself, making an infinite loop.
If i comment out this line: https://github.com/uwsampl/lakeroad-private/blob/5b9f0233d35c2d80823208bc2c07b62afe35f8e7/altera_mf/altera_mf_modified.v#L2060 the error goes away. Great, progress, but why? And obviously I can't leave it this way. So how to fix this w/o changing the module functionally?
I think it must be because the line flag = ~flag
appears twice in that same big always
block. Perhaps separating them somehow will fix this? Or moving the changes to flag
to separate always
blocks with similar sensitivity lists?
It would be nice to know exactly what the topological loop is. I don't understand this well enough to know where it's coming from.
Digging into the topological loop thing today.
If I comment out these lines:
always @ (posedge flag)
begin
if (extra_accumulator_latency == 0)
begin
result <= result_int[width_result - 1 + int_extra_width : int_extra_width];
overflow <= overflow_int;
accum_is_saturated_latent <= accum_saturate_overflow;
end
else
begin
result_pipe [head_result] <= {overflow_int, result_int[width_result - 1 + int_extra_width : int_extra_width]};
//mult_is_saturated_pipe[head_result] = mult_is_saturated_wire;
accum_saturate_pipe[head_result] <= accum_saturate_overflow;
head_result <= (head_result +1) % (extra_accumulator_latency + 1);
mult_is_saturated_int <= mult_is_saturated_wire;
end
end
always @ (negedge flag)
begin
if (extra_accumulator_latency == 0)
begin
result <= result_int[width_result - 1 + int_extra_width : int_extra_width];
overflow <= overflow_int;
accum_is_saturated_latent <= accum_saturate_overflow;
end
else
begin
result_pipe [head_result] <= {overflow_int, result_int[width_result - 1 + int_extra_width : int_extra_width]};
//mult_is_saturated_pipe[head_result] = mult_is_saturated_wire;
accum_saturate_pipe[head_result] <= accum_saturate_overflow;
head_result <= (head_result +1) % (extra_accumulator_latency + 1);
mult_is_saturated_int <= mult_is_saturated_wire;
end
end
Then it also works. I'm wondering if it's the interaction between the always
block that toggles flag
, and the always block that is triggered on flag
.
It could also be the two always
blocks that are triggered on different flag
events. What happens if I comment just one of them out?
If either of them is left uncommented, the error still occurs. So I assume it's not the interaction between the two flag
-sensitive always
blocks.
Easy test: fold the always
blocks triggered on flag
into the always
block that toggles flag
. That is, delete the always
blocks that are triggered on flag
, and move their logic into the always
block, where it toggles flag
.
Another always-worthy question is: is there a yosys pass we're missing here, that could help us?
I posted a Yosys question here: https://github.com/YosysHQ/yosys/discussions/3711
I tried folding the flag
-triggered always
blocks into the always
block above them, and got the following error:
3.2.5. Executing PROC_ARST pass (detect async resets in processes).
Found async reset \output_wire_clr in `\altmult_accum.$proc$altera_mf_modified.v:2040$233'.
ERROR: Async reset \output_wire_clr yields non-constant value 5'mmmmm for signal \result.
So no luck on that.
A useful and fast reply on the Yosys question, though it doesn't help in this case: https://github.com/YosysHQ/yosys/discussions/3711#discussioncomment-5411199.
Interestingly, if I make the clr
synchronous and not asynchronous, the problem also goes away:
diff --git a/altera_mf/altera_mf_modified.v b/altera_mf/altera_mf_modified.v
index 7310329..5de06ae 100644
--- a/altera_mf/altera_mf_modified.v
+++ b/altera_mf/altera_mf_modified.v
@@ -2037,7 +2037,7 @@ module altmult_accum ( dataa,
// -------------------------------------------------------------
// This is the main process block that performs the accumulation
// -------------------------------------------------------------
- always @(posedge output_wire_clk or posedge output_wire_clr)
+ always @(posedge output_wire_clk)
begin
if (output_wire_clr == 1)
begin
I'm unsure whether this is a good solution or not. I mean, it's very obviously not in the sense that we're changing the functionality of the device. But also we're changing it "minorly" (in a sense). I'm actually not sure how minor the change is b/c I don't know how often the clear signal will be used, or how often we'll rely on the clear signal being asynchronous.
Somewhat related, but we may not need clk2fflogic
; we may be able to get away with async2sync; dffunmap
instead. Unclear which is better. The second option produces less BTOR output.
Here's my yosys invocation:
yosys -p "
read_verilog -sv altera_mf_modified.v
hierarchy -simcheck -check -top altmult_accum
prep
proc
flatten
async2sync
dffunmap
write_btor"
Next task is to replace the strings with enums, and then convert ports to parameters.
Replacing strings with enums is easy now! I replace the pattern "(\w+)"
with PARAM_VALS_ENUM_$1
(in VSCode). This converts all instances of strings into a variant of an enum (which doesn't yet exist). Not all strings necessarily need to be converted, but in my experience, the vast majority do, so undoing by hand the ones that don't is generally not so tedious. Then, I run the following command:
grep -E "PARAM_VALS_ENUM_\w+" altera_mf_modified.v -o | sort | uniq
This gets the unique variants of the enum, which I can then copy and paste to declare the enum in the Verilog file.
Converting ports to parameters is harder. I do it methodically now. I generally will convert a batch at a time, and continually try converting the module to BTOR, as some parameters will inevitably make BTOR conversion (at least with Yosys) impossible. Some parameters that can't be converted to inputs:
Realization: conversion to BTOR can create a bunch of unnamed inputs. I think these are at least partially related to the use of Z
values, but I'm also realizing it may also be for registers that don't have initial values.
Modification: Handling non-constant init values. The error reads: ERROR: Failed to get a constant init value for \sign_a_reg:
. You need to make the init value constant. Does the init value matter? Can you guarantee that you'll always overwrite the init value with useful data?
See this modification:
@@ -876,17 +876,24 @@ module altmult_accum ( dataa,
sload_upper_data_pipe_reg = 0;
//sign_a_reg = (signa ===1'bz) ? ((representation_a == PARAM_VALS_ENUM_SIGNED) ? 1 : 0) : 0;
- sign_a_reg = ((representation_a == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ // Have to simplify this init value, now that representation_a is a port and not a parameter. Similar modifications follow.
+ // sign_a_reg = ((representation_a == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ sign_a_reg = 0;
//sign_a_pipe_reg = (signa ===1'bz) ? ((representation_a == PARAM_VALS_ENUM_SIGNED) ? 1 : 0) : 0;
- sign_a_pipe_reg = ((representation_a == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ // sign_a_pipe_reg = ((representation_a == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ sign_a_pipe_reg = 0;
//sign_b_reg = (signb ===1'bz) ? ((representation_b == PARAM_VALS_ENUM_SIGNED) ? 1 : 0) : 0;
- sign_b_reg = ((representation_b == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ // sign_b_reg = ((representation_b == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ sign_b_reg = 0;
//sign_b_pipe_reg = (signb ===1'bz) ? ((representation_b == PARAM_VALS_ENUM_SIGNED) ? 1 : 0) : 0;
- sign_b_pipe_reg = ((representation_b == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ // sign_b_pipe_reg = ((representation_b == PARAM_VALS_ENUM_SIGNED) ? 1 : 0);
+ sign_b_pipe_reg = 0;
//addsub_reg = (addnsub ===1'bz) ? ((accum_direction == PARAM_VALS_ENUM_ADD) ? 1 : 0) : 0;
- addsub_reg = ((accum_direction == PARAM_VALS_ENUM_ADD) ? 1 : 0);
+ // addsub_reg = ((accum_direction == PARAM_VALS_ENUM_ADD) ? 1 : 0);
+ addsub_reg = 0;
//addsub_pipe_reg = (addnsub ===1'bz) ? ((accum_direction == PARAM_VALS_ENUM_ADD) ? 1 : 0) : 0;
- addsub_pipe_reg = ((accum_direction == PARAM_VALS_ENUM_ADD) ? 1 : 0);
+ // addsub_pipe_reg = ((accum_direction == PARAM_VALS_ENUM_ADD) ? 1 : 0);
+ addsub_pipe_reg = 0;
result_int = 0;
result = 0;
The thing to check was whether the init value of these signals mattered; that is, would they ever actually be read? The honest answer is that I'm not 100% sure, but I did check to make sure they're set somewhere, generally in an always
block that seems like it'll always fire.
In this case, I think there's a weird circumstance where the init value could be read: this block is complicated, and has multiple clocks, and different registers can be optionally controlled by different clocks (interesting thing, not sure why). So if we do something weird where we control the signal with one clock, but we never run that clock, then its init value will be read. But if we hook up everything to the same clock, we should be good.
This is hard! There's a lot to keep track of. You can't just make all parameters into ports, because there's a lot of Verilog ops that can't handle non-constants in one of the arguments (e.g. replication).
I was wondering last night: am I on the right path? Is there some other, automated way to do this that would actually be feasible to implement in the time we have?
I have something importing now.
Two things I had to do:
Had to support setting parameters in the Verilog-to-Racket script, new argument --parameter
.
Had to remove instances of Z values; at least one of them was getting translated to invalid Racket ((bv #f (bitvector 1))
).
Next is to try out the Racket version. Can we write a test that uses them?
I'm fighting with getting a test working. I'm unsure how many times to run the clock. I see a bvmul
expression in there, but it's nested in a bunch of ite
s that I guess are unsatisfiable.
I guess I can try hard-coding values.
Maybe part of the problem is the is_stratixXXX
values? These are registers that aren't actually set. They shouldn't be problems, because they should be set to symbolic values and then the solver can set them as needed. But that may not actually happen...
Still struggling with this at the end of the day.
Plan for debugging the next time I log on:
Currently, the test runs the clock twice (or maybe three times) and then asserts that the output is equal to bvmul.
bvmul
expression we're trying to get to still ends up in the output.A bit disheartening: I tried some very open ended synthesis problems on the Racket model for altmult_accum, and i can't get anything interesting out of it. For example, I run it for two cycles, with symbolic inputs each cycle, and I ask Rosette to find any setting of the symbolics that makes the output nonzero...and it can't find anything!
To me, this is telling me that the imported model is wrong, meaning it was either wrong to begin with (which we could check with Verilator if we wanted) or (more likely) that my modifications broke it. I'm not 100% sure that's the case quite yet though.
One way forward would be to re-do the import and make very minimal changes, and see if the resulting model works.
Another thing to check would be to see whether the fact that we're setting the bitwidth explicitly is breaking things. We're setting it to 18bit inputs, 18bit output, when the default is (for some reason) 2bit inputs and 5 bit output (maybe 2+2+1 sign bit?).
Taking a new strategy: start again from the start, make the minimum modifications necessary, see if we can even get a basic version working.
tri0
, tri1
, supply0
with logic
.flag
from int to logicalways @ (posedge flag or negedge flag)
to always @ (flag)
And it still doesn't work. It could be that there are problems with my testing code, or that I'm not running it for enough cycles. But it seems like the multiply is there in the resulting expression, and we just can't "get it out" because it's nested within a bunch of unsatisfiable if statements.
Something does seem wrong with my testing code, though, because I literally can't get the module to produce ANYTHING but 0.
I wonder if it's best now to go back to importing the other model?
Doing some digging. IF we run the Racket model twice with all symbolic inputs, we get an expression like:
(ite (bveq (bv #b0 1) (bvor aclr3$24 aclr3$120))
(ite (bveq (bv #b10 2) (concat clock0$126 clock0$30))
(ite (bveq (bv #b0 1) ena0$43)
(bv #b00000 5)
(ite (bveq (bv #b0 1)
(ite (bveq (bv #b0 1)
(bvor (ite (bveq (bv #b0 1) clock0$30) (bv #b1 1) (bv #b0 1))
(ite (bveq (bv #b0 1) addnsub$25) (bv #b1 1) (bv #b0 1))))
(ite (bveq (bv #b0 1) aclr3$24) (bv #b1 1) (bv #b0 1))
(bv #b1 1)))
(bvneg (ite (bveq (bv #b0 1) ena0$43) unnamed-input-417$93 (bv #b00000 5)))
(ite (bveq (bv #b0 1) ena0$43) unnamed-input-417$93 (bv #b00000 5))))
(bv #b00000 5))
(bv #b00000 5))
If you stare at this, you can figure out that there's no way to not end up hitting one of the (bv #b00000 5) cases. It's because no matter what you set ena0$43
to, you end up hitting the 0 case. That seems okay...it seems here that we just need to run the model a few more times to get to an interesting spot. I'm not sure if there's anything interesting to be taken from this example, in that case.
One thing that could be happening here is that we just don't know what the right way to toggle the signals is. In the past, we just need to toggle clock up, toggle it down, up, down, etc, to step the design along. But here, it may be more complicated.
I'm thinking I should just go back and see how close I was to importing the other one.
Tried a few things. Tried importing the other model. Tried importing the other model with very limited modifications. Tried importing altmult_add
with limited (basically zero) modifications. In all cases, hitting the same issue -- provably, the output is always zero.
Perhaps the next thing to do (if possible) is to stare at the model for altmult_accum
or altmult_add
and figure out when exactly it should produce nonzero output. Perhaps I should even try simulating with Verilator to confirm.
I'd really prefer to do this with the simpler simulation model!
Doing some debugging now. I'm basically shrinking the output that I'm trying to get the module to output and checking whether the resulting Racket can output the value I want. I've shrunk it down so the module just outputs the registered value for the input a
, and I still can't get that signal to be anything other than zero. After calling the module twice, with all symbolic inputs, I get an expression:
(ite (bveq (bv #b0 1) (bvor aclr3$24 aclr3$120))
(ite (bveq (bv #b10 2) (concat clock0$126 clock0$30))
(ite (bveq (bv #b0 1) (bvand clock0$30 ena0$43)) (bv #b00 2) dataa$38)
(bv #b00 2))
(bv #b00 2))
You can see that this expression might evaluate to dataa
, though if you actually stare at it, you'll see it actually can't!
We need the innermost ite
to take its then
branch, so we need 0 == clock0$30&&ena0$43
to be false, so we need both of those signals to be true.
We need the next innermost ite
to take its if
branch, so we need clock0$30
to be 0. Therein lies our contradiction. For some reason, we need the clock to be both 0 and 1. Something seems wrong about this, though.
Here is the corresponding Verilog (I think):
always @(posedge input_a_wire_clk or posedge input_a_wire_clr)
begin
if (input_a_wire_clr == 1)
mult_a_reg <= 0;
else if ((input_a_wire_clk == 1) && (input_a_wire_en == 1))
begin
if (input_source_a == "DATAA")
mult_a_reg <= (int_width_a == width_a) ? dataa : {dataa, {(diff_width_a){1'b0}}};
...
I feel like I'm close on this.
A possible fix: Separate the always
block into two duplicates of itself. Remove the if
statement.
If that worked, I'm not sure I would understand why it worked.
It seems like the clock value on this line:
else if ((input_a_wire_clk == 1) && (input_a_wire_en == 1))
is somehow getting pulled "from the past". I.e. that should be clock0$126
, but it's clock0$30
.
It's as if the value is getting registered, and we're getting the registered value from the previous cycle.
The problem might be with the fact that I'm using clk2fflogic
in Yosys. That pass may be doing more than what I want it to.
Okay, that seemed to help things. Trying on the full design...
Success! Was able to at least get it to produce a nonzero output.
To summarize what the problem was: The problem was actually in the Verilog to Racket Python script, where we were using clk2fflogic
, which breaks the btor when there are multiple clocks. I'm not sure exactly why it breaks other than roughly that clk2fflogic
combines a bunch of clocks into a single global clock.
Next things to do:
async2sync
.Summary of where we're at:
altmult_accum
, which is the mult-accum DSP on some Intel architecture (the model actually spans multiple architectures, which is part of what made it hard to import)What we need next:
altmult_add
is a good place to start)Main things I learned:
clk2fflogic
was doing more than what we wanted it to do. It was combining all clocks into one global clock, which created weird bugs. I fixed it by using...some other Yosys pass that wasn't clk2fflogic
. dffunmap
maybe?generate
s also don't work well for us. We have to remove them. There is a straightforward procedure for doing this, though.module module_you_are_looking_for
). One may import more easily than another.table
s in VerilogWhoops, reopening until we merge #211
I'm coming back around to this. I'm looking at this block of code that I looked at before:
always @(posedge input_a_wire_clk or posedge input_a_wire_clr)
begin
if (input_a_wire_clr == 1)
mult_a_reg <= 0;
else if ((input_a_wire_clk == 1) && (input_a_wire_en == 1))
It seems like yosys is interpreting the read of input_a_wire_clk
on the last line as "coming from the past". But also, that if statement doesn't make much sense anyway -- of course the clock is 1, this is within a posedge
triggered block, and we've already checked that clr==0
! So can we just remove that part of the and?
That works! We have a much more correct model of the DSP now.
Potential DSPs: a10, s10, NX