gussmith23 / lakeroad

FPGA synthesis tool powered by program synthesis
MIT License

Get Lattice Mult + ALU workloads synthesising #292

Closed gussmith23 closed 1 year ago

gussmith23 commented 1 year ago

Originally I thought there was a deeper reason why, e.g., https://github.com/uwsampl/lakeroad/blob/97f006e270b243c6decd60e124332d73add21f35/integration_tests/lakeroad/three_stage_mul_add_lattice.v wasn't working (see #284). I no longer think there's a deeper issue. I think it's just a bug somewhere.

gussmith23 commented 1 year ago

One weird thing: we mention rst in the yaml file, but it's not defined anywhere. Why isn't that throwing an error?

gussmith23 commented 1 year ago

Oh, rst is on the mult, first of all, and second of all, it's a valid input!

gussmith23 commented 1 year ago

Here's a test: what happens if we plug the C value in to the MB input, instead of into C? Does it work then?

gussmith23 commented 1 year ago

Remember: there was also some curiosity about whether the ALU could hold the C value for a few cycles before operating on it. That may still be the issue. That is, A and B go into the multiplier and are used right away, but C goes into the ALU and so it has to be stalled until the multiplier is done.
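A rough sketch of the timing concern, as a toy cycle-level model in Python. This is not the Lattice DSP semantics: `simulate`, `mult_stages`, and `c_delay` are hypothetical names, and the OR at the end stands in for whatever the ALU does. It only shows that if the multiplier takes N cycles, C has to be held for N cycles too, or the ALU combines a product with the wrong C.

```python
def simulate(inputs, mult_stages, c_delay):
    """Toy model: a*b passes through `mult_stages` registers, c passes
    through `c_delay` registers, then a combinational OR combines them.
    Returns one output per cycle (None while the pipeline is filling)."""
    mult_pipe = [None] * mult_stages  # registers inside the multiplier
    c_pipe = [None] * c_delay         # registers holding C before the ALU
    outputs = []
    for a, b, c in inputs:
        # Push this cycle's values in, take the oldest values out.
        mult_pipe.insert(0, a * b)
        c_pipe.insert(0, c)
        product = mult_pipe.pop()
        held_c = c_pipe.pop()
        if product is None or held_c is None:
            outputs.append(None)
        else:
            outputs.append(product | held_c)
    return outputs


inputs = [(2, 3, 1), (4, 5, 2), (1, 1, 8), (3, 3, 4)]

# C is held as long as the multiplier takes: each product meets its own C.
matched = simulate(inputs, mult_stages=2, c_delay=2)
print(matched)     # [None, None, 7, 22]

# C is not held: the product from cycle 0 meets the C from cycle 2.
mismatched = simulate(inputs, mult_stages=2, c_delay=0)
print(mismatched)  # [None, None, 14, 20]
```

In the matched case the first valid output is `(2*3) | 1 = 7`; in the mismatched case it is `(2*3) | 8 = 14`, which is what a spec for the two-stage design would reject.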

gussmith23 commented 1 year ago

Okay, I've been on quite the rollercoaster. First, I thought I'd figured this out: I thought I'd realized that the workloads I was trying to synthesize onto the Lattice DSP just couldn't go onto the DSP. When I changed the initiation interval, it started synthesizing. This all came after some debugging, in which I realized that our imported semantics don't actually have the issue I hypothesized in #284. The semantics seem to work as you'd expect. However, it turns out I changed the initiation interval such that I just started synthesizing 0, as mentioned in #294. So, while we discovered the bug wasn't a bug, we're still not synthesizing! So there's still debugging to be done.

gussmith23 commented 1 year ago

Running combinational, one, two, and three stage multiplier synthesis to see which works.

Combinational, one stage, two stage, three stage all synthesize.

gussmith23 commented 1 year ago

Here's something weird I just noticed: it seems like the ALU doesn't have any parameters once compiled. That's not right!

gussmith23 commented 1 year ago

I fixed that, but it wasn't the issue.

gussmith23 commented 1 year ago

Okay, I think I figured out the issue. Equally stupid but hearteningly simple. The ALU can do logic operations over B and C. I was wiring C into C, but I was wiring the mult output into A. I think I remember writing that and expecting a bug down the road...
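To make the miswiring concrete, here is a toy model in Python. The only thing taken from the comment above is that logic operations read B and C; `toy_alu`, its opcode names, and tying the unused input to 0 are all assumptions for illustration, not the real primitive's interface.

```python
def toy_alu(op, a, b, c):
    """Hypothetical ALU: logic ops read only B and C, so A is ignored."""
    if op == "or":
        return b | c
    if op == "and":
        return b & c
    if op == "xor":
        return b ^ c
    raise ValueError(f"unsupported op: {op}")


product = 6 * 7  # stand-in for the multiplier output
c = 0b0101

# Miswired: the multiplier output goes to A, which logic ops never read.
# (B tied to 0 here is an assumption.)
miswired = toy_alu("or", a=product, b=0, c=c)

# Fixed: the multiplier output goes to B.
fixed = toy_alu("or", a=0, b=product, c=c)

print(miswired == (product | c))  # False
print(fixed == (product | c))     # True
```

With the product on A, no choice of logic opcode can make the output depend on the multiplier, so a solver would be right to report that no configuration implements mul + or.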

gussmith23 commented 1 year ago

Okay, that wasn't the only bug. That got some stuff working (one stage mul + or), but didn't get two stage things working.

gussmith23 commented 1 year ago

I also realized that the clock wasn't wired into the ALU. I changed that, but not only did that not fix things, it also re-broke the mul+or test by making it spin forever. This is immensely frustrating...

gussmith23 commented 1 year ago

I added some constraints to the opcode to shrink the search space. Weirdly, now it's only integration_tests/lakeroad/two_stage_multiplier_lattice.v (I think) that's spinning on boolector. Two stage mul+logic ops are still not working, though one stage is. (Why would one stage work and not two?? Maybe it's because C can't be held that long?)

gussmith23 commented 1 year ago

Yup, confirmed it's the 2 cycle that's spinning. So strange.

gussmith23 commented 1 year ago

It may just be the case that 1 cycle or combinational ALU usage is all that's possible. Some facts I know from looking at the ALU and mult Verilog:

But all of this is adding up to the fact that 2-cycle should be possible.

gussmith23 commented 1 year ago

Tried commenting out all constraints, still no luck.

gussmith23 commented 1 year ago

Also tried commenting out basically everything except the data inputs, to give the solver free rein. Still no luck. It's very convinced that this isn't possible.

gussmith23 commented 1 year ago

I also started to look directly at the bitvector expression output.

gussmith23 commented 1 year ago

One thing to try: (Almost) anything it returns for 1-cycle mul-or should work for 2-cycle as well, because we can just turn on the output register on the ALU. So let's see what it does for 1-cycle mul-or and try to adjust it.

gussmith23 commented 1 year ago

Ah, seems like the output register is already being used. So why does that have to be the case? Why doesn't it work if we try to do the ALU combinationally while delaying the C value?

gussmith23 commented 1 year ago

Seeing what happens if we force the C register to be used.

Interesting. It's just hanging in boolector. It's not failing yet though.

gussmith23 commented 1 year ago

I really can't figure this one out. I tried a few things: removing the unnamed inputs from ALU, messing with solver flags. It just spins when I force the C register to be used.

gussmith23 commented 1 year ago

There are a few entangled issues here.

gussmith23 commented 1 year ago

#293 will get at least zero- and one-cycle mult+alu workloads synthesizing, which may actually still be good enough for our eval.

gussmith23 commented 1 year ago

This isn't actually fully fixed by #293.