verilog-to-routing / vtr-verilog-to-routing

Verilog to Routing -- Open Source CAD Flow for FPGA Research
https://verilogtorouting.org
Other
1.01k stars 390 forks source link

Cluster attraction groups for tight floorplan constraints not working #2089

Closed vaughnbetz closed 2 years ago

vaughnbetz commented 2 years ago

The merge of Mohamed Elgammal's new clustering api and refactoring somehow broke cluster attraction groups, which are necessary for tight floorplan region constraints. They should be fixed.

Expected Behaviour

Attraction groups should force a dense packing on parts of a design with tight floorplan constraints.

Current Behaviour

Instead, sfkhalid tells me that they currently don't work (in fact area seems to increase in later clustering attempts) so something is broken.

Possible Solution

These did work, so presumably something was broken in the refactoring.

Steps to Reproduce

I believe the failures happen on the tests Sarah has on her own branch, but which aren't yet on master.

Context

Need this fixed before recommending people use the floorplan constraints. This is also holding up putting a full set of tests for floorplanning into CI, which is dangerous as more things could break without anyone knowing.

Your Environment

VTR master in late May, Linux

ArashAhmadian commented 2 years ago

@vaughnbetz @sfkhalid I've been looking into using fp constraints and thought it would be worth sharing some things I've come across.

This PR #1938 is where things break for a very tightly constrained designed I've been working on. Before master containing Mohamed's first clustering API PR was merged into sfkhalid's branch with this commit (I think): a806ffbc384ee4bbf9f3d2b9580bf67fa6a15f9b, this assertion fails: File: vtr-verilog-to-routing/vpr/src/pack/pack.cpp Line:200 Message: Failed to find device which satisifies resource requirements (bug since my target arch has more than enough primitives for the design)

Mohamed's first clustering API PR #1954 was not merged in at that point in the branch I think based on the commit messages . After sfkhalid changed the packflow in this commit: 968e82cc86848334a801f89606ec5b55d52ce353 , the thrown assertion changes to the one happening on master right now. Behaviour is that the primitives are too tightly packed for a "manual" placement done using placement_constraints.xml though the exact pattern of the invalid packing for mycase are different between the above commits and the one happening on master.

Maybe this bug somehow slipped through the cracks before the API changes were done and were introduced by the code in the working branch of #1938 ?

vaughnbetz commented 2 years ago

@zhaisitong: can you take a look at this one? We can discuss Thursday.

vaughnbetz commented 2 years ago

@sfkhalid : any updates?

zhaisitong commented 2 years ago

@zhaisitong: can you take a look at this one? We can discuss Thursday.

How could I run to the failed place? What is the exact command or what regressions should I run to get this bug?

vaughnbetz commented 2 years ago

Good question Sitong. I don't think the failing tests were integrated into CI yet (as they were failing), so they don't run automatically.

@sfkhalid can you please point Sitong at the failed regression tests and give steps on how to reproduce the failures? Adding @MohamedElgammal in case he knows how to run / reproduce these tests too.

MohamedElgammal commented 2 years ago

I am not sure which tests are currently failing. However, I will attach a conversation I had with Sara before in which she pointed me how to manually run floorplanning tests

"The tests should be run with the following commands:

Test 1:

../vtr-verilog-to-routing/vpr/vpr ../stratixiv_arch.timing.xml ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.blif --route_chan_width 300 --max_router_iterations 400 --router_lookahead map --initial_pres_fac 1.0 --router_profiler_astar_fac 1.5 --seed 3 --sdc_file ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.sdc --read_vpr_constraints sixteenth.xml --device neuron

Test 2:

../vtr-verilog-to-routing/vpr/vpr ../stratixiv_arch.timing.xml ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.blif --route_chan_width 300 --max_router_iterations 400 --router_lookahead map --initial_pres_fac 1.0 --router_profiler_astar_fac 1.5 --seed 3 --sdc_file ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.sdc --read_vpr_constraints one_big_partition.xml --device neuron

Test 3:

../vtr-verilog-to-routing/vpr/vpr ../stratixiv_arch.timing.xml ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.blif --route_chan_width 300 --max_router_iterations 400 --router_lookahead map --initial_pres_fac 1.0 --router_profiler_astar_fac 1.5 --seed 3 --sdc_file ../vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/neuron_stratixiv_arch_timing.sdc --read_vpr_constraints half_blocks_half.xml --device neuron

A few things to note:

vtr-verilog-to-routing is the VTR root directory in these commands. Please use the architecture file I've attached to this email when running VPR. It has the device size specification that is needed for the --device command I've attached the three constraints files used in the tests to this email - sixteenth.xml, one_big_partition.xml, and half_blocks_half.xml " Archive.zip

zhaisitong commented 2 years ago

@MohamedElgammal I didn't find the architecture file in the "Archive.zip". I tried the stratixiv_arch.xml in the vtr_flow, but it doesn't work. Could you upload the architecture file again?

MohamedElgammal commented 2 years ago

@zhaisitong Kindly find attached the updated architecture file stratixiv_arch.timing.xml.zip

zhaisitong commented 2 years ago

@MohamedElgammal Thanks for your architecture file. Which branch and commit id could I use to reproduce this problem? Could I use the newest codes on the master branch?

vaughnbetz commented 2 years ago

@zhaisitong, some useful docs: https://docs.verilogtorouting.org/en/latest/vpr/placement_constraints/?highlight=placement%20constraints#vpr-placement-constraints Will also email Sarah's thesis.

vaughnbetz commented 2 years ago

@sfkhalid : can you point Sitong at a working branch or commit so he can see what the code should do?

zhaisitong commented 2 years ago

@MohamedElgammal When did you merge your API for sfkhalid's codes? Could the codes work after you merged your API? Could you specify me the commit id of your merge?

MohamedElgammal commented 2 years ago

@zhaisitong I didn't merge my API into Sarah's code. I did some refactoring for the original packing code and the first function of my API into master in PR #2034 . Then, Sarah merged these changes into her branch and pushed it into master. Finally, I have added the remaining parts of my API in follow-up PRs #2044 , #2051

zhaisitong commented 2 years ago

I traced back and found this problem comes from @jmah76 committed on Jun 8, the commit id is d22193d4c8030a24daa73884b60d9bdb8d0aec0b. I think @MohamedElgammal had pushed all the relevant code and worked at that time?

MohamedElgammal commented 2 years ago

@zhaisitong Yes, all the refactoring commits were merged into master on May 27

vaughnbetz commented 2 years ago

That commit is just a command line description update; it's not clear to me how that could cause a problem. Have you confirmed the code just before that passes the tests?

zhaisitong commented 2 years ago

@vaughnbetz Yes, it is weird. But I run 5 times, the results are the same. When I checked out to each commits, I found the codes are very different. I am still comparing them.

zhaisitong commented 2 years ago

I think this might be because @jmah76 forgot to update codes before pushing them. After I recover all relevant codes, the three commands mentioned above works.

Above is incorrect, this problem is because of other reasons!

zhaisitong commented 2 years ago

The problem might come from a14647570ac98df63b1871508a73496b64fbf31f. In function vpr/src/pack/cluster_util.cpp: get_highest_gain_molecule, before finding unpacked molecules based on attraction group of the current cluster, missing the condition "cur_pb->pb_stats->num_feasible_blocks == 0".

After adding this condition, the above three cases work.

zhaisitong commented 2 years ago

This issue has been solved with PR #2160 .

ArashAhmadian commented 2 years ago

@zhaisitong @vaughnbetz Thanks for following up on this :)