Automatically exported from code.google.com/p/vtr-verilog-to-routing

.net file and .blif file do not match #57

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
vpr k6_N10_I33_Fi6_L4_frac1_ff1_45nm.xml bgm --blif_file bgm.pre-vpr.blif 
--timing_analysis on --timing_driven_clustering on --cluster_seed_type timing 
--seed 1 --nodisp

What is the expected output? What do you see instead?
ERROR(1): .net file and .blif file do not match, encountered unknown primitive 
in .net file.

This is a custom architecture file I am using, but it works for all other 
circuits, so I don't think the architecture file is the problem.

Original issue reported on code.google.com by jeffrey....@gmail.com on 28 Feb 2013 at 11:49

Attachments:

GoogleCodeExporter commented 9 years ago
I'm not able to reproduce; the run reaches the end without issues.  If I had 
to venture a guess, you may have overwritten the .net file with another .net 
file over the course of the experiment run.

Original comment by JasonKai...@gmail.com on 4 Mar 2013 at 6:36

GoogleCodeExporter commented 9 years ago
No, I'm quite sure it is not a result of the .net file being overwritten 
during the run.

This error is uncommon, but seems to be present on the current trunk when using 
the new arch file on the mcml circuit (both attached).  

Somehow I am ending up with a corrupted net name in the .net file.  The error 
is:

ERROR(1): .net file and .blif file do not match, encountered unknown primitive top.PhotonCalculator+u_calc.Boundary+boundaryChecker.signed_div_30+divide_u1.Div_64b+div_replace.Div_64b_unsigned+div_temp^FF_NODE~3ÊòIñá in .net file.

You can see the offending net name with the garbage characters 'ÊòIñá', which 
are present in the produced .net file.

This is occurring on my cygwin-compiled local machine as well as on the UBC 
cluster (running Linux).  If I run the benchmark on my Linux VM install, it 
segfaults.

The error occurs after packing, when the .net file is verified against the 
.blif.  I am running valgrind, but I think it will take a few hours, as mcml 
takes a while to pack.

Original comment by jeffrey....@gmail.com on 5 Apr 2013 at 9:29

Attachments:
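
For context on how a net name can end up with trailing bytes like 'ÊòIñá', here is a minimal sketch (hypothetical code, not VPR's) of the usual mechanism: if a stray out-of-bounds write clobbers the NUL terminator of a string, printing that string runs on into whatever heap bytes follow it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *name = malloc(16);
        memset(name, 0xEA, 16);        /* stand-in for stale heap bytes */
        name[15] = '\0';               /* keep this demo memory-safe    */
        strcpy(name, "FF_NODE~3");     /* 9 chars + NUL at name[9]      */
        name[9] = (char)0xCA;          /* stray write clobbers the NUL  */
        printf("%s\n", name);          /* the real name plus garbage    */
        free(name);
        return 0;
    }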

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Here is the valgrind output, which seems to indicate an out-of-bounds write 
at this line of code:

                /* This will stop the IPIN node used to get to this SINK from being         *
                 * reexpanded for the remainder of this net's routing.  This will make us   *
                 * hook up more IPINs to this SINK (which is what we want).  If IPIN        *
                 * doglegs are allowed in the graph, we won't be able to use this IPIN to   *
                 * do a dogleg, since it won't be re-expanded.  Shouldn't be a big problem. */

                rr_node_route_inf[last_ipin_node].path_cost = -HUGE_POSITIVE_FLOAT;

Any idea what would be causing this, Jason?

jeff@ubuntu:~/Dropbox/linux_home/temp$ valgrind ../vtr/vpr/vpr 
k6_frac_N10_mem32K_40nm.xml mcml --blif_file mcml.pre-vpr.blif 
--timing_analysis on --timing_driven_clustering on --cluster_seed_type timing 
--seed 1 --nodisp > valgrind_mcml.out
==6211== Memcheck, a memory error detector
==6211== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==6211== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==6211== Command: ../vtr/vpr/vpr k6_frac_N10_mem32K_40nm.xml mcml --blif_file 
mcml.pre-vpr.blif --timing_analysis on --timing_driven_clustering on 
--cluster_seed_type timing --seed 1 --nodisp
==6211== 
==6211== Invalid write of size 4
==6211==    at 0x40E0AD: breadth_first_expand_trace_segment_cluster(s_trace*, 
int) (cluster_legality.c:834)
==6211==    by 0x40DD0D: breadth_first_route_net_cluster(int) 
(cluster_legality.c:717)
==6211==    by 0x40DB44: try_breadth_first_route_cluster() 
(cluster_legality.c:652)
==6211==    by 0x484F28: do_clustering(s_arch const*, s_pack_molecule*, int, 
boolean, boolean*, boolean, char*, boolean, e_cluster_seed, float, float, int, 
float, float, float, float, boolean, boolean, boolean, e_packer_algorithm, 
s_timing_inf) (cluster.c:530)
==6211==    by 0x412020: try_pack(s_packer_opts*, s_arch const*, s_model*, 
s_model*, s_timing_inf, float) (pack.c:82)
==6211==    by 0x4057EB: vpr_pack(s_vpr_setup, s_arch) (vpr_api.c:426)
==6211==    by 0x4028C5: main (main.c:46)
==6211==  Address 0x4a20c454 is 12 bytes before a block of size 25,392 alloc'd
==6211==    at 0x4C2B6CD: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6211==    by 0x402F35: my_malloc (util.c:153)
==6211==    by 0x42BCBD: alloc_and_load_rr_node_route_structs() 
(route_common.c:791)
==6211==    by 0x40D861: alloc_and_load_legalizer_for_cluster(s_block*, int, 
s_arch const*) (cluster_legality.c:578)
==6211==    by 0x489519: start_new_cluster(s_cluster_placement_stats*, 
s_pb_graph_node**, s_arch const*, s_block*, int, s_pack_molecule*, float, int*, 
int*, int, int, int, int) (cluster.c:1926)
==6211==    by 0x484B63: do_clustering(s_arch const*, s_pack_molecule*, int, 
boolean, boolean*, boolean, char*, boolean, e_cluster_seed, float, float, int, 
float, float, float, float, boolean, boolean, boolean, e_packer_algorithm, 
s_timing_inf) (cluster.c:447)
==6211==    by 0x412020: try_pack(s_packer_opts*, s_arch const*, s_model*, 
s_model*, s_timing_inf, float) (pack.c:82)
==6211==    by 0x4057EB: vpr_pack(s_vpr_setup, s_arch) (vpr_api.c:426)
==6211==    by 0x4028C5: main (main.c:46)
==6211== 

Original comment by jeffrey....@gmail.com on 6 Apr 2013 at 11:37
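
To make that report concrete: "Invalid write of size 4 ... 12 bytes before a block" is exactly what a small negative index into an array of 4-byte elements produces. A minimal sketch under that assumption (hypothetical code, not VPR's actual data structures):

    #include <stdlib.h>

    int main(void) {
        /* 6,348 floats = 25,392 bytes, the block size valgrind reports. */
        float *path_cost = malloc(6348 * sizeof *path_cost);
        int inode = -3;            /* e.g. a stale or never-set node index    */
        path_cost[inode] = -1.0f;  /* 4-byte write, 12 bytes BEFORE the block */
        free(path_cost);
        return 0;
    }

If that is what is happening here, last_ipin_node would presumably be holding a stale or sentinel value when the same SINK is reached more than once on one net.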

GoogleCodeExporter commented 9 years ago
Yup, I can reproduce, thanks!  Working on it.

Original comment by JasonKai...@gmail.com on 6 Apr 2013 at 11:58

GoogleCodeExporter commented 9 years ago
Wow, talk about an elusive bug.  If a net connects to the same LUT multiple 
times (technically still a correct netlist, but ABC almost always optimizes 
this case away), then VPR would corrupt memory in a subtle way that doesn't 
always show up.  I've fixed this bug now and am rerunning all our experiments 
to check.

Original comment by JasonKai...@gmail.com on 8 Apr 2013 at 4:30
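
For readers who hit this later, here is a hypothetical BLIF fragment (not taken from the attached circuits) showing the triggering case: the net n1 drives two input pins of the same LUT. This is legal BLIF, but ABC normally optimizes it away, which is why the case is so rarely seen.

    .model dup_fanin_example
    .inputs a
    .outputs y
    # buffer: n1 = a
    .names a n1
    1 1
    # the same net n1 on both input pins of one LUT: y = n1 AND n1
    .names n1 n1 y
    11 1
    .end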