GoogleCodeExporter closed this issue 9 years ago
I'm not able to reproduce this; the run reaches the end without issues. If I
had to venture a guess, you may have overwritten the .net file with another
.net file over the course of the experiment run.
Original comment by JasonKai...@gmail.com
on 4 Mar 2013 at 6:36
No, I'm quite sure it is not a result of the .net file being overwritten during
the run.
This error is uncommon, but seems to be present on the current trunk when using
the new arch file on the mcml circuit (both attached).
Somehow I am ending up with a corrupted net name in the .net file. The error
is:
ERROR(1): .net file and .blif file do not match, encountered unknown primitive
top.PhotonCalculator+u_calc.Boundary+boundaryChecker.signed_div_30+divide_u1.Div_64b+div_replace.Div_64b_unsigned+div_temp^FF_NODE~3ÊòIñá
in .net file.
You can see the offending net name with the characters 'ÊòIñá', which are
present in the produced net file.
This is occurring on my local machine (VPR compiled under Cygwin), as well as
on the UBC cluster (running Linux). If I run the benchmark on my Linux VM
install, it segfaults.
The error occurs after packing, when the net is verified against the blif. I am
running valgrind, but it will take a few hours I think, as mcml takes a while
to pack.
Original comment by jeffrey....@gmail.com
on 5 Apr 2013 at 9:29
Attachments:
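One way to locate a corrupted name like the one above is to scan each net name for bytes outside printable ASCII. The following is a minimal sketch, not VPR code: `first_suspect_byte` is a hypothetical helper, and it assumes that valid net names contain only printable ASCII characters, which the thread does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of VPR): returns the index of the first byte
 * in `name` that is outside printable ASCII (0x20..0x7E), or -1 if every
 * byte is printable. Assumes valid net names are plain ASCII. */
long first_suspect_byte(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c < 0x20 || c > 0x7E)
            return (long)i;
    }
    return -1;
}
```

Applied to every name read from the .net file, a non-negative return value would point at the position where the trailing garbage begins.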
Here is the valgrind output, which seems to indicate an out-of-bounds write at
this line of code:
/* This will stop the IPIN node used to get to this SINK from being *
* reexpanded for the remainder of this net's routing. This will make us *
* hook up more IPINs to this SINK (which is what we want). If IPIN *
* doglegs are allowed in the graph, we won't be able to use this IPIN to *
* do a dogleg, since it won't be re-expanded. Shouldn't be a big problem. */
rr_node_route_inf[last_ipin_node].path_cost = -HUGE_POSITIVE_FLOAT;
Any idea what would be causing this, Jason?
jeff@ubuntu:~/Dropbox/linux_home/temp$ valgrind ../vtr/vpr/vpr
k6_frac_N10_mem32K_40nm.xml mcml --blif_file mcml.pre-vpr.blif
--timing_analysis on --timing_driven_clustering on --cluster_seed_type timing
--seed 1 --nodisp > valgrind_mcml.out
==6211== Memcheck, a memory error detector
==6211== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==6211== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==6211== Command: ../vtr/vpr/vpr k6_frac_N10_mem32K_40nm.xml mcml --blif_file
mcml.pre-vpr.blif --timing_analysis on --timing_driven_clustering on
--cluster_seed_type timing --seed 1 --nodisp
==6211==
==6211== Invalid write of size 4
==6211== at 0x40E0AD: breadth_first_expand_trace_segment_cluster(s_trace*,
int) (cluster_legality.c:834)
==6211== by 0x40DD0D: breadth_first_route_net_cluster(int)
(cluster_legality.c:717)
==6211== by 0x40DB44: try_breadth_first_route_cluster()
(cluster_legality.c:652)
==6211== by 0x484F28: do_clustering(s_arch const*, s_pack_molecule*, int,
boolean, boolean*, boolean, char*, boolean, e_cluster_seed, float, float, int,
float, float, float, float, boolean, boolean, boolean, e_packer_algorithm,
s_timing_inf) (cluster.c:530)
==6211== by 0x412020: try_pack(s_packer_opts*, s_arch const*, s_model*,
s_model*, s_timing_inf, float) (pack.c:82)
==6211== by 0x4057EB: vpr_pack(s_vpr_setup, s_arch) (vpr_api.c:426)
==6211== by 0x4028C5: main (main.c:46)
==6211== Address 0x4a20c454 is 12 bytes before a block of size 25,392 alloc'd
==6211== at 0x4C2B6CD: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6211== by 0x402F35: my_malloc (util.c:153)
==6211== by 0x42BCBD: alloc_and_load_rr_node_route_structs()
(route_common.c:791)
==6211== by 0x40D861: alloc_and_load_legalizer_for_cluster(s_block*, int,
s_arch const*) (cluster_legality.c:578)
==6211== by 0x489519: start_new_cluster(s_cluster_placement_stats*,
s_pb_graph_node**, s_arch const*, s_block*, int, s_pack_molecule*, float, int*,
int*, int, int, int, int) (cluster.c:1926)
==6211== by 0x484B63: do_clustering(s_arch const*, s_pack_molecule*, int,
boolean, boolean*, boolean, char*, boolean, e_cluster_seed, float, float, int,
float, float, float, float, boolean, boolean, boolean, e_packer_algorithm,
s_timing_inf) (cluster.c:447)
==6211== by 0x412020: try_pack(s_packer_opts*, s_arch const*, s_model*,
s_model*, s_timing_inf, float) (pack.c:82)
==6211== by 0x4057EB: vpr_pack(s_vpr_setup, s_arch) (vpr_api.c:426)
==6211== by 0x4028C5: main (main.c:46)
==6211==
Original comment by jeffrey....@gmail.com
on 6 Apr 2013 at 11:37
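The valgrind report says the write landed 12 bytes before the allocated block. One reading that fits this, though the thread does not confirm it, is that the index was a negative sentinel such as -1. The sketch below illustrates that arithmetic and a bounds guard. Everything in it is an assumption for illustration: the `route_inf` layout (24 bytes, with `path_cost` at offset 12), the `NO_NODE` sentinel, the constant's value, and both function names are hypothetical, not taken from the VPR source.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE (-1)                  /* hypothetical "no node" sentinel */
#define HUGE_POSITIVE_FLOAT 1.e30f    /* placeholder value for this sketch */

/* Hypothetical record layout, chosen so that path_cost sits at byte
 * offset 12 of a 24-byte struct on typical platforms. */
struct route_inf {
    int   prev_node;
    float pres_cost;
    float acc_cost;
    float path_cost;
    float backward_path_cost;
    short prev_edge;
    short target_flag;
};

/* Byte offset of table[index].path_cost relative to the start of the table. */
long path_cost_byte_offset(int index)
{
    return (long)index * (long)sizeof(struct route_inf)
         + (long)offsetof(struct route_inf, path_cost);
}

/* Marks an IPIN as not re-expandable, but only when the index is valid.
 * Returns 1 if the write happened, 0 if it was skipped. */
int block_reexpansion(struct route_inf *table, int num_nodes, int last_ipin_node)
{
    assert(table != NULL);
    if (last_ipin_node < 0 || last_ipin_node >= num_nodes)
        return 0;   /* e.g. NO_NODE: nothing to mark, and no stray write */
    table[last_ipin_node].path_cost = -HUGE_POSITIVE_FLOAT;
    return 1;
}
```

Under the assumed layout, an index of -1 gives an offset of -24 + 12 = -12 bytes, which matches the distance valgrind reports. A guard like this only suppresses the stray write; it says nothing about why the index was invalid in the first place.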
Yup, I can reproduce, thanks! Working on it.
Original comment by JasonKai...@gmail.com
on 6 Apr 2013 at 11:58
Wow, talk about an elusive bug. If a net connects to the same LUT multiple
times (technically still a correct netlist but ABC almost always optimizes this
case away), then VPR would corrupt memory in a subtle way that doesn't always
show. I've fixed this bug now and am rerunning all our experiments to check.
Original comment by JasonKai...@gmail.com
on 8 Apr 2013 at 4:30
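The triggering condition described above (a net that connects to the same LUT more than once) can be checked for directly. This is a minimal sketch under assumed data structures, not the actual fix: `count_repeated_sinks` and the `sink_block` array (one block ID per sink pin of a single net) are hypothetical and do not correspond to VPR's netlist types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check (not VPR code): sink_block[i] holds the ID of the block
 * (e.g. a LUT) that sink pin i of one net connects to. Returns how many sink
 * pins land on a block that an earlier sink pin of the same net already
 * reached. */
int count_repeated_sinks(const int *sink_block, size_t num_sinks)
{
    int repeats = 0;
    assert(sink_block != NULL || num_sinks == 0);
    for (size_t i = 1; i < num_sinks; i++) {
        for (size_t j = 0; j < i; j++) {
            if (sink_block[j] == sink_block[i]) {
                repeats++;
                break;   /* count each repeated pin once */
            }
        }
    }
    return repeats;
}
```

A non-zero result flags a netlist that exercises this rare case, which could help when building a small regression test. The quadratic scan keeps the sketch short; a large net would want a set or a sorted copy instead.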
Original issue reported on code.google.com by
jeffrey....@gmail.com
on 28 Feb 2013 at 11:49
Attachments: