Closed jeremychritz closed 6 years ago
Oop, sorry for the delay on this. That's kind of surprising. Hmm. Do any of the other tests pass (for example, the other sanity tests)? The tests pass last I ran them here, and we haven't changed anything recently...
@jwise Thanks for getting back!
The other test that fails is sanity3_cvsram, I get a similar error:
542530.00ns MSEQ: Read mismatched, retrying (addr: 0xffff0003, received data: 0x00140001, expected data: 0xffffffff.
542610.00ns MSEQ: Retrying read at 0xffff0003 (returned 0x00140001, expected 0xffffffff)...
544890.00ns MSEQ: ERROR: Read to address 0xffff0003 did not return the expected value 0xffffffff after 50000 attempts (last read 0x00140001), failing.
This is the command I ran:
make run TESTDIR=../traces/traceplayer/sanity3_cvsram/ TESTNAME=sanity3_cvsram
All other tests passed just the googlenet and the sanity3_cvsram fail.
Just as a sanity this is the git branch info:
* nvdlav1 2a2c725 Fix a URL typo.
This is the VCS version:
Compiler version M-2017.03-SP2-3_Full64; Runtime version M-2017.03-SP2-3_Full64; Jan 22 21:50 2018
VCD+ Writer M-2017.03-SP2-3_Full64 Copyright
I also get the same results with Xilinx's 2017.1 Vivado Simulator, I don't think it is a simulator version?
I'm also diving into the RTL to see if I can better diagnose the problem will report back.
That's very weird. This goes along with:
read_reg 0xffff0003 0x0 0xffffffff #NVDLA_GLB.S_INTR_STATUS_0clear interrupts
Which is trying to read any value from 0xffff0003
, which is the interrupt status register -- the 0x0
there means that the mask is 0 (anything should be accepted), and the 0xffffffff
is a dummy. So the first read should always succeed. It's sounding to me like the Perl script that compiles an input.txn
into a synthtb memory dump is confused somehow, and is not properly parsing the input.txn
. Maybe the best place to look is to instrument verif/synth_tb/sim_scripts/inp_txn_to_hexdump.pl
, and see if it's correctly parsing that (and, if not, why not)?
Just for clarification:
input.txn.raw
file? input.txn.raw
from your run so I can do a diff?Yes, input.txn.raw
is what I refer to. Here's an input.txn.raw.txt from sanity3_cvsram, generated with something like:
jwise@jwise-dt:~/nvdla/hw/verif/traces/traceplayer$ ../../synth_tb/sim_scripts/inp_txn_to_hexdump.pl sanity3_cvsram
Yes, inp_txn_to_hexdump.pl
would be the way to go.
Thanks! However, I'm not seeing any difference, here is my input.txn.raw.jc.txt. Files are identical.
For sanity3_cvsram, I'm still seeing this error:
542650.00ns MSEQ: Retrying read at 0xffff0003 (returned 0x00140001, expected 0xffffffff)...
544930.00ns MSEQ: ERROR: Read to address 0xffff0003 did not return the expected value 0xffffffff after 50000 attempts (last read 0x00140001), failing.
I did trace the 0x00140001
back to NV_NVDLA_csb_master.v, line 2234:
assign core_resp_pd = ({34 {afbif_resp_pvld}} & afbif_resp_pd)
I added some debug code to capture that value:
//JBC Debug
always @(posedge nvdla_core_clk) begin
if ( core_resp_pvld ) begin
$display("%0t {34 {afbif_resp_pvld}} & afbif_resp_pd) : 0x%08x", $time, ({34 {afbif_resp_pvld}} & afbif_resp_pd) );
$display("%0t {34 {cmac_a_resp_valid}} & cmac_a_resp_pd) : 0x%08x", $time, ({34 {cmac_a_resp_valid}} & cmac_a_resp_pd) );
$display("%0t {34 {sdp_rdma_resp_valid}} & sdp_rdma_resp_pd): 0x%08x", $time, ({34 {sdp_rdma_resp_valid}} & sdp_rdma_resp_pd));
$display("%0t {34 {csc_resp_valid}} & csc_resp_pd) : 0x%08x", $time, ({34 {csc_resp_valid}} & csc_resp_pd) );
$display("%0t {34 {gec_resp_valid}} & gec_resp_pd) : 0x%08x", $time, ({34 {gec_resp_valid}} & gec_resp_pd) );
$display("%0t {34 {cdp_resp_valid}} & cdp_resp_pd) : 0x%08x", $time, ({34 {cdp_resp_valid}} & cdp_resp_pd) );
$display("%0t {34 {cacc_resp_valid}} & cacc_resp_pd) : 0x%08x", $time, ({34 {cacc_resp_valid}} & cacc_resp_pd) );
$display("%0t {34 {glb_resp_valid}} & glb_resp_pd) : 0x%08x", $time, ({34 {glb_resp_valid}} & glb_resp_pd) );
$display("%0t {34 {cvif_resp_valid}} & cvif_resp_pd) : 0x%08x", $time, ({34 {cvif_resp_valid}} & cvif_resp_pd) );
$display("%0t {34 {cmac_b_resp_valid}} & cmac_b_resp_pd) : 0x%08x", $time, ({34 {cmac_b_resp_valid}} & cmac_b_resp_pd) );
$display("%0t {34 {pdp_resp_valid}} & pdp_resp_pd) : 0x%08x", $time, ({34 {pdp_resp_valid}} & pdp_resp_pd) );
$display("%0t {34 {cdma_resp_valid}} & cdma_resp_pd) : 0x%08x", $time, ({34 {cdma_resp_valid}} & cdma_resp_pd) );
$display("%0t {34 {sdp_resp_valid}} & sdp_resp_pd) : 0x%08x", $time, ({34 {sdp_resp_valid}} & sdp_resp_pd) );
$display("%0t {34 {bdma_resp_valid}} & bdma_resp_pd) : 0x%08x", $time, ({34 {bdma_resp_valid}} & bdma_resp_pd) );
$display("%0t {34 {pdp_rdma_resp_valid}} & pdp_rdma_resp_pd): 0x%08x", $time, ({34 {pdp_rdma_resp_valid}} & pdp_rdma_resp_pd));
$display("%0t {34 {mcif_resp_valid}} & mcif_resp_pd) : 0x%08x", $time, ({34 {mcif_resp_valid}} & mcif_resp_pd) );
$display("%0t {34 {cdp_rdma_resp_valid}} & cdp_rdma_resp_pd): 0x%08x", $time, ({34 {cdp_rdma_resp_valid}} & cdp_rdma_resp_pd));
$display("%0t {34 {rbk_resp_valid}} & rbk_resp_pd) : 0x%08x", $time, ({34 {rbk_resp_valid}} & rbk_resp_pd) );
$display("%0t {34 {dummy_resp_valid}} & dummy_resp_pd); : 0x%08x", $time, ({34 {dummy_resp_valid}} & dummy_resp_pd) );
end
end
Here is the test.log from that run.
At the end of that log file:
542290.00ns {34 {afbif_resp_pvld}} & afbif_resp_pd) : 0x00000000
542290.00ns {34 {cmac_a_resp_valid}} & cmac_a_resp_pd) : 0x00000000
542290.00ns {34 {sdp_rdma_resp_valid}} & sdp_rdma_resp_pd): 0x00000000
542290.00ns {34 {csc_resp_valid}} & csc_resp_pd) : 0x00000000
542290.00ns {34 {gec_resp_valid}} & gec_resp_pd) : 0x00000000
542290.00ns {34 {cdp_resp_valid}} & cdp_resp_pd) : 0x00000000
542290.00ns {34 {cacc_resp_valid}} & cacc_resp_pd) : 0x00000000
542290.00ns {34 {glb_resp_valid}} & glb_resp_pd) : 0x00140001
542290.00ns {34 {cvif_resp_valid}} & cvif_resp_pd) : 0x00000000
542290.00ns {34 {cmac_b_resp_valid}} & cmac_b_resp_pd) : 0x00000000
542290.00ns {34 {pdp_resp_valid}} & pdp_resp_pd) : 0x00000000
542290.00ns {34 {cdma_resp_valid}} & cdma_resp_pd) : 0x00000000
542290.00ns {34 {sdp_resp_valid}} & sdp_resp_pd) : 0x00000000
542290.00ns {34 {bdma_resp_valid}} & bdma_resp_pd) : 0x00000000
542290.00ns {34 {pdp_rdma_resp_valid}} & pdp_rdma_resp_pd): 0x00000000
542290.00ns {34 {mcif_resp_valid}} & mcif_resp_pd) : 0x00000000
542290.00ns {34 {cdp_rdma_resp_valid}} & cdp_rdma_resp_pd): 0x00000000
542290.00ns {34 {rbk_resp_valid}} & rbk_resp_pd) : 0x00000000
542290.00ns {34 {dummy_resp_valid}} & dummy_resp_pd); : 0x00000000
542570.00ns MSEQ: Read mismatched, retrying (addr: 0xffff0003, received data: 0x00140001, expected data: 0xffffffff.
542650.00ns MSEQ: Retrying read at 0xffff0003 (returned 0x00140001, expected 0xffffffff)...
544930.00ns MSEQ: ERROR: Read to address 0xffff0003 did not return the expected value 0xffffffff after 50000 attempts (last read 0x00140001), failing.
545410.00ns {34 {afbif_resp_pvld}} & afbif_resp_pd) : 0x00000000
545410.00ns {34 {cmac_a_resp_valid}} & cmac_a_resp_pd) : 0x00000000
545410.00ns {34 {sdp_rdma_resp_valid}} & sdp_rdma_resp_pd): 0x00000000
545410.00ns {34 {csc_resp_valid}} & csc_resp_pd) : 0x00000000
545410.00ns {34 {gec_resp_valid}} & gec_resp_pd) : 0x00000000
545410.00ns {34 {cdp_resp_valid}} & cdp_resp_pd) : 0x00000000
545410.00ns {34 {cacc_resp_valid}} & cacc_resp_pd) : 0x00000000
545410.00ns {34 {glb_resp_valid}} & glb_resp_pd) : 0x00000000
545410.00ns {34 {cvif_resp_valid}} & cvif_resp_pd) : 0x00000000
545410.00ns {34 {cmac_b_resp_valid}} & cmac_b_resp_pd) : 0x00000000
545410.00ns {34 {pdp_resp_valid}} & pdp_resp_pd) : 0x00000000
545410.00ns {34 {cdma_resp_valid}} & cdma_resp_pd) : 0x00000000
545410.00ns {34 {sdp_resp_valid}} & sdp_resp_pd) : 0x00000000
545410.00ns {34 {bdma_resp_valid}} & bdma_resp_pd) : 0x00000000
545410.00ns {34 {pdp_rdma_resp_valid}} & pdp_rdma_resp_pd): 0x00000000
545410.00ns {34 {mcif_resp_valid}} & mcif_resp_pd) : 0x00000000
545410.00ns {34 {cdp_rdma_resp_valid}} & cdp_rdma_resp_pd): 0x00000000
545410.00ns {34 {rbk_resp_valid}} & rbk_resp_pd) : 0x00000000
545410.00ns {34 {dummy_resp_valid}} & dummy_resp_pd); : 0x00000000
When the data mismatch is display:
542290.00ns {34 {glb_resp_valid}} & glb_resp_pd) : 0x00140001
After 50000 attempts then:
545410.00ns {34 {glb_resp_valid}} & glb_resp_pd) : 0x00000000
I think the 0x00000000
is the correct value, but regardless it does change after it failed.
Does this shed any light?
Hmm, ok, hmmm. Um, @zdraw, got any ideas?
Seeing the same issue. 0x00000000 is the correct value and 0xFFFFFFFF is supposed to be the mask so that error message is wrong. Or at least points to the issue. If you look in verif/synth_tb/sim_scripts/inp_txn_to_hexdump.pl and find where we parse for "read_reg", the order of the arguments is $address = $values[1]; $bitmask = $values[2]; $cmp_mode = "=="; $exp_data = $values[3];
So bitmask (0xffffffff) should be second and exp_data (0x0) third. In sanity3_cvsram and googlenet, these got flipped.
Ok, man is this embarrassing. I apparently screwed this up when I made the Verilator changes.
Commit a8fcfc3 fixes this. @jeremychritz , can you verify that that fixes it for you, too?
Yes, it works here are the log files from googlnet and sanity3_cvsram:
Anything else or can we close this issue?
Looks good to me. I'll close it. Sorry for the mess!
I've been able to run the other samples, but the Googlenet is failing:
Here is the full log: test.log
Here is the command I'm using to run the simulation:
Here is what I have set in my tree.make for project:
Any ideas on what is going on?