Wrong clock gate use in NV_NVDLA_CDMA_wt.v #183

ddkkevin commented 6 years ago

I extracted a trace and data from the NN_L0_1_small_fbuf test for RTL simulation in the nv_small environment; the extracted trace can be downloaded from here. Because the data files are too large to upload, I modified the trace file to only initialize data memory to zero instead of loading the actual data files. This does not affect the analysis of the problem.

The simulation command is "../verif/tools/run_test.py -P nv_small nn_mine -outdir nn_mine -wave -v nvdla_utb --nvdla_utb_work_mode RTL_ONLY". After the Convolution 0 and SDP 1 operations have finished, the simulation waits for data and weight loading for Convolution 2. However, that loading never completes because wt_wr_dmatx_cnt in NV_NVDLA_CDMA_wt.v has reached its maximum value. At line 1226 of NV_NVDLA_CDMA_wt.v, wt_wr_dmatx_cnt is clocked by the gated clock nvdla_core_clk, and nvdla_core_clk is disabled after the weight loading of Convolution 0 finishes. While the clock is disabled, sc_wt_entries cannot be added to wt_wr_dmatx_cnt, which leaves wt_wr_dmatx_cnt insufficient for the weight loading of Convolution 2.

Therefore, I think wt_wr_dmatx_cnt should use the non-gated clock nvdla_core_ng_clk instead of the gated clock nvdla_core_clk.
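
To illustrate the failure mode in general terms, here is a minimal behavioural sketch in Python. It is not the NVDLA RTL and does not reproduce the real update logic of wt_wr_dmatx_cnt; the function `run_counter` and the example values are hypothetical. It only shows that a register clocked by a gated clock silently drops any update presented while the clock is disabled, whereas the same register on a free-running clock sees every update.

```python
def run_counter(update_values, clock_enable):
    """Toy model of a register that accumulates its input on each clock edge.

    update_values[i] is the value presented at the register input in cycle i
    (hypothetical stand-in for something like sc_wt_entries).
    clock_enable[i] is True when the clock reaching the register toggles in
    cycle i; when it is False the register holds its value and the input
    for that cycle is lost.
    """
    count = 0
    for value, enabled in zip(update_values, clock_enable):
        if enabled:
            count += value
    return count


# Hypothetical stimulus: an update of 3 arrives in cycle 3.
updates = [4, 0, 0, 3, 0, 2]

# Gated clock: disabled during cycles 2..4, so the update in cycle 3 is missed.
gated_clock = [True, True, False, False, False, True]

# Non-gated clock: toggles every cycle.
free_running_clock = [True] * len(updates)

gated_total = run_counter(updates, gated_clock)
ungated_total = run_counter(updates, free_running_clock)

print("gated clock total:    ", gated_total)
print("non-gated clock total:", ungated_total)
```

With the gated clock the model ends up short by exactly the update that arrived while the clock was off, which is the same kind of mismatch described above.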

ddkkevin commented 6 years ago

After changing the clock of wt_wr_dmatx_cnt from nvdla_core_clk to nvdla_core_ng_clk, the simulation continues running. Is this a bug in nv_small? It is strange, though, that the NN_L0_1_small_fbuf test finishes successfully in the VP environment. Does the C model not implement clock gating?

cainiaowu commented 5 years ago

This is definitely a serious bug. However, if you keep CDMA busy all the time, as the official KMD does, it won't even be visible. You need to reset the core when it stops responding.
