GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
microarchitecture model bug, ldst pop m_accessq banking error #268
In shader.cc, around line 1875, in ldst_unit::process_memory_access_queue_l1cache: when popping an access out of inst.m_accessq, the function m_config->m_L1D_config.set_bank is used to select the L1 bank.
Assuming a 32B sector size and 4 sectors per line, I expect the variable bank_id to be sliced out of bits 6-7 (counting from the right) of the 32-bit addr, as described in set_bank's subcall hash_function at gpu-cache.cc line 133.
The variable m_line_sz_log2 there should be log2(32) = 5, but in an actual run its value is 32, so the right shift always forces set_index to 0, which in turn makes the bank-conflict detection always report BK_CONF.
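To make the expected arithmetic concrete, here is a small standalone sketch (not GPGPU-Sim source) of the indexing formula at gpu-cache.cc line 133. It assumes 64-bit addresses, 4 L1 banks and 32B byte interleaving, matching the trace below; the names addr_t and linear_bank are mine.

// Standalone sketch of set_index = (addr >> m_line_sz_log2) & (m_nset - 1)
#include <cstdio>

typedef unsigned long long addr_t;  // stands in for new_addr_type

static unsigned linear_bank(addr_t addr, unsigned n_banks, unsigned shift_log2) {
  // same formula as gpu-cache.cc:133
  return (unsigned)((addr >> shift_log2) & (n_banks - 1));
}

int main() {
  addr_t addr = 0xC00001C0ULL;  // 3221225920, the address seen in the GDB trace
  printf("shift by 32 (current behaviour): bank %u\n", linear_bank(addr, 4, 32));  // prints bank 0
  printf("shift by 5 (log2 of 32B):        bank %u\n", linear_bank(addr, 4, 5));   // prints bank 2
  return 0;
}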
Here is my GDB execution result:
(gdb) n
1875 unsigned bank_id = m_config->m_L1D_config.set_bank(mf->get_addr());
(gdb) s
mem_fetch::get_addr (this=0x7ffff08b0750) at mem_fetch.h:89
89 new_addr_type get_addr() const { return m_access.get_addr(); }
(gdb) fin
Run till exit from #0 mem_fetch::get_addr (this=0x7ffff08b0750) at mem_fetch.h:89
0x00007ffff7c0053c in ldst_unit::process_memory_access_queue_l1cache (this=0x55555644e380, cache=0x5555564eb380, inst=...) at shader.cc:1875
1875 unsigned bank_id = m_config->m_L1D_config.set_bank(mf->get_addr());
Value returned is $8 = 3221225920
(gdb) s
l1d_cache_config::set_bank (this=0x555555575860, addr=3221225920) at gpu-cache.cc:66
66 return cache_config::hash_function(addr, l1_banks, l1_banks_byte_interleaving,  // currently the interleaving values are all 32B
(gdb) n
68 l1_banks_hashing_function);
(gdb) s
67 m_l1_banks_log2,
(gdb) s
66 return cache_config::hash_function(addr, l1_banks, l1_banks_byte_interleaving,  // currently the interleaving values are all 32B
(gdb) s
68 l1_banks_hashing_function);
(gdb) s
cache_config::hash_function (this=0x555555575860, addr=3221225920, m_nset=4, m_line_sz_log2=32, m_nset_log2=2, m_index_function=0) at gpu-cache.cc:80
80 unsigned set_index = 0;
(gdb) n
82 switch (m_index_function) {
(gdb) n
133 set_index = (addr >> m_line_sz_log2) & (m_nset - 1);
(gdb) n
134 break;
(gdb) p set_index
$9 = 0
The value 3221225920 is 11000000000000000000000111000000 in binary; its bits 6-7 (counting from the right) are 10, so set_index should be 2, not the 0 shown above.
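For reference, here is a hedged sketch of one possible fix in l1d_cache_config::set_bank (gpu-cache.cc line 66): pass the log2 of the byte interleaving instead of the raw byte count as hash_function's m_line_sz_log2 argument. The member names are taken from the GDB frames above; LOGB2 is assumed to be the existing helper in gpu-misc.h, and a precomputed field such as l1_banks_byte_interleaving_log2 (hypothetical) set up during config init would serve the same purpose.

// Sketch of a possible fix, not a verified patch: the shift amount must be
// log2(32) = 5, not the raw 32-byte interleaving that is currently passed.
unsigned l1d_cache_config::set_bank(new_addr_type addr) {
  return cache_config::hash_function(addr, l1_banks,
                                     LOGB2(l1_banks_byte_interleaving),  // was: l1_banks_byte_interleaving
                                     m_l1_banks_log2, l1_banks_hashing_function);
}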