dmukherj09 opened this issue 1 year ago
Hi, I also think the requests are being split, which would explain why more requests reach the DRAM simulator, but the prosperoCPU log says `Split reads issued: 0` and `Split writes issued: 0`.
Maybe it is a bug; I'm not sure.
Okay, so I confirmed your logic, and it does seem that the number of memory requests increases because requests are split depending on the request size. However, I think there's a bug in prosperoCPU, since it reports 0 split reads and 0 split writes.
Could you describe how you confirmed that requests were being split? Prospero should always update the `splitWritesIssued` or `splitReadsIssued` variable anytime a request is split.
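As a rough illustration of what those counters track, the Python sketch below splits a trace record into per-line pieces whenever it crosses a 64-byte boundary and bumps a read or write counter once per split record. It is a hypothetical model, not Prospero's source: `split_request`, `SplitCounters` and `LINE_SIZE` are names made up for this example, and it assumes a split happens exactly when `addr % 64 + size > 64`.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumption: matches "cache_line_size" : "64" in the SDL


@dataclass
class SplitCounters:
    # Hypothetical stand-ins for Prospero's splitReadsIssued / splitWritesIssued
    split_reads_issued: int = 0
    split_writes_issued: int = 0


def split_request(op, addr, size, counters, line_size=LINE_SIZE):
    """Return the (op, addr, size) pieces one trace record would become."""
    pieces = []
    cur, remaining = addr, size
    while remaining > 0:
        # Take bytes up to the end of the current cache line
        chunk = min(remaining, line_size - cur % line_size)
        pieces.append((op, cur, chunk))
        cur += chunk
        remaining -= chunk
    # Count the record once if it had to be split at all
    if len(pieces) > 1:
        if op == "R":
            counters.split_reads_issued += 1
        else:
            counters.split_writes_issued += 1
    return pieces
```

Under this model an aligned 64-byte access never increments either counter, while the same access starting 32 bytes into a line becomes two 32-byte pieces and counts as one split.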
I verified that the requests were being split by sweeping the request size and checking the resulting requests at the main memory model. Whenever a request spans a cache-line boundary (64 B), I get an additional request at the memory model, even though `splitWritesIssued` and `splitReadsIssued` show 0.
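A sweep like this can be scripted by generating a small synthetic trace in the same four-field text format as the Prospero trace in this thread (timestamp, R/W, address, size). The sketch is hypothetical: `make_sweep_trace` and `lines_touched` are illustrative names, and each record is placed in its own 4 KiB region so that, assuming every access misses the single-line caches, the memory model should see one request per cache line touched.

```python
LINE_SIZE = 64  # assumption: "cache_line_size" in the SDL


def lines_touched(addr, size, line_size=LINE_SIZE):
    """Number of distinct cache lines covered by the byte range [addr, addr + size)."""
    return (addr + size - 1) // line_size - addr // line_size + 1


def make_sweep_trace(sizes, offset=0, base=1 << 20, stride=4096,
                     start_cycle=1000, gap=20, op="R"):
    """Return one '<cycle> <op> <addr> <size>' record per entry in sizes.

    Each record sits in its own stride-sized region (base and stride are
    multiples of LINE_SIZE), shifted by `offset` bytes into its first line.
    """
    records = []
    for i, size in enumerate(sizes):
        addr = base + i * stride + offset
        cycle = start_cycle + i * gap
        records.append(f"{cycle} {op} {addr} {size}")
    return records
```

For example, writing `"\n".join(make_sweep_trace(range(1, 129), offset=32))` to `trace.txt` gives records of 1 to 128 bytes that start mid-line; summing `lines_touched` over them gives the request count to expect at the memory model if nothing splits requests beyond the line boundary.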
I don't think CramSim will split transactions, based on `c_MemhBridge::createTxn()`, so I'm not sure where it is happening if Prospero isn't doing it.
Could you share information to help us reproduce the bug?
The SST-Core and SST-Elements versions are both 12.1.0.
The Prospero input is a file of 49M memory requests (SPEC2016 benchmark #500). Each record has four fields, timestamp, operation (R/W), address and size in bytes, following this pattern:
```
34965563945 R 33787268672 64
34965563964 W 33607538048 64
34965563986 R 33655373632 64
34965564004 R 33810133376 64
...
```
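To check whether any record in a trace like this should trigger a split at all, a short script can parse the four-field records and count how many cross a 64-byte line. This is a sketch under the assumption that the fields are timestamp, operation, address and size, as in the sample; `parse_trace` and `summarize` are hypothetical helpers that accept either one record per line or several records run together on one line.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

LINE_SIZE = 64  # assumption: "cache_line_size" in the SDL


class TraceRecord(NamedTuple):
    cycle: int
    op: str
    addr: int
    size: int


def parse_trace(lines: Iterable[str]) -> Iterator[TraceRecord]:
    """Yield 4-field records, whether they are one per line or run together."""
    pending: list[str] = []
    for line in lines:
        pending.extend(line.split())
        while len(pending) >= 4:
            cycle, op, addr, size = pending[:4]
            del pending[:4]
            yield TraceRecord(int(cycle), op, int(addr), int(size))
    if pending:
        raise ValueError(f"trailing incomplete record: {pending}")


def summarize(records: Iterable[TraceRecord], line_size: int = LINE_SIZE) -> Counter:
    """Count records per operation and how many cross a cache-line boundary."""
    stats: Counter = Counter()
    for rec in records:
        stats[rec.op] += 1
        if rec.addr % line_size + rec.size > line_size:
            stats["crossing_" + rec.op] += 1
    return stats
```

Running `summarize(parse_trace(open("trace.txt")))` over the full file gives per-operation totals plus `crossing_R`/`crossing_W` counts. For the four sample records shown, every address is a multiple of 64 and every size is 64, so none of them crosses a line boundary, which would be consistent with Prospero reporting zero splits for records like these.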
The SDL file is:
```python
import sst
from mhlib import componentlist

statFile = "stats_cramsim_big_latency.csv"
statLevel = 16


def read_arguments():
    # Not called below; kept from the CramSim test template.
    global g_boolUseDefaultConfig
    g_boolUseDefaultConfig = True


def setup_config_params():
    l_params = {}
    if g_boolUseDefaultConfig:
        print("Config file not found... using default configuration")
        # Default CramSim parameters, shared by the bridge, controller and DIMM
        l_params = {
            "clockCycle": "1ns",
            "numChannels": "1",
            "numRanksPerChannel": "2",
            "numBankGroupsPerRank": "2",
            "numBanksPerBankGroup": "2",
            "numRowsPerBank": "32768",
            "numColsPerBank": "2048",
            "numBytesPerTransaction": "32",
            "relCommandWidth": "1",
            "readWriteRatio": "1",
            "boolUseReadA": "0",
            "boolUseWriteA": "0",
            "boolUseRefresh": "0",
            "boolAllocateCmdResACT": "0",
            "boolAllocateCmdResREAD": "1",
            "boolAllocateCmdResREADA": "1",
            "boolAllocateCmdResWRITE": "1",
            "boolAllocateCmdResWRITEA": "1",
            "boolAllocateCmdResPRE": "0",
            "boolCmdQueueFindAnyIssuable": "1",
            "boolPrintCmdTrace": "0",
            "strAddressMapStr": "_r_l_R_B_b_h_",
            "bankPolicy": "CLOSE",
            "nRC": "55",
            "nRRD": "4",
            "nRRD_L": "6",
            "nRRD_S": "4",
            "nRCD": "16",
            "nCCD": "4",
            "nCCD_L": "6",
            "nCCD_L_WR": "1",
            "nCCD_S": "4",
            "nAL": "15",
            "nCL": "16",
            "nCWL": "12",
            "nWR": "18",
            "nWTR": "3",
            "nWTR_L": "9",
            "nWTR_S": "3",
            "nRTW": "4",
            "nEWTR": "6",
            "nERTW": "6",
            "nEWTW": "6",
            "nERTR": "6",
            "nRAS": "39",
            "nRTP": "9",
            "nRP": "16",
            "nRFC": "420",
            "nREFI": "9360",
            "nFAW": "16",
            "nBL": "4",
        }
    else:
        # Requires g_config_file to be defined; reads "<name> <value>" pairs
        with open(g_config_file, 'r') as l_configFile:
            for l_line in l_configFile:
                l_tokens = l_line.split()
                if len(l_tokens) >= 2:
                    l_params[l_tokens[0]] = l_tokens[1]
    return l_params


g_boolUseDefaultConfig = True
g_params = setup_config_params()

# CPU: Prospero replaying a text trace
comp_cpu0 = sst.Component("cpu0", "prospero.prosperoCPU")
comp_cpu0.addParams({
    "verbose": "1",
    "reader": "prospero.ProsperoTextTraceReader",
    "readerParams.file": "trace.txt",
    "clock": "3GHz",
})

# L1: 64 B, direct-mapped, i.e. a single 64-byte line
comp_l1cache0 = sst.Component("l1cache0", "memHierarchy.Cache")
comp_l1cache0.addParams({
    "access_latency_cycles": "11",
    "cache_frequency": "3GHz",
    "replacement_policy": "lru",
    "coherence_protocol": "MESI",
    "associativity": "1",
    "cache_line_size": "64",
    "L1": "1",
    "cache_size": "64B",
    "verbose": 1,
})

bus = sst.Component("bus", "memHierarchy.Bus")
bus.addParams({
    "bus_frequency": "3GHz",
    "verbose": 1,
})

comp_l2cache = sst.Component("l2cache", "memHierarchy.Cache")
comp_l2cache.addParams({
    "access_latency_cycles": "20",
    "cache_frequency": "3 Ghz",
    "replacement_policy": "lru",
    "coherence_protocol": "MESI",
    "associativity": "1",
    "cache_line_size": "64",
    "cache_size": "64B",
    "verbose": 1,
})

comp_l3cache = sst.Component("l3cache", "memHierarchy.Cache")
comp_l3cache.addParams({
    "access_latency_cycles": "100",
    "cache_frequency": "3 Ghz",
    "replacement_policy": "lru",
    "coherence_protocol": "MESI",
    "associativity": "1",
    "cache_line_size": "64",
    "cache_size": "64B",
    "verbose": 1,
})

# memHierarchy memory controller with the CramSim backend
comp_memctrl = sst.Component("memory", "memHierarchy.MemController")
comp_memctrl.addParams({
    "verbose": "1",
    "clock": "3GHz",
    "addr_range_end": 64 * 1024 * 1024 * 1024 - 1,
})
comp_memory = comp_memctrl.setSubComponent("backend", "memHierarchy.cramsim")
comp_memory.addParams({
    "access_time": "2 ns",  # Phy latency
    "mem_size": "64GiB",
    "verbose": 1,
})

# CramSim: bridge, controller and DIMM
comp_memhBridge = sst.Component("memh_bridge", "CramSim.c_MemhBridge")
comp_memhBridge.addParams(g_params)
comp_memhBridge.addParams({
    "verbose": "1",
    "numTxnPerCycle": g_params["numChannels"],
})
comp_controller0 = sst.Component("MemController0", "CramSim.c_Controller")
comp_controller0.addParams(g_params)
comp_controller0.addParams({
    "verbose": "1",
    "TxnConverter": "CramSim.c_TxnConverter",
    "AddrHasher": "CramSim.c_AddressHasher",
    "CmdScheduler": "CramSim.c_CmdScheduler",
    "DeviceController": "CramSim.c_DeviceController",
})
comp_dimm0 = sst.Component("Dimm0", "CramSim.c_Dimm")
comp_dimm0.addParams(g_params)

# Statistics
sst.setStatisticLoadLevel(statLevel)
sst.enableAllStatisticsForAllComponents({"type": "sst.AccumulatorStatistic"})
sst.setStatisticOutput("sst.statOutputCSV")
sst.setStatisticOutputOptions({
    "filepath": statFile,
    "separator": ", ",
})

# CPU -> L1 -> bus -> L2 -> L3 -> memory controller
link_cpu0_l1cache0 = sst.Link("link_cpu0_l1cache0")
link_cpu0_l1cache0.connect((comp_cpu0, "cache_link", "1ps"), (comp_l1cache0, "high_network_0", "1ps"))
link_l10_bus0 = sst.Link("link_l10_bus0")
link_l10_bus0.connect((comp_l1cache0, "low_network_0", "1ps"), (bus, "high_network_0", "1ps"))
link_bus_l2 = sst.Link("link_bus_l2")
link_bus_l2.connect((bus, "low_network_0", "1ps"), (comp_l2cache, "high_network_0", "1ps"))
link_l2_l3 = sst.Link("link_l2_l3")
link_l2_l3.connect((comp_l2cache, "low_network_0", "1ps"), (comp_l3cache, "high_network_0", "1ps"))
link_l3_mem = sst.Link("link_l3_mem")
link_l3_mem.connect((comp_l3cache, "low_network_0", "1ps"), (comp_memctrl, "direct_link", "1ps"))

# cramsim backend -> CramSim bridge -> controller -> DIMM
link_dir_cramsim_link = sst.Link("link_dir_cramsim_link")
link_dir_cramsim_link.connect((comp_memory, "cramsim_link", "2ps"), (comp_memhBridge, "cpuLink", "2ps"))
memHLink = sst.Link("memHLink_1")
memHLink.connect((comp_memhBridge, "memLink", g_params["clockCycle"]), (comp_controller0, "txngenLink", g_params["clockCycle"]))
cmdLink = sst.Link("cmdLink_1")
cmdLink.connect((comp_controller0, "memLink", g_params["clockCycle"]), (comp_dimm0, "ctrlLink", g_params["clockCycle"]))
```
The SDL file above mainly follows `testBackendCramsim` in the `memHierarchy/tests/` directory, replacing the default CPU model with a Prospero model.
Could you share the sdl file again but in a code block?
Surround your code with ``` on either side so that the indentation is preserved. If you hit the preview button it should look like this:
```
import sst
import numpy
...
```
Or, if you have the sdl file in a repo please just share a link to the file.
Hi @plavin, I have updated the code above.
I've got your code running now. Are you able to share some of the trace? Not the whole thing, but just enough so I can observe the issues you're seeing here and in #2221.
Hi,
I'm trying to connect a ProsperoCPU to the CramSim DRAM simulator with some caches in the hierarchy. I'm providing a trace file to the ProsperoCPU, and the program runs without errors and dumps the stats.
ISSUE ----
There is a parameter in CramSim called `boolPrintTxnTrace` that dumps a trace of the transactions.
1) When I try to match the addresses of the memory requests in this file against the original trace file I provided to the ProsperoCPU, the addresses don't match.
2) Since this trace is dumped by the DRAM simulator, it should contain fewer requests than the original trace given to the ProsperoCPU, because some requests should be served by the caches. That is not the case: this file contains even more memory requests than the original trace file.
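One thing worth ruling out for point 1 is granularity: the caches in this configuration forward whole 64-byte lines, so the addresses that reach the DRAM model are normally line-aligned rather than the exact byte addresses in the Prospero trace. The sketch below compares the two traces at cache-line granularity. It assumes the DRAM-side addresses have already been extracted from the `boolPrintTxnTrace` output as plain integers (that file's exact format is not shown here), and `cpu_lines`/`compare_traces` are hypothetical names.

```python
LINE_SIZE = 64  # assumption: "cache_line_size" in the SDL


def cpu_lines(requests, line_size=LINE_SIZE):
    """Set of line-aligned addresses touched by (addr, size) trace requests."""
    lines = set()
    for addr, size in requests:
        first = addr - addr % line_size
        end = addr + size - 1
        last = end - end % line_size
        lines.update(range(first, last + 1, line_size))
    return lines


def compare_traces(cpu_requests, dram_addrs, line_size=LINE_SIZE):
    """Compare CPU-trace requests with DRAM-side addresses at line granularity."""
    cpu = cpu_lines(cpu_requests, line_size)
    dram = {a - a % line_size for a in dram_addrs}
    return {
        "cpu_requests": len(cpu_requests),
        "dram_requests": len(dram_addrs),
        "dram_lines_not_in_cpu_trace": sorted(dram - cpu),
        "cpu_lines_never_at_dram": sorted(cpu - dram),
    }
```

If `dram_lines_not_in_cpu_trace` comes back empty, the address mismatch is only a matter of alignment; if it does not, the addresses are being changed somewhere else along the path. For point 2, keep in mind that with a single 64-byte line per cache, nearly every access may miss and dirty evictions may add writebacks on top, so a `dram_requests` count above `cpu_requests` is not by itself proof of splitting.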
Please help me resolve this issue.