sstsimulator / sst-elements

SST Architectural Simulation Components and Libraries
http://www.sst-simulator.org

Discrepancy between the address of the memory trace request #2190

Open dmukherj09 opened 1 year ago

dmukherj09 commented 1 year ago

Hi,

I'm trying to interface a prosperoCPU with the CramSim DRAM simulator, with some caches in the hierarchy. I'm providing a trace file to the prosperoCPU; the program runs without errors and dumps the stats.

ISSUE ----

There is a CramSim parameter called "boolPrintTxnTrace" that dumps a trace of the transactions.

1) When I try to tally the addresses of the memory requests in this file against the original trace file I provided to the prosperoCPU, the addresses don't match.

2) Since this trace is dumped by the DRAM simulator, it should contain fewer requests than the original trace file given to the prosperoCPU, because some of the requests will be served by the caches. That is not the case here: this file contains even more memory requests than the original trace file.

Please help me in resolving this issue.

plavin commented 1 year ago
  1. Addresses will be translated from virtual addresses to physical addresses by a ProsperoMemoryManager (see proscpu.cc:287 and prosmemmgr.cc), which is likely what causes the addresses in the CramSim trace to differ from your input trace.
  2. Prospero will split transactions that span cache lines (see proscpu.cc:274). Perhaps this is causing the extra accesses? Check the output for the lines that start with "Split reads issued" and "Split writes issued".
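
To make point 2 concrete, here is a minimal sketch of a line-crossing check. It is not Prospero's actual code: the 64-byte line size and the function names are assumptions for illustration only. It counts how many cache lines a request of a given address and size overlaps.

```python
LINE_SIZE = 64  # assumed cache line size in bytes (hypothetical default)


def lines_touched(addr: int, size: int, line_size: int = LINE_SIZE) -> int:
    """Return how many cache lines the byte range [addr, addr + size) overlaps."""
    if size <= 0:
        return 0
    first_line = addr // line_size
    last_line = (addr + size - 1) // line_size
    return last_line - first_line + 1


def spans_lines(addr: int, size: int, line_size: int = LINE_SIZE) -> bool:
    """True when a request overlaps more than one cache line."""
    return lines_touched(addr, size, line_size) > 1


# Small demonstration: aligned, misaligned, and short requests.
for addr, size in [(0, 64), (32, 64), (60, 8), (128, 4)]:
    print(addr, size, lines_touched(addr, size), spans_lines(addr, size))
```

A request only crosses a line when its offset within the line plus its size exceeds the line size, so an aligned 64-byte request stays within one line under this assumption.
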
dmukherj09 commented 1 year ago

Hi, I also think the requests are getting split, which would explain the larger number of requests reaching the DRAM simulator, but the prosperoCPU log says: Split reads issued: 0 Split writes issued: 0

Maybe it is a bug, I'm not sure

dmukherj09 commented 1 year ago

Okay, I confirmed your logic: there is an increase in the number of memory requests because requests are split depending on the request size. But I think there's a bug in prosperoCPU, as it reports split reads and writes as 0.

plavin commented 1 year ago

Could you describe how you confirmed that requests were being split? Prospero should always update the splitWritesIssued or splitReadsIssued variable anytime a request is split.

dmukherj09 commented 1 year ago

I verified that the requests were being split by sweeping the request size and checking the resulting requests in the main memory model. Whenever a request spans a cache line boundary (64B), I get an additional request at the memory model, although "splitWritesIssued" and "splitReadsIssued" both show "0".
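
A sweep like this could be generated with a short script. The following is only a sketch: it assumes the four-field text format used elsewhere in this thread (taken here to mean cycle, R/W, address, size, which is an assumption), and the function name, base address, and cycle spacing are hypothetical.

```python
def make_sweep_trace(base_addr, sizes, start_cycle=0, cycle_step=10, op="R"):
    """Build trace lines that issue one request per size at a fixed address.

    Field order (cycle, op, address, size) is an assumption based on the
    trace excerpt in this thread, not on Prospero documentation.
    """
    lines = []
    for i, size in enumerate(sizes):
        cycle = start_cycle + i * cycle_step
        lines.append(f"{cycle} {op} {base_addr} {size}")
    return lines


# Sweep sizes around the 64-byte boundary at a 64-byte-aligned address.
trace_lines = make_sweep_trace(base_addr=4096, sizes=[8, 16, 32, 64, 65, 128])
print("\n".join(trace_lines))
```

Writing the resulting lines to the file named in "readerParams.file" and comparing the request count at the memory model for each size would reproduce the experiment described above.
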

plavin commented 1 year ago

I don't think CramSim will split transactions, based on c_MemhBridge::createTxn(), so I'm not sure where it is happening if Prospero isn't doing it.

Could you share information to help us reproduce the bug?

dmukherj09 commented 1 year ago

The SST-Core and SST-Elements versions are both 12.1.0.

The Prospero input is 49M memory requests from a file with the following pattern (it is spec2016 benchmark #500). The format is:

34965563945 R 33787268672 64
34965563964 W 33607538048 64
34965563986 R 33655373632 64
34965564004 R 33810133376 64
......
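
For anyone checking the trace by hand, here is a rough sketch of a parser for these entries. The field names (cycle, op, addr, size) are my assumption about what the four columns mean, and the 64-byte alignment check assumes a 64-byte cache line; neither is taken from Prospero's source.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class TraceEntry(NamedTuple):
    cycle: int  # assumed: issue cycle
    op: str     # "R" or "W"
    addr: int   # assumed: virtual address
    size: int   # assumed: request size in bytes


def parse_trace(lines: Iterable[str]) -> Iterator[TraceEntry]:
    """Yield one TraceEntry per well-formed four-field line."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        cycle, op, addr, size = fields
        yield TraceEntry(int(cycle), op, int(addr), int(size))


# The four entries quoted in this thread.
SAMPLE = """\
34965563945 R 33787268672 64
34965563964 W 33607538048 64
34965563986 R 33655373632 64
34965564004 R 33810133376 64
"""

entries = list(parse_trace(SAMPLE.splitlines()))
op_counts = Counter(e.op for e in entries)
unaligned = [e for e in entries if e.addr % 64 != 0]
print(op_counts, len(unaligned))
```

In the four quoted entries, every address is a multiple of 64 and every size is 64, so none of them crosses a 64-byte line boundary as written in the trace. Whether that still holds after address translation is not something this sketch can show.
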

dmukherj09 commented 1 year ago

The SDL File is --

import sst
from mhlib import componentlist
statFile = "stats_cramsim_big_latency.csv"
statLevel = 16

def read_arguments():
    boolUseDefaultConfig = True

def setup_config_params():
    l_params = {}
    if g_boolUseDefaultConfig:
        print("Config file not found... using default configuration")
        l_params = {
            "clockCycle": "1ns",
            "numChannels":"1",
            "numRanksPerChannel":"2",
            "numBankGroupsPerRank":"2",
            "numBanksPerBankGroup":"2",
            "numRowsPerBank":"32768",
            "numColsPerBank":"2048",
            "numBytesPerTransaction":"32",
            "relCommandWidth":"1",
            "readWriteRatio":"1",
            "boolUseReadA":"0",
            "boolUseWriteA":"0",
            "boolUseRefresh":"0",
            "boolAllocateCmdResACT":"0",
            "boolAllocateCmdResREAD":"1",
            "boolAllocateCmdResREADA":"1",
            "boolAllocateCmdResWRITE":"1",
            "boolAllocateCmdResWRITEA":"1",
            "boolAllocateCmdResPRE":"0",
            "boolCmdQueueFindAnyIssuable":"1",
            "boolPrintCmdTrace":"0",
            "strAddressMapStr":"_r_l_R_B_b_h_",
            "bankPolicy":"CLOSE",
            "nRC":"55",
            "nRRD":"4",
            "nRRD_L":"6",
            "nRRD_S":"4",
            "nRCD":"16",
            "nCCD":"4",
            "nCCD_L":"6",
            "nCCD_L_WR":"1",
            "nCCD_S":"4",
            "nAL":"15",
            "nCL":"16",
            "nCWL":"12",
            "nWR":"18",
            "nWTR":"3",
            "nWTR_L":"9",
            "nWTR_S":"3",
            "nRTW":"4",
            "nEWTR":"6",
            "nERTW":"6",
            "nEWTW":"6",
            "nERTR":"6",
            "nRAS":"39",
            "nRTP":"9",
            "nRP":"16",
            "nRFC":"420",
            "nREFI":"9360",
            "nFAW":"16",
            "nBL":"4"
        }
    else:
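        # Note: g_config_file must be defined before this branch is used.
        with open(g_config_file, 'r') as l_configFile:
            for l_line in l_configFile:
                # split() also strips the trailing newline from the value
                l_tokens = l_line.split()
                if len(l_tokens) >= 2:
                    l_params[l_tokens[0]] = l_tokens[1]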

    return l_params
g_boolUseDefaultConfig = True
g_params = setup_config_params()
comp_cpu0 = sst.Component("cpu0", "prospero.prosperoCPU")
comp_cpu0.addParams({
    "verbose" : "1",
    "reader" : "prospero.ProsperoTextTraceReader",
    "readerParams.file" : "trace.txt",
    "clock" : "3GHz"
})

comp_l1cache0 = sst.Component("l1cache0", "memHierarchy.Cache")
comp_l1cache0.addParams({
      "access_latency_cycles" : "11",
      "cache_frequency" : "3GHz",
      "replacement_policy" : "lru",
      "coherence_protocol" : "MESI",
      "associativity" : "1",
      "cache_line_size" : "64",
      "L1" : "1",
      "cache_size" : "64B",
      "verbose" : 1
})

bus = sst.Component("bus", "memHierarchy.Bus")
bus.addParams({
      "bus_frequency" : "3GHz",
      "verbose" : 1
})

comp_l2cache = sst.Component("l2cache", "memHierarchy.Cache")
comp_l2cache.addParams({
      "access_latency_cycles" : "20",
      "cache_frequency" : "3 Ghz",
      "replacement_policy" : "lru",
      "coherence_protocol" : "MESI",
      "associativity" : "1",
      "cache_line_size" : "64",
      "cache_size" : "64B",
      "verbose"  : 1
})

comp_l3cache = sst.Component("l3cache", "memHierarchy.Cache")
comp_l3cache.addParams({
      "access_latency_cycles" : "100",
      "cache_frequency" : "3 Ghz",
      "replacement_policy" : "lru",
      "coherence_protocol" : "MESI",
      "associativity" : "1",
      "cache_line_size" : "64",
      "cache_size" : "64B",
      "verbose"    :  1
})

comp_memctrl = sst.Component("memory", "memHierarchy.MemController")
comp_memctrl.addParams({
    "verbose" : "1",
    "clock" : "3GHz",
    "addr_range_end" : 64*1024*1024*1024-1,
})
comp_memory = comp_memctrl.setSubComponent("backend", "memHierarchy.cramsim")
comp_memory.addParams({
    "access_time" : "2 ns",   # Phy latency
    "mem_size" : "64GiB",
    "verbose"   :  1
})

comp_memhBridge = sst.Component("memh_bridge", "CramSim.c_MemhBridge")
comp_memhBridge.addParams(g_params)
comp_memhBridge.addParams({
    "verbose" : "1",
    "numTxnPerCycle" : g_params["numChannels"],
})

comp_controller0 = sst.Component("MemController0", "CramSim.c_Controller")
comp_controller0.addParams(g_params)
comp_controller0.addParams({
    "verbose" : "1",
    "TxnConverter" : "CramSim.c_TxnConverter",
    "AddrHasher" : "CramSim.c_AddressHasher",
    "CmdScheduler" : "CramSim.c_CmdScheduler",
    "DeviceController" : "CramSim.c_DeviceController"
})

comp_dimm0 = sst.Component("Dimm0", "CramSim.c_Dimm")
comp_dimm0.addParams(g_params)

sst.setStatisticLoadLevel(statLevel)
sst.enableAllStatisticsForAllComponents({"type":"sst.AccumulatorStatistic"})
sst.setStatisticOutput("sst.statOutputCSV")
sst.setStatisticOutputOptions( {
    "filepath"  : statFile,
    "separator" : ", "
    } )

link_cpu0_l1cache0 = sst.Link("link_cpu0_l1cache0")
link_cpu0_l1cache0.connect( (comp_cpu0, "cache_link", "1ps"), (comp_l1cache0, "high_network_0", "1ps") )

link_l10_bus0 = sst.Link("link_l10_bus0")
link_l10_bus0.connect((comp_l1cache0, "low_network_0", "1ps"), (bus, "high_network_0", "1ps"))

link_bus_l2 = sst.Link("link_bus_l2")
link_bus_l2.connect( (bus, "low_network_0", "1ps"), (comp_l2cache, "high_network_0", "1ps") )

link_l2_l3 = sst.Link("link_l2_l3")
link_l2_l3.connect((comp_l2cache, "low_network_0", "1ps"), (comp_l3cache, "high_network_0", "1ps"))

link_l3_mem = sst.Link("link_l3_mem")
link_l3_mem.connect( (comp_l3cache, "low_network_0", "1ps"), (comp_memctrl, "direct_link", "1ps") )

link_dir_cramsim_link = sst.Link("link_dir_cramsim_link")
link_dir_cramsim_link.connect( (comp_memory, "cramsim_link", "2ps"), (comp_memhBridge, "cpuLink", "2ps") )

memHLink = sst.Link("memHLink_1")
memHLink.connect( (comp_memhBridge, "memLink", g_params["clockCycle"]), (comp_controller0, "txngenLink", g_params["clockCycle"]) )

cmdLink = sst.Link("cmdLink_1")
cmdLink.connect( (comp_controller0, "memLink", g_params["clockCycle"]), (comp_dimm0, "ctrlLink", g_params["clockCycle"]) )
dmukherj09 commented 1 year ago

The above SDL file mainly follows testBackendCramsim in the memHierarchy/tests/ directory, replacing the default CPU model with a Prospero model.

plavin commented 1 year ago

Could you share the sdl file again but in a code block?

Surround your code with ``` on either side so that the indentation is preserved. If you hit the preview button it should look like this:

import sst
import numpy
...

Or, if you have the sdl file in a repo please just share a link to the file.

dmukherj09 commented 1 year ago

Hi @plavin I have updated the code above

plavin commented 1 year ago

I've got your code running now. Are you able to share some of the trace? Not the whole thing, but just enough so I can observe the issues you're seeing here and in #2221.
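
If it helps, a prefix of the trace can be cut out with something like the sketch below. The function name, the output file name, and the default line count are placeholders, and it assumes the first lines of the trace are enough to show the behaviour, which may not be true.

```python
from itertools import islice


def write_trace_sample(src_path: str, dst_path: str, max_lines: int = 1000) -> int:
    """Copy at most max_lines lines from src_path to dst_path.

    Returns the number of lines actually copied.
    """
    copied = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in islice(src, max_lines):
            dst.write(line)
            copied += 1
    return copied
```

For example, `write_trace_sample("trace.txt", "trace_sample.txt", 10000)` would copy the first 10000 entries of the trace named in the SDL file into a smaller file (the output name is hypothetical).
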