s5z / zsim

A fast and scalable x86-64 multicore simulator
GNU General Public License v2.0

Zsim doesn't simulate contention in cache #211

Open nirdavid opened 6 years ago

nirdavid commented 6 years ago

Hi,

I'm running basic tests with heavy contention on a multicore system. However, zsim doesn't seem to simulate contention in the cache (L2): the number of cycles does not increase even as the number of threads increases. Why doesn't the simulator support this? I understand that zsim simulates contention in the core, but what about the caches?

hlitz commented 6 years ago

Maybe because your config has a private l2?


gaomy3832 commented 6 years ago

If you are trying to model contention in a shared cache, try using TimingCache (see #36 and the explanation in #59).

nirdavid commented 6 years ago

Thanks for the comments! I'm using a timing cache. My L2 is supposed to be shared; this is my config file:

sys = {
    cores = {
        simpleCore = {
            type = "Timing";
            dcache = "l1d";
            icache = "l1i";
            cores = 64;
        };
    };

    lineSize = 64;

    caches = {
        l1d = {
            caches = 64;
            size = 65536;
        };
        l1i = {
            caches = 64;
            size = 32768;
        };
        l2 = {
            caches = 1;
            size = 2097152;
            children = "l1i|l1d";  // interleave
            type = "Timing";
        };
    };

    mem = {
        type = "DDR";
        controllers = 4;
        tech = "DDR3-1066-CL8";
    };
};

sim = {
    phaseLength = 1000;
    printHierarchy = true;
    // attachDebugger = True;
    schedQuantum = 50;  // switch threads frequently
    procStatsFilter = "l1.*|l2.*";
};

process0 = {
    command = "/home/nir/e/stats/test2 4";
};

And this is my test (the function that each thread executes):

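// Executed by each thread. `data` is a shared atomic counter (declaration not shown).
void do_work()
{
    // The 1,000,000 increments in total are split evenly across THREADS_NUM threads.
    for (int i = 0; i < 1000000/THREADS_NUM; ++i) {
        // Every thread increments the same atomic, so all of them contend for one cache line.
        data.fetch_add(1, std::memory_order_relaxed);
    }
}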

I'm running it four times with 4, 8, 16 and 32 threads.
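
For readers who want to reproduce the test, here is a minimal self-contained sketch of what the full harness might look like. It is not the original `test2` source, which is not shown in this thread. It assumes that `data` is a `std::atomic<long>`, and it takes the thread count as a runtime parameter instead of the `THREADS_NUM` macro; the names `run_test` and `kTotalIncrements` are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Assumed declaration: one shared counter, so every thread hits the same cache line.
static std::atomic<long> data{0};

// Total work is fixed; adding threads only changes how it is divided.
constexpr long kTotalIncrements = 1000000;

// Same loop as in the post, with the iteration count passed in
// instead of being derived from a THREADS_NUM macro.
void do_work(long iterations) {
    for (long i = 0; i < iterations; ++i) {
        data.fetch_add(1, std::memory_order_relaxed);
    }
}

// Hypothetical helper: runs the test with `num_threads` threads and
// returns the final counter value.
long run_test(int num_threads) {
    data.store(0);
    const long per_thread = kTotalIncrements / num_threads;
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(do_work, per_thread);
    }
    for (auto& th : threads) {
        th.join();
    }
    return data.load();
}
```

A `main` that reads the thread count from `argv[1]` (as the `test2 4` command line suggests) and calls `run_test` would complete the program. Because the total number of increments stays constant, a simulator that does not model contention on the shared line would be expected to report roughly the same total work regardless of thread count, which matches the symptom described above.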