s5z / zsim

A fast and scalable x86-64 multicore simulator
GNU General Public License v2.0

Deadlock in openmp application #268

Open captainnotseeingthesea opened 3 months ago

Hi, I'm trying to run an OpenMP application on zsim. I tried a very simple program that sums an array, and it works well on the host. Under simulation, however, it may deadlock (it only occasionally completes). I tested it on three different Ubuntu systems with the same config file and hit the same problem on each.

[Config file] 4 simple cores
[Test system 1] Ubuntu 14.04 (4.4.0 kernel) / Pin-2.14-71293 / gcc: 4.8.5
[Test system 2] Ubuntu 20.04 (6.2.0 kernel) / Pin-2.14-71293 / gcc: 9.4.0
[Test system 3] Ubuntu 22.04 (6.5.0 kernel) / Pin-2.14-71293 / gcc: 11.4.0

[test.cpp]

#include <iostream>

int main() {
    const int size = 100;
    int arr[size];
    int sum = 0;

    // Initialize the array
    for (int i = 0; i < size; i++) {
        arr[i] = i + 1;
    }

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < size; i++) {
        sum += arr[i];
    }

    std::cout << "Sum: " << sum << std::endl;
    return 0;
}

[4simple.cfg]

sim = {
    stats = "test4cores"
    domains = 1;
    phaseLength = 1000;
    statsPhaseInterval = 10000;
    printHierarchy = true;
    pinOptions = "-ifeellucky"
};

sys = {
    caches = {
        l1d = {
            array = {
                type = "SetAssoc";
                ways = 8;
            };
            caches = 4;
            latency = 4;
            size = 32768;
        };
        l1i = {
            array = {
                type = "SetAssoc";
                ways = 4;
            };
            caches = 4;
            latency = 3;
            size = 32768;
        };
        l2 = {
            array = {
                type = "SetAssoc";
                ways = 8;
            };
            caches = 4;
            latency = 7;
            children = "l1i|l1d";
            size = 262144;
        };
        l3 = {
            array = {
                hash = "H3";
                type = "SetAssoc";
                ways = 16;
            };
            banks = 4;
            caches = 1;
            latency = 27;
            children = "l2";
            size = 8388608;
        };
    };

    cores = {
        simpleCore = {
            cores = 4;
            dcache = "l1d";
            icache = "l1i";
            type = "Simple";
        };
    };

    frequency = 2800;
    lineSize = 64;
};

process0 = {
    command = "~/DAMOV/workloads/OPENMP/test";
};

[Message]

[S 0] Started process, PID 3909
[S 0] procMask: 0x0
[H] Attached to global heap
[S 0] vDSO info initialized
[S 0] Started contention simulation thread 0
[S 0] Started scheduler watchdog thread
[S 0] Thread 0 starting
[S 0] FF control Thread TID 3920
[S 0] [0] Post-patching SYS_sched_getaffinity size 8 cpuset 0x5555555592a0
[S 0] Thread 4 starting
[S 0] Thread 5 starting
[S 0] Thread 6 starting
Sum: 5050
[S 0] Thread 0 finished
[H] WARN: Stalled for 20 secs so far
[H] WARN: Stalled for 30 secs so far
[H] WARN: Stalled for 40 secs so far
[H] WARN: Stalled for 50 secs so far
[H] WARN: Stalled for 60 secs so far
[H] WARN: Stalled for 70 secs so far
[H] WARN: Stalled for 80 secs so far
[H] WARN: Stalled for 90 secs so far
[H] WARN: Stalled for 100 secs so far
[H] WARN: Stalled for 110 secs so far
[H] WARN: Stalled for 120 secs so far
[H] WARN: Stalled for 130 secs so far
[H] WARN: Deadlock detected, killing children
[H] Received interrupt
[H] Attempting graceful termination
[H] Killing process 5141
[H] Done sending kill signals
[H] WARN: Hard death at exit (1 children running), killing the whole process tree
Killed
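
In the log above, "Sum: 5050" and "Thread 0 finished" appear before the stall warnings, while threads 4, 5 and 6 are reported as starting but never as finished. One possible reading, which is an assumption and not something confirmed here, is that the OpenMP runtime's worker threads are still alive when the main thread exits. A minimal sketch to help narrow this down: the same sum computed with plain std::thread workers that are all joined before returning. The helper name sum_with_joined_threads is hypothetical and is not part of zsim or the original test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from zsim or the original test.cpp): sums `data`
// with `num_threads` std::thread workers and joins every worker before
// returning, so no worker thread outlives the caller.
long sum_with_joined_threads(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0 && "need at least one worker");

    std::vector<long> partial(num_threads, 0);  // one slot per worker, no sharing
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    // Split the index range into contiguous chunks, rounding the chunk size up.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0L);
        });
    }

    // Explicit join: every worker has exited before the result is combined.
    for (std::thread& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Calling sum_with_joined_threads(arr, 4) from main in place of the OpenMP loop (with arr held in a std::vector<int>) and running it under the same 4simple.cfg could help separate two cases. If this version finishes reliably, that would point toward how the OpenMP runtime's worker threads are handled at exit; if it also stalls, the problem is probably not specific to OpenMP.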

Has anyone seen this kind of problem? Any comments or suggestions would be helpful.

Thanks, Xuanyi