s5z / zsim

A fast and scalable x86-64 multicore simulator
GNU General Public License v2.0

Issues with unmatchedFutexWakeups <= maxAllowedFutexWakeups assertion #91

Open vijay4454 opened 8 years ago

vijay4454 commented 8 years ago

Hello

I am trying to simulate a machine with 4 "OOO" cores. I have configured it similarly to the "het.cfg" configuration, but without the wimpy ("Simple") cores. I am running a pthread workload. This workload creates 40 threads, waits for them to finish using pthread_join, and repeats this process for 2 more sets of 40 threads, so 120 threads are created in total. For some reason, the maxAllowedFutexWakeups assertion fails in scheduler.cpp. The simulator then throws a "Deadlock detected" warning. I am not sure what's going on. I ran the same workload on a machine configuration with 1 beefy ("OOO") core and 10 wimpy ("Simple") cores (very similar to the default het.cfg configuration with just the core count changed), and the workload simulated just fine without any warnings or errors.
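For anyone trying to reproduce this, the create/join pattern described above could be sketched roughly as follows. This is not the actual benchmark: the `worker` body, the `completed` counter, and the `run_rounds` helper are hypothetical stand-ins, since the real per-thread work is not shown in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical counter so the caller can check that every thread ran. */
static atomic_int completed = 0;

/* Hypothetical stand-in for the real per-thread work. */
static void *worker(void *arg) {
    (void)arg;
    atomic_fetch_add(&completed, 1);
    return NULL;
}

/* Create `nthreads` threads, join all of them, and repeat for `rounds`
 * rounds. Returns the total number of threads that ran to completion. */
int run_rounds(int rounds, int nthreads) {
    pthread_t tids[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_store(&completed, 0);

    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < nthreads; i++) {
            int rc = pthread_create(&tids[i], NULL, worker, NULL);
            assert(rc == 0);
        }
        for (int i = 0; i < nthreads; i++) {
            /* Blocks until thread i exits; on Linux with glibc this
             * wait is implemented with a futex. */
            int rc = pthread_join(tids[i], NULL);
            assert(rc == 0);
        }
    }
    return atomic_load(&completed);
}
```

Calling `run_rounds(3, 40)` from `main` would give the 3 sets of 40 threads (120 in total) described above; build with `-pthread`.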

I will look into the source code and try to figure out what's going wrong. I am just wondering if I am making some basic mistake and there is a quick way to fix this. Please let me know if you know what's going wrong. I have posted the entire log below.

thiruven@altair$ ./run_sim.sh scan mpu beefy4 > output_beefy4.txt
[S 0] WARN: Did not find time in vDSO
[S 0] WARN: Did not find __vdso_time in vDSO
[S 0] WARN: Instrumenting vsyscall page code --- this process executes vsyscalls, which zsim does not virtualize!
[S 1] WARN: Did not find time in vDSO
[S 1] WARN: Did not find __vdso_time in vDSO
[S 1] WARN: waitUntilQueued for pid 1 tid 6 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 7 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 2 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 3 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 8 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 9 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 10 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 11 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 5 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 6 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 12 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 13 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 14 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 15 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 4 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 7 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 2 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 16 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 3 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 5 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 8 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 9 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: waitUntilQueued for pid 1 tid 17 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 18 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 19 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 11 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 10 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 20 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 13 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 21 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 22 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 23 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 24 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 25 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 26 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 16 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 3 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 7 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 4 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 12 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 2 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 28 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 5 timed out
[H] WARN: Stalled for 30 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 8 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 29 timed out
[H] WARN: Stalled for 40 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 30 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 17 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 31 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 15 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 9 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 32 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: waitUntilQueued for pid 1 tid 33 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 34 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 18 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 11 timed out
[H] WARN: Stalled for 30 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 20 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 19 timed out
[H] WARN: Stalled for 40 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 36 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 37 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 38 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: waitUntilQueued for pid 1 tid 10 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 39 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 21 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 13 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 40 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 23 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 22 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 41 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 14 timed out
[H] WARN: Stalled for 30 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 25 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 27 timed out
[H] WARN: Stalled for 40 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 24 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 26 timed out
[H] WARN: Stalled for 50 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 3 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 16 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 7 timed out
[H] WARN: Stalled for 60 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 12 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 4 timed out
[H] WARN: Stalled for 70 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 15 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 2 timed out
[H] WARN: Stalled for 80 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 5 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 31 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 8 timed out
[H] WARN: Stalled for 90 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 30 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 29 timed out
[H] WARN: Stalled for 100 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 32 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 17 timed out
[H] WARN: Stalled for 110 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 36 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 28 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 9 timed out
[H] WARN: Stalled for 120 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 38 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 20 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 34 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 33 timed out
[S 1] WARN: Futex wake matching failed (0/1) (external/ff waiters?)
[S 1] WARN: External futex wakes? (1/1)
[S 1] WARN: waitUntilQueued for pid 1 tid 19 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: waitUntilQueued for pid 1 tid 37 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 35 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 18 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: waitUntilQueued for pid 1 tid 21 timed out
[H] WARN: Stalled for 30 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 10 timed out
[S 1] WARN: Futex wake matching failed (0/1) (external/ff waiters?)
[S 1] WARN: External futex wakes? (0/0)
[S 1] WARN: waitUntilQueued for pid 1 tid 13 timed out
[H] WARN: Stalled for 40 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 23 timed out
[S 1] WARN: Futex wake matching failed (0/1) (external/ff waiters?)
[S 1] WARN: External futex wakes? (0/0)
[S 1] WARN: waitUntilQueued for pid 1 tid 20 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 34 timed out
[H] WARN: Stalled for 50 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 41 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 22 timed out
[H] WARN: Stalled for 60 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 14 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: waitUntilQueued for pid 1 tid 37 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 11 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 19 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 27 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 33 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: waitUntilQueued for pid 1 tid 25 timed out
[H] WARN: Stalled for 20 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 24 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 35 timed out
[S 1] WARN: Futex wake matching failed (0/1) (external/ff waiters?)
[H] WARN: Stalled for 30 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 26 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[S 1] WARN: External futex wakes? (0/0)
[S 1] WARN: waitUntilQueued for pid 1 tid 3 timed out
[H] WARN: Stalled for 40 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 13 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 16 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 7 timed out
[H] WARN: Stalled for 50 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 10 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 21 timed out
[H] WARN: Stalled for 60 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 4 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 12 timed out
[H] WARN: Stalled for 70 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 15 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 23 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 20 timed out
[H] WARN: Stalled for 80 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 2 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 5 timed out
[H] WARN: Stalled for 90 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 31 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 22 timed out
[H] WARN: Stalled for 100 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 8 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 30 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 29 timed out
[H] WARN: Stalled for 110 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 37 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 34 timed out
[S 1] WARN: Futex wake matching failed (1/1) (external/ff waiters?)
[H] WARN: Stalled for 120 secs so far
[S 1] WARN: waitUntilQueued for pid 1 tid 11 timed out
[S 1] WARN: waitUntilQueued for pid 1 tid 32 timed out
[H] WARN: Stalled for 130 secs so far
[H] WARN: Deadlock detected, killing children
[H] WARN: Hard death at exit (1 children running), killing the whole process tree

Thanks,
Vijay

AlaskaJoslin commented 8 years ago

Hi Vijay,

I've been working on this issue for a few weeks now for a similar workload type. I initially thought that it was a configuration issue as well, but after investigating the source code I've realized that zsim simply won't support pthreads at the moment. Pthreads use futexes for synchronization, and zsim's implementation of futexes isn't sufficient yet for a number of reasons. If you comment out the offending source code, zsim simply deadlocks at the synchronization barrier. You could try increasing the phase length to be large enough that the barrier is never reached, but the results will be unrealistic. I'm working on a patch which I hope to get accepted to the master branch, but until the source code is fixed there is no quick correct solution.

Matt
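As background on the futex dependence mentioned above, here is a minimal Linux-only sketch of a raw futex wait/wake handshake between two threads. It is not zsim or glibc code; the `futex`, `waiter`, and `futex_handshake` names are hypothetical and exist only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper: glibc does not export a futex() function. */
static long futex(uint32_t *uaddr, int op, uint32_t val) {
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static uint32_t flag = 0;

/* Sleeps in the kernel until flag becomes nonzero. */
static void *waiter(void *arg) {
    (void)arg;
    while (__atomic_load_n(&flag, __ATOMIC_ACQUIRE) == 0) {
        /* Blocks only if flag is still 0; otherwise returns at once. */
        futex(&flag, FUTEX_WAIT, 0);
    }
    return NULL;
}

/* Starts one waiter, sets the flag, and issues a single wake.
 * Returns the number of waiters the kernel reports as woken. */
int futex_handshake(void) {
    pthread_t t;
    __atomic_store_n(&flag, 0, __ATOMIC_RELEASE);
    int rc = pthread_create(&t, NULL, waiter, NULL);
    assert(rc == 0);

    __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
    /* May be 0 if the waiter had not blocked yet, or 1 if it had. */
    long woken = futex(&flag, FUTEX_WAKE, 1);

    rc = pthread_join(t, NULL);
    assert(rc == 0);
    return (int)woken;
}
```

Note that `FUTEX_WAKE` returns how many waiters were actually woken, so a wake can legitimately find no sleeper if it races ahead of the corresponding wait. Whether that is related to the "Futex wake matching failed" warnings in the log above is not established in this thread.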

vijay4454 commented 8 years ago

Hi Matt

Thank you for your response. That is very helpful. I will look forward to the patch.

Could you please give me an idea of how long it might take for the patch to be accepted into the master branch? That would help me plan my project.

AlaskaJoslin commented 8 years ago

I'm afraid it is more a matter of finishing the patch than of getting it accepted. If you would like to help, I can send you the relevant source code. I've been stuck for a while on eliminating race conditions at the barrier. I hope to finish within a week, but can't make any promises.

s5z commented 8 years ago

@AlaskaJoslin, please submit a pull request when you have a patch. This code used to be robust in 12.04 but the futex ABI might have changed in later kernel/glibc versions. We see these occasionally as well.

You can always disable futex wait/wake matching, but note that this may affect simulation fidelity (this code simply waits until all woken threads make it back from the OS before resuming simulation, so joins may be slightly delayed).