On FreeBSD 13.0, compiling the testcase with clang 13 doesn't reproduce the issue, and 13.1 with clang 11 does reproduce it. So this is a Valgrind / OS / libc / libthr problem rather than a compiler one.
I'll try cutting down the testcase, first trimming the kinds of test - just T(race) to start with.
I tried running the testcase under truss, both alone and under Valgrind. I'm leaving the truss-under-Valgrind log aside for the moment as it is too long and the diffs are illegible. I'll try again another time, filtering out sigprocmask, thr_self, read and write.
The log of the testcase running alone under truss shows quite a few differences between 13.0 and 13.1.
13.0, for thread creation, does
pid: thr_new(0x7fffffffe180,0x68) = 0 (0x0)
pid: <new thread 101826>
pid: sigfastblock(0x1,0x800a12538) = 0 (0x0)
pid: _umtx_op(0x80026fc80,UMTX_OP_RW_RDLOCK,0x0,0x0,0x0) = 0 (0x0)
pid: mmap(0x7fffdfbfc000,2101248,PROT_READ|PROT_WRITE,MAP_STACK,-1,0x0) = 140736947273728 (0x7fffdfbfc000)
pid: mprotect(0x7fffdfbfc000,4096,PROT_NONE) = 0 (0x0)
13.1 does
pid: thr_new(0x7fffffffe8f0,0x68) = 0 (0x0)
pid: mmap(0x7fffdf9fb000,2101248,PROT_READ|PROT_WRITE,MAP_STACK,-1,0x0) = 140736945172480 (0x7fffdf9fb000)
pid: mprotect(0x7fffdf9fb000,4096,PROT_NONE) = 0 (0x0)
No sigfastblock, no umtx rw rdlock. I don't think that makes a difference, though.
On 13.0 I see
pid: _umtx_op(0x8006b0008,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
pid: _umtx_op(0x8006b0008,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
(sometimes but not always together like that). There are 9 WAITs and 9 WAKE2s (not sure if they are 100% reproducible).
I don't see any pattern, nor can I tell what is using the mutex (just one mutex as far as I can tell).
Also I have no clue as to why 13.1 does not use this mutex.
Next I tried disabling buffering of output (see comment in #189):
SYSCALL[3098,15](240) sys_nanosleep ( 0x200890, 0x0 )--3098:4: gdbsrv VG_(gdbserver_report_syscall) before 1 sysno 240 tid 15
--> [async] ...
ALL
--3098-- SCHED[13]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys
--3098-- SCHED[12]: acquired lock (VG_(client_syscall)[async])
--3098-- SCHED[12]: TRC: SYSCALL
--3098-- SCHED[12]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys
--3098-- SCHED[15]: acquired lock (VG_(client_syscall)[async])
--3098-- SCHED[15]: TRC: SYSCALL
--3098-- SCHED[15]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys
--3098-- SCHED[2]: acquired lock (VG_(client_syscall)[asySYSCALL[3098,2](240) ... [async] --> Success(0x0)--3098:4: gdbsrv VG_(gdbserver_report_syscall) before 0 sysno 240 tid 2
SYSCALL[3098,2]( 4) sys_write ( 1, 0x4ea9000, 48 )--3098:4: gdbsrv VG_(gdbserver_report_syscall) before 1 sysno 4 tid 2
--> [async] ...
tls_ptr: case "race" has mismatch: *ip=4 here=2
SYSCALL[3098,2]( 4) ... [async] --> Success(0x30)--3098:4: gdbsrv VG_(gdbserver_report_syscall) before 0 sysno 4 tid 2
SYSCALL[3098,2](240) sys_nanosleep ( 0x200890, 0x0 )--3098:4: gdbsrv VG_(gdbserver_report_syscall) before 1 sysno 240 tid 2
--> [async] ...
Killed
It's hanging in a call to nanosleep. That's also what I saw in gdb.
tls_ptr: case "race" has mismatch: *ip=16 here=8
^C
Thread 1 received signal SIGINT, Interrupt.
Sent by kernel.
vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-amd64-freebsd.S:135
135 setc 0(%rsp) /* stash returned carry flag */
(gdb)
That made me worry a bit about the change I made for resuming syscalls at the point where the carry flag gets saved. Is that really thread-safe, even for Valgrind, which only executes one thread at a time?
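For background on why the carry flag matters at all: the FreeBSD syscall convention reports failure via the carry flag, with rax then holding a positive errno rather than a negative result, which is why the wrapper stashes the flag (the setc 0(%rsp) above) before it can be clobbered. A rough illustration of the decoding step, with made-up names rather than Valgrind's actual SysRes machinery:

```
/* Hypothetical illustration of the FreeBSD amd64 syscall return
   convention, not Valgrind's actual code. The kernel leaves results
   in rax/rdx and signals failure by setting the carry flag, which
   has to be captured right after the syscall returns. */
typedef struct {
    unsigned long rax;     /* first result, or errno on failure */
    unsigned long rdx;     /* second result for two-word returns */
    int           carry;   /* carry flag captured right after the syscall */
} RawSysRes;

static long decode_sysres(const RawSysRes *r)
{
    if (r->carry)
        return -(long)r->rax;   /* failure: rax holds a positive errno */
    return (long)r->rax;        /* success: rax holds the return value */
}
```

Since the flag is stashed at 0(%rsp), i.e. on the stack of the thread doing the syscall, the store itself looks per-thread; the worry is more about what happens when a signal interrupts the syscall right around that instruction and it has to be resumed.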
I can then single-step to this backtrace:
#1 0x00000000380bff4c in vgPlain_do_syscall (sysno=28503, sysno@entry=3, a1=17220295950, a2=1, a2@entry=17220295950, a3=0, a4=a4@entry=0,
a5=0, a6=0, a7=0, a8=0) at m_syscall.c:1153
#2 0x00000000380becee in vgPlain_read (fd=<optimized out>, buf=0x0, buf@entry=0x40268dd0e, count=count@entry=1) at m_libcfile.c:315
#3 0x0000000038141454 in vgModuleLocal_sema_down (sema=0x4023705d0, as_LL=0 '\000') at m_scheduler/sema.c:107
#4 0x000000003813d50d in vgPlain_acquire_BigLock_LL (who=<optimized out>) at m_scheduler/scheduler.c:425
#5 vgPlain_acquire_BigLock (tid=tid@entry=1, who=0x380064ef "VG_(client_syscall)[async]") at m_scheduler/scheduler.c:349
#6 0x00000000381448a3 in vgPlain_client_syscall (tid=tid@entry=1, trc=<optimized out>, trc@entry=73) at m_syswrap/syswrap-main.c:2353
#7 0x0000000038140340 in handle_syscall (tid=tid@entry=1, trc=trc@entry=73) at m_scheduler/scheduler.c:1211
#8 0x000000003813e463 in vgPlain_scheduler (tid=tid@entry=1) at m_scheduler/scheduler.c:1529
#9 0x0000000038150874 in thread_wrapper (tidW=<optimized out>) at m_syswrap/syswrap-freebsd.c:109
#10 run_a_thread_NORETURN (tidW=1) at m_syswrap/syswrap-freebsd.c:163
#11 0x0000000000000000 in ?? ()
So this is trying to get the semaphore the_BigLock, hanging in the read.
How is it possible to be using 100% CPU in a read like that?
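For context, the BigLock is built on Valgrind's pipe-based semaphore (m_scheduler/sema.c): acquiring the lock means reading one byte out of a pipe, releasing it means writing the byte back, which is why a thread waiting for the lock shows up blocked in read(). A minimal sketch of the idea, with illustrative names rather than Valgrind's actual functions:

```
/* Minimal sketch of a pipe-based binary semaphore, the idea behind
   the_BigLock. Names are illustrative, not Valgrind's. */
#include <unistd.h>
#include <errno.h>

typedef struct { int fds[2]; } PipeSema;

static int sema_init(PipeSema *s)
{
    if (pipe(s->fds) != 0)
        return -1;
    /* Put one token byte in the pipe: whoever reads it holds the lock. */
    return write(s->fds[1], "X", 1) == 1 ? 0 : -1;
}

static void sema_down(PipeSema *s)   /* acquire: blocks in read() */
{
    char c;
    while (read(s->fds[0], &c, 1) < 0 && errno == EINTR)
        ;   /* restart if interrupted by a signal */
}

static void sema_up(PipeSema *s)     /* release: put the token back */
{
    while (write(s->fds[1], "X", 1) < 0 && errno == EINTR)
        ;
}
```

This also seems to match the ktrace output further down: the single "X" byte written on fd 28504 and read on fd 28503 is the lock token.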
I'm beginning to think that this is some sort of interaction between the guest nanosleep and Valgrind's scheduler lock.
Is it possible that when the scheduler is getting the lock the nanosleep interrupts it and then we get stuck in some sort of syscall restart ping-pong? Or somehow the semaphore gets into a bad state?
If I reduce the nanosleep duration then the testcase seems to run reliably. I'll try twiddling with the nanosleep on 13.0 to see if I can provoke the problem there.
This also hangs on 14.0 but not on 12.2 or 12.3.
I ran the testcase under ktrace, like this:
ktrace -di -t cinpstw ../../vg-in-place --tool=none ./tls
For the hang that I observed, it got a bit over half way. 14 threads get created, and they each loop 10 times with a nanosleep each, for a total of 140 nanosleeps. The other source of syscalls from the guest is the printfs (19 of them) from the first two threads, which increment a plain global variable.
A bit before the end there is
9340 none-amd64-freebsd GIO fd 28504 wrote 1 byte
"X"
9340 none-amd64-freebsd RET write 1
9340 none-amd64-freebsd CALL sigprocmask(SIG_SETMASK,0x40396fdf0,0x40396fdb0)
9340 none-amd64-freebsd RET sigprocmask 0
9340 none-amd64-freebsd CALL nanosleep(0x200890,0)
Then there is a chunk of returning nanosleeps and attempts to read the BigLock pipe:
9340 none-amd64-freebsd CSW resume kernel "nanslp"
9340 none-amd64-freebsd RET nanosleep 0
9340 none-amd64-freebsd CALL sigprocmask(SIG_SETMASK,0x40476bdb0,0)
9340 none-amd64-freebsd RET sigprocmask 0
9340 none-amd64-freebsd CALL thr_self(0x40476bca8)
9340 none-amd64-freebsd RET thr_self 0
9340 none-amd64-freebsd CALL read(0x6f57,0x40476bcfe,0x1)
9340 none-amd64-freebsd CSW stop kernel "pipelk"
9340 none-amd64-freebsd CSW resume kernel "nanslp"
9340 none-amd64-freebsd RET nanosleep 0
9340 none-amd64-freebsd CALL sigprocmask(SIG_SETMASK,0x40486fdb0,0)
9340 none-amd64-freebsd RET sigprocmask 0
9340 none-amd64-freebsd CALL thr_self(0x40486fca8)
9340 none-amd64-freebsd RET thr_self 0
9340 none-amd64-freebsd CALL read(0x6f57,0x40486fcfe,0x1)
9340 none-amd64-freebsd CSW stop kernel "pipelk"
9340 none-amd64-freebsd CSW resume kernel "nanslp"
9340 none-amd64-freebsd RET nanosleep 0
When everything goes pear-shaped there is
9340 none-amd64-freebsd GIO fd 28503 read 1 byte
"X"
9340 none-amd64-freebsd RET read 1
9340 none-amd64-freebsd CSW stop user "ast"
9340 none-amd64-freebsd CSW resume kernel "pipelk"
9340 none-amd64-freebsd CSW stop kernel "piperd"
9340 none-amd64-freebsd CSW resume kernel "pipelk"
9340 none-amd64-freebsd CSW stop kernel "piperd"
So someone has read from the pipe and has the lock.
After that there is an "ast" (asynchronous system trap), an involuntary context switch. From then on the Valgrind scheduler seems to have more or less stopped - I see one attempt to read the pipe, and other than that just a lot of kernel context switches between pipe lock and pipe read.
On FreeBSD 13.1 I also see some "ast"s but they always get paired with a resume.
From this perspective it looks like Valgrind + this testcase is triggering a kernel issue where the "ast" is getting stuck.
Running FreeBSD 13.1 in a 2-CPU VM seems to fix the problem. In my opinion this is a kernel bug.
See this thread
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=4438867+0+current/freebsd-hackers
I just logged a bug
FreeBSD 13.1, amd64
Seems intermittent
The testcase has main creating 2 x 7 threads, all with the same thread function tls_ptr, but with data containing a pointer to one of 7 test functions.
The output is something like
(requires manual killing; it does not respond to Ctrl-C)
I have no idea if the function/data being tested changes anything (e.g., is it always the global variable test that hangs, or is it randomly any of the seven and only due to the use of signals and threads).
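For reference, a rough sketch of the overall shape being described, not the actual test source; the sleep duration and the bodies of the test functions are made up here:

```
/* Rough sketch of the testcase's shape, not the actual source:
   main creates 2 x 7 threads, all running tls_ptr(), each given a
   pointer to one of 7 test functions; each thread loops 10 times
   with a nanosleep between iterations. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTESTS   7
#define NROUNDS 10

static int global;                                /* plain global for the "race" case */
static void race(void) { printf("global = %d\n", ++global); }

/* Stand-ins for the seven real test functions. */
static void (*tests[NTESTS])(void) = { race, race, race, race, race, race, race };

static void *tls_ptr(void *arg)
{
    void (*fn)(void) = *(void (**)(void))arg;      /* data is a pointer to a function */
    struct timespec ts = { 0, 100 * 1000 * 1000 }; /* sleep duration made up: 100 ms */
    for (int i = 0; i < NROUNDS; i++) {
        fn();
        nanosleep(&ts, NULL);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[2 * NTESTS];
    for (int i = 0; i < 2 * NTESTS; i++)
        pthread_create(&tid[i], NULL, tls_ptr, &tests[i % NTESTS]);
    for (int i = 0; i < 2 * NTESTS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```

With that shape, the 14 threads times 10 iterations account for the 140 nanosleeps seen in the ktrace above.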
With plenty of tracing: ktrace / kdump