paulfloyd / freebsd_valgrind

Git repo used to Upstream the FreeBSD Port of Valgrind
GNU General Public License v2.0

none/tests/pth_self_kill_15_other is failing [x86, clang and gcc] #83

Closed paulfloyd closed 2 years ago

paulfloyd commented 4 years ago

On amd64 with --trace-syscalls=yes I see

SYSCALL[14971,1](433) sys_thr_kill ( 101269, 15 )--14971-- thr_kill: sending signal 15 to tid 101269
 --> [async] ... 
SYSCALL[14971,1](433) ... [async] --> Success(0x0) --14971-- thr_kill: sent signal 15 to tid 101269

But on i386 this is

SYSCALL[93600,1](433) sys_thr_kill ( 100753, 15 )--93600-- thr_kill: sending signal 15 to tid 100753
 --> [async] ... 
--93600-- async signal handler: signal=15, tid=2, si_code=65543, exitreason VgSrc_None
--93600-- interrupted_syscall: tid=2, ip=0x380e299d, restart=False, sres.isErr=True, sres.val=4
--93600--   completed, but uncommitted: committing
--93600-- delivering signal 15 (SIGTERM):65543 to thread 2
--93600-- push_signal_frame (thread 2): signal 15
==93600==    at 0x6B10C43: _nanosleep (in /lib/libc.so.7)
==93600==    by 0x6A7EA94: sleep (in /lib/libc.so.7)
==93600==    by 0x8048857: t (pth_self_kill.c:17)
==93600==    by 0x69F188A: ??? (in /lib/libthr.so.3)
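
As a reading aid for the si_code=65543 value in the log above: 65543 is 0x10007, which I believe is the value FreeBSD's <sys/signal.h> gives SI_LWP (signal sent by thr_kill). The sketch below redefines the two codes that appear in this thread under hypothetical FBSD_ names, so it does not depend on the host's headers; the helper names are also made up for illustration.

```c
/* Assumed values, as I recall them from FreeBSD's <sys/signal.h>.
 * The FBSD_ prefix is hypothetical; it only avoids clashing with
 * whatever the build host defines. */
#define FBSD_SI_USER 0x10001 /* sent by kill(2) */
#define FBSD_SI_LWP  0x10007 /* sent by thr_kill(2) */

/* Map a raw si_code to a printable name (illustrative helper). */
const char *fbsd_si_code_name(int si_code)
{
    switch (si_code) {
    case FBSD_SI_USER: return "SI_USER";
    case FBSD_SI_LWP:  return "SI_LWP";
    default:           return "unknown";
    }
}

/* Non-zero if the code says the signal came from thr_kill. */
int fbsd_sent_by_thr_kill(int si_code)
{
    return si_code == FBSD_SI_LWP;
}
```

With these definitions, fbsd_si_code_name(65543) yields "SI_LWP", which matches the code=SI_LWP that ktrace reports later in this issue.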

On i386 in gdb if I put a breakpoint on async_signalhandler then the callstack is

#0  async_signalhandler (sigNo=15, info=0x71bdcc0, uc=0x71bda00) at m_signals.c:2505
#1  <signal handler called>
#2  vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-x86-freebsd.S:134
#3  0x3808d147 in do_syscall_for_client (syscall_mask=0x71bde1c, tst=<optimized out>, syscallno=<optimized out>) at m_syswrap/syswrap-main.c:368
#4  vgPlain_client_syscall (tid=<optimized out>, trc=<optimized out>) at m_syswrap/syswrap-main.c:2277
#5  0x38089cc1 in handle_syscall (tid=tid@entry=2, trc=77) at m_scheduler/scheduler.c:1211
#6  0x3808b372 in vgPlain_scheduler (tid=<optimized out>) at m_scheduler/scheduler.c:1529
#7  0x38097f72 in thread_wrapper (tidW=2) at m_syswrap/syswrap-freebsd.c:105
#8  run_a_thread_NORETURN (tidW=2) at m_syswrap/syswrap-freebsd.c:159

This is not easy to debug: I don't see the problem when running under gdb (or lldb). Also, 32on64 works OK.

My impressions so far are

paulfloyd commented 4 years ago

I can't reproduce this for a 32bit binary running on amd64 kernel.

Further, this isn't related to issue #122

paulfloyd commented 4 years ago

Debugging this a bit more, I saw the following

amd64 does the same for the first four points, but for the last there is no jump.

paulfloyd commented 4 years ago

ktrace seems to provide interesting information. Running standalone I get (from thread kill onwards)

 92188 pth_self_kill CALL  thr_kill(0x189a7,SIGTERM)
 92188 pth_self_kill RET   thr_kill 0
 92188 pth_self_kill CALL  sigprocmask(SIG_SETMASK,0x2809058c,0xffbfeae8)
 92188 pth_self_kill RET   sigprocmask 0
 92188 pth_self_kill CALL  sigaction(SIGTERM,0xffbfeab8,0xffbfeaa0)
 92188 pth_self_kill RET   sigaction 0
 92188 pth_self_kill CALL  sigprocmask(SIG_SETMASK,0xffbfeae8,0)
 92188 pth_self_kill RET   sigprocmask 0
 92188 pth_self_kill CALL  mmap(0,0x20000,0x3<PROT_READ|PROT_WRITE>,0x1002<MAP_PRIVATE|MAP_ANON>,0xffffffff,0,0)
 92188 pth_self_kill RET   mmap 673558528/0x2825b000
 92188 pth_self_kill CALL  exit(0)
 92188 pth_self_kill RET   nanosleep -1 errno 4 Interrupted system call
 92188 pth_self_kill PSIG  SIGTERM SIG_DFL code=SI_LWP

For 32on64 I get (deleting a lot of stuff like sigprocmask and thr_self)

  5899 none-x86-freebsd CALL  thr_kill(0x18dbe,SIGTERM)
  5899 none-x86-freebsd RET   thr_kill 0
  5899 none-x86-freebsd PSIG  SIGTERM caught handler=0x380d9d10 mask=0x0 code=SI_LWP
  5899 none-x86-freebsd CALL  mmap(0x60ef000,0x20000,0x3<PROT_READ|PROT_WRITE>,0x1012<MAP_PRIVATE|MAP_FIXED|MAP_ANON>,0xffffffff,0,0)
  5899 none-x86-freebsd RET   mmap 101642240/0x60ef000
  5899 none-x86-freebsd CALL  thr_kill(0x18e18,SIG 128)
  5899 none-x86-freebsd RET   thr_kill 0
  5899 none-x86-freebsd RET   nanosleep -1 errno 4 Interrupted system call
  5899 none-x86-freebsd CALL  thr_self(0x4eacd9c)
  5899 none-x86-freebsd PSIG  SIG -128 caught handler=0x380d9f00 mask=0x0 code=SI_LWP
  5899 none-x86-freebsd CALL  thr_exit(0x2)
  5899 none-x86-freebsd CALL  exit(0x2)

And for pure x86

 92213 none-x86-freebsd CALL  thr_kill(0x18710,SIGTERM)
 92213 none-x86-freebsd RET   thr_kill 0
 92213 none-x86-freebsd RET   nanosleep -1 errno 4 Interrupted system call
 92213 none-x86-freebsd PSIG  SIGTERM caught handler=0x38049a50 mask=0x0 code=SI_LWP
 92213 none-x86-freebsd PSIG  SIGSEGV caught handler=0x3804a440 mask=0xfffef067 code=SEGV_MAPERR
 92213 none-x86-freebsd CALL  kill(0x16835,SIGSEGV)
 92213 none-x86-freebsd RET   kill 0
 92213 none-x86-freebsd PSIG  SIGSEGV SIG_DFL code=SI_USER

To summarize

Standalone

  1. thr_kill
  2. exit
  3. nanosleep interrupted
  4. sigterm

32on64

  1. thr_kill
  2. catch sigterm
  3. send VGKILL
  4. nanosleep interrupted
  5. VGKILL
  6. thr_exit

Pure x86

  1. thr_kill
  2. nanosleep interrupted
  3. catch sigterm
  4. catch sigsegv
  5. kill
  6. sigsegv
  7. core
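
For anyone wanting to check the expected standalone outcome without the Valgrind tree, here is a minimal sketch of the same pattern: one thread sleeps while another sends it SIGTERM. It is not the source of pth_self_kill.c; the function names are hypothetical, and the fork wrapper exists only so the caller can observe how the process dies.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: block in sleep() until the signal arrives. */
static void *sleeper(void *arg)
{
    (void)arg;
    sleep(10);
    return NULL;
}

/* Runs in the forked child and never returns to the caller. */
static void child_body(void)
{
    sigset_t unblock;
    pthread_t th;

    /* Make sure SIGTERM has its default action and is not blocked. */
    signal(SIGTERM, SIG_DFL);
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGTERM);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    if (pthread_create(&th, NULL, sleeper, NULL) != 0)
        _exit(2);
    pthread_kill(th, SIGTERM); /* presumably the thr_kill seen in ktrace */
    pthread_join(th, NULL);    /* not expected to return */
    _exit(0);
}

/* Returns the signal that terminated the child, 0 if it exited
 * normally, or -1 if fork/waitpid failed. */
int run_kill_other_thread(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        child_body();
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Run natively, run_kill_other_thread() should report SIGTERM (15), because the default action for a thread-directed SIGTERM still terminates the whole process. That corresponds to the "Standalone" sequence; the pure x86 sequence ends in SIGSEGV instead.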

paulfloyd commented 4 years ago

Some similarities with issue #136

I do see these from ktrace

8166 none-x86-freebsd RET sigtimedwait -1 errno 35 Resource temporarily unavailable

https://stackoverflow.com/questions/17012206/catching-sigchld-using-sigtimedwait-on-bsd
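
For context, errno 35 on FreeBSD is EAGAIN, which is what POSIX specifies for sigtimedwait when no signal in the set becomes pending within the timeout. So with a zero timeout that return simply means "nothing pending right now". The sketch below shows this in plain libc terms rather than Valgrind's VG_() wrappers; both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Poll for a pending signal in `set` without blocking.
 * Returns the signal number, or -1 with errno set
 * (EAGAIN when nothing in the set is pending). */
int poll_pending_signal(const sigset_t *set, siginfo_t *info)
{
    const struct timespec zero = { 0, 0 };
    return sigtimedwait(set, info, &zero);
}

/* Returns 1 if polling behaves as described, 0 otherwise. */
int check_zero_timeout_poll(void)
{
    sigset_t set, old;
    siginfo_t info;
    int first, first_errno, second;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    /* Block SIGUSR1 so that raise() leaves it pending. */
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return 0;

    errno = 0;
    first = poll_pending_signal(&set, &info); /* nothing pending yet */
    first_errno = errno;

    raise(SIGUSR1);                            /* now one is pending */
    second = poll_pending_signal(&set, &info); /* consumes it */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return first == -1 && first_errno == EAGAIN && second == SIGUSR1;
}
```

If this holds on the affected system too, the sigtimedwait lines in the ktrace output would be expected behaviour rather than a symptom.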

Quick and dirty attempt, but it doesn't seem to fix anything

Int VG_(sigtimedwait_zero)( const vki_sigset_t *set, 
                            vki_siginfo_t *info )
{
   /*
   static const struct vki_timespec zero = { 0, 0 };

   SysRes res = VG_(do_syscall3)(__NR_sigtimedwait, (UWord)set, (UWord)info,
                                   (UWord)&zero);
   return sr_isError(res) ? -1 : sr_Res(res);
   */

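   /* Experiment: poll via a temporary kqueue instead of sigtimedwait. */
   SysRes res = VG_(do_syscall0)(__NR_kqueue);
   int kq = sr_Res(res);   /* no error check on kqueue() in this quick hack */
   struct kevent ke;
   struct timespec zero = { 0, 0 };

   /* Note: for EVFILT_SIGNAL the ident is expected to be a signal number;
      set->sig[0] is the first word of the signal mask. */
   EV_SET(&ke, set->sig[0], EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
   /* Register the filter, then poll once with a zero timeout. */
   VG_(do_syscall6)(__NR_kevent, kq, (UWord)&ke, 1, (UWord)NULL, 0, (UWord)NULL);
   res = VG_(do_syscall6)(__NR_kevent, kq, (UWord)NULL, 0, (UWord)&ke, 1, (UWord)&zero);
   VG_(do_syscall1)(__NR_close, kq);
   return sr_isError(res) ? -1 : sr_Res(res);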
}

paulfloyd commented 2 years ago

Also looks good with https://bugs.kde.org/show_bug.cgi?id=445032