paulfloyd / freebsd_valgrind

Git repo used to Upstream the FreeBSD Port of Valgrind
GNU General Public License v2.0
15 stars 4 forks source link

gdbserver_tests/mcinvokeWS fails on i386 [(racy?) #194

Open paulfloyd opened 1 year ago

paulfloyd commented 1 year ago

On x86

== 23 tests, 0 stderr failures, 0 stdout failures, 1 stderrB failure, 2 stdoutB failures, 0 post failures == gdbserver_tests/mcinvokeWS (stdoutB) gdbserver_tests/mcinvokeWS (stderrB) gdbserver_tests/nlcontrolc (stdoutB)

nlcontrolc is expected

mcinvokeWS is missing

-monitor command request to kill this process from stdout and

-readchar: Got EOF -error reading packet

from stderr

If I run vgdb with -d -d -d -d then I see

sending command v.kill to pid .... 23:54:08.902231 writing writing hex command to pid len 22 $qRcmd,762e6b696c6c#60 notify: 1 23:54:08.902252 reading static buf readchar 23:54:08.902331 read static buf readchar $O3d3d32353531353d3d206d6f6e69746f7220636f6d6d616e64207265717565737420746f206b696c6c20746869732070726f636573730a#2e 23:54:08.902354 getpkt ("O3d3d32353531353d3d206d6f6e69746f7220636f6d6d616e64207265717565737420746f206b696c6c20746869732070726f636573730a"); [no ack] 23:54:08.902381 reading static buf readchar 23:54:08.902571 read static buf readchar readchar: Got EOF 23:54:08.902601 [getpkt: discarding char ''] error reading packet 23:54:08.902627 nr received signals: sigint 0 sigterm 0 sighup 0 sigpipe 0 23:54:08.902659 joining with invoke_gdbserver_in_valgrind_thread 23:54:08.904560 after waitpid pid 25515 p 25515 status 0x........ WIFEXITED 0

Two of the missing lines are there

And there's no stdout diff.

So it seems as though this TC is a bit racy on my i386 VM.

The guest is run with

1 10000000 0 -S-S-S-S

That's just 1 loop with a sleepms of 10 million (about 3 hours). No burner.

If ptrace is interrupting syscalls, at least main will leave its select() and then be in pthread_join(). If these also get interrupted main could end before vgdb has a chance to kill it.

Need ktrace.

ktrace not that helpful.

I see the ptraces, indirectly. One select() gets interrupted (main I guess) and then I see 9x pthread_joins (well, _umtx_ops) being interrupted and restarted. I guess there's enough libthr wrapping to detect and restart.

That doesn't look racy though.

So what exactly does "v.kill" do in vgdb?

As a reminder the command is

./vgdb "$@" -c v.kill

more precisely

./vgdb --vgdb-prefix=./vgdb-prefix-mcinvokeWS --wait=60 -c v.wait 0 -c v.kill

--wait controls the effort to find the FIFOs. It's the number of times vgdb will scan the directory for the FIFOs, with a 1s delay between each attempt.

--vgdb-prefix prefix of FIFOs to look for

-c commands

pretty cheesy command line parsing here

when a -c is seen an entry is added in 'commands'. This is a VLA allocated in main of char*s of size argc. The entry is just a 1 char array initialized to '\0'

then when any other string is encountered the last 'commands' entry is realloc'd and the string added this will be repeated with space separation

so we will have 2 entries in 'commands': "v.wait 0" "v.kill"

After the command line parsing, vgdb will either setup gdb relay (presumably to send monitor commands) or just directly send the comands. I think that vgdb then creates a thread:

if (max_invoke_ms > 0 || cmd_time_out != NEVER) pthread_create(&invoke_gdbserver_in_valgrind_thread, NULL, invoke_gdbserver_in_valgrind, (void *) &pid);

Then vgdb sends wakeup messages down the pipe, with half second delays. Is the write blocking? Don't think so.

Meanwhile the thread is watching the amount of data written by vgdb but not read by valgrind, it has a lot of timer stuff as well,

Note

 if (max_invoke_ms > 0 && timercmp (&now, &invoke_time, >=)) {

and

/ Convenience macros for operations on timevals. NOTE: `timercmp' does not work for >= or <=. /

define timerisset(tvp) ((tvp)->tv_sec || (tvp)->tv_usec)

define timerclear(tvp) ((tvp)->tv_sec = (tvp)->tv_usec = 0)

define timercmp(a, b, CMP) \

(((a)->tv_sec == (b)->tv_sec) ? \ ((a)->tv_usec CMP (b)->tv_usec) : \ ((a)->tv_sec CMP (b)->tv_sec))

maybe that should be ! timercmp <

basically I think that this is saying "data is waiting and I'm timing out" so it calls

invoker_invoke_gdbserver

Back to standalone_send_commands

It then sets up no-ack mode and starts sending the commands

Conveniently the command is converted to hex.

Then it reads a packet and checks there is no error.

Back over in Valgrind

handle_gdb_valgrind_command

this converts the string into a number

It then does a poll with timeout, in this case 0, so really just checking for signals

Now, kill

This calls

kill_request ("monitor command request to kill this process\n");

==

VG(umsg) ("%s", msg); VG(exit) (0);

That's one of the missing strings. It's a kind of prints. I did think that exit might happen before the printf buffer gets flushed.

static void exit_wrk( Int status, Bool gdbserver_call_allowed)

Need to look at that gdbserver stuff

Looks like this calls

VG_(gdbserver_exit)

there's a lot of stuff happening there

finally, syscall exit

I think that valgrind must be exiting before vgdb sends v.kill

Hust adding some echos to the invoker script:

--- mcinvokeWS.stdoutB.exp 2021-12-11 13:51:21.127534000 +0100 +++ mcinvokeWS.stdoutB.out 2022-06-17 08:24:12.446625000 +0200 @@ -18,6 +18,7 @@ gdbserver: continuing after wait ... gdbserver: continuing in 0 ms ... gdbserver: continuing after wait ... +killing vg with vgdb gdbserver: continuing in 0 ms ... gdbserver: continuing after wait ... -monitor command request to kill this process +killed vg with vgdb

As noted above the two commands are wait 0 kill

we're seeing the wait 0 output but not the kill

I presume that proves the valgrind hasn't terminated before the wait 0 command.

What is happening next? Is valgrind terminating before the kill, or is the kill being received and valgrind terminating before output gets printed?