On x86
== 23 tests, 0 stderr failures, 0 stdout failures, 1 stderrB failure, 2 stdoutB failures, 0 post failures ==
gdbserver_tests/mcinvokeWS (stdoutB)
gdbserver_tests/mcinvokeWS (stderrB)
gdbserver_tests/nlcontrolc (stdoutB)
nlcontrolc is expected
mcinvokeWS is missing
-monitor command request to kill this process
from stdout, and
-readchar: Got EOF
-error reading packet
from stderr.
If I run vgdb with -d -d -d -d then I see
sending command v.kill to pid ....
23:54:08.902231 writing writing hex command to pid len 22 $qRcmd,762e6b696c6c#60 notify: 1
23:54:08.902252 reading static buf readchar
23:54:08.902331 read static buf readchar $O3d3d32353531353d3d206d6f6e69746f7220636f6d6d616e64207265717565737420746f206b696c6c20746869732070726f636573730a#2e
23:54:08.902354 getpkt ("O3d3d32353531353d3d206d6f6e69746f7220636f6d6d616e64207265717565737420746f206b696c6c20746869732070726f636573730a"); [no ack]
23:54:08.902381 reading static buf readchar
23:54:08.902571 read static buf readchar
readchar: Got EOF
23:54:08.902601 [getpkt: discarding char '']
error reading packet
23:54:08.902627 nr received signals: sigint 0 sigterm 0 sighup 0 sigpipe 0
23:54:08.902659 joining with invoke_gdbserver_in_valgrind_thread
23:54:08.904560 after waitpid pid 25515 p 25515 status 0x........ WIFEXITED 0
Two of the missing lines are there, and there's no stdout diff.
So it seems as though this testcase is a bit racy on my i386 VM.
The guest is run with
1 10000000 0 -S-S-S-S
That's just 1 loop with a sleepms of 10 million (about 3 hours). No burner.
If ptrace is interrupting syscalls, then at least main will leave its select() and then be in pthread_join(). If those also get interrupted, main could end before vgdb has a chance to kill it.
Need ktrace.
ktrace not that helpful.
I see the ptraces, indirectly. One select() gets interrupted (main I guess) and then I see 9x pthread_joins (well, _umtx_ops) being interrupted and restarted. I guess there's enough libthr wrapping to detect and restart.
That doesn't look racy though.
So what exactly does "v.kill" do in vgdb?
As a reminder the command is
./vgdb "$@" -c v.kill
more precisely
./vgdb --vgdb-prefix=./vgdb-prefix-mcinvokeWS --wait=60 -c v.wait 0 -c v.kill
--wait controls the effort to find the FIFOs. It's the number of times vgdb will scan the directory for the FIFOs, with a 1s delay between each attempt.
--vgdb-prefix is the prefix of the FIFOs to look for.
-c specifies commands.
The command line parsing here is pretty cheesy. When a -c is seen, an entry is added to 'commands', a VLA of char* of size argc allocated in main. The new entry is just a 1-char array initialized to '\0'.
Then, when any other string is encountered, the last 'commands' entry is realloc'd and the string appended, with space separation.
So we end up with 2 entries in 'commands':
"v.wait 0"
"v.kill"
After the command line parsing, vgdb will either set up the gdb relay (presumably to send monitor commands) or just directly send the commands. I think that vgdb then creates a thread:
if (max_invoke_ms > 0 || cmd_time_out != NEVER)
   pthread_create(&invoke_gdbserver_in_valgrind_thread, NULL,
                  invoke_gdbserver_in_valgrind, (void *) &pid);
Then vgdb sends wakeup messages down the pipe, with half-second delays. Is the write blocking? I don't think so.
Meanwhile the thread is watching the amount of data written by vgdb but not yet read by valgrind. It has a lot of timer stuff as well.
Note these timeval convenience macros (from sys/time.h):
/* Convenience macros for operations on timevals.
   NOTE: `timercmp' does not work for >= or <=.  */
#define timerisset(tvp)    ((tvp)->tv_sec || (tvp)->tv_usec)
#define timerclear(tvp)    ((tvp)->tv_sec = (tvp)->tv_usec = 0)
#define timercmp(a, b, CMP)                  \
    (((a)->tv_sec == (b)->tv_sec) ?          \
     ((a)->tv_usec CMP (b)->tv_usec) :       \
     ((a)->tv_sec CMP (b)->tv_sec))
maybe that should be ! timercmp <
basically I think that this is saying "data is waiting and I'm timing out" so it calls
invoker_invoke_gdbserver
Back to standalone_send_commands
It then sets up no-ack mode and starts sending the commands
Conveniently the command is converted to hex.
Then it reads a packet and checks there is no error.
Back over in Valgrind
handle_gdb_valgrind_command
This converts the string into a number.
It then does a poll with that timeout, in this case 0, so really just checking for signals.
Now, kill
This calls
kill_request ("monitor command request to kill this process\n");
==
VG_(umsg) ("%s", msg);
VG_(exit) (0);
That's one of the missing strings. It's a kind of printf. I did think that the exit might happen before the printf buffer gets flushed.
static void exit_wrk( Int status, Bool gdbserver_call_allowed)
Need to look at that gdbserver stuff
Looks like this calls
VG_(gdbserver_exit)
there's a lot of stuff happening there
finally, syscall exit
I think that valgrind must be exiting before vgdb sends v.kill
Just adding some echoes to the invoker script:
--- mcinvokeWS.stdoutB.exp 2021-12-11 13:51:21.127534000 +0100
+++ mcinvokeWS.stdoutB.out 2022-06-17 08:24:12.446625000 +0200
@@ -18,6 +18,7 @@
 gdbserver: continuing after wait ...
 gdbserver: continuing in 0 ms ...
 gdbserver: continuing after wait ...
+killing vg with vgdb
 gdbserver: continuing in 0 ms ...
 gdbserver: continuing after wait ...
-monitor command request to kill this process
+killed vg with vgdb
As noted above the two commands are
wait 0
kill
We're seeing the wait 0 output but not the kill.
I presume that proves that valgrind hasn't terminated before the wait 0 command.
What is happening next? Is valgrind terminating before the kill, or is the kill being received and valgrind terminating before output gets printed?