Closed nico-abram closed 3 years ago
The code in the repository seems to be outdated, since the last published version is from 1/23/2020 and the last commit from 30/08/2019. Despite that, I tried to locally compile the code in this repo and reproduce the issue. At least on my machine, I can reproduce it.
I know pretty much nothing about both dtrace the tool and the dtrace codebase, so I'm likely making mistakes here, but I can try to debug it.
It's stuck in this loop forever https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/cmd/dtrace.c#L2126-L2166 since dtrace_work
never returns DTRACE_WORKSTATUS_DONE
(Always DTRACE_WORKSTATUS_OKAY
)
dtrace_work
calls dtrace_status
to get its return value. From what I've seen, it's always returning DTRACE_STATUS_OKAY
from this if: https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/lib/libdtrace/common/dt_work.c#L168-L169
These status values are defined here: https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/lib/libdtrace/common/dtrace.h#L217-L221
I think the one that should be returning there is the EXITED case, and the only return for that in that function is this one: https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/lib/libdtrace/common/dt_work.c#L161-L166
So my guess is .dtst_exiting
is not being set correctly somewhere else.
It seems that's the only place in /lib/ or /cmd/ that dtst_exiting
is used. Assuming /sys/ is kernel code, I'm not sure there's much I can do to look into this further
Note: I had trouble getting debug symbols to work for functions in the dll project. Looking at the VS debugging modules window showed that it was actually loading the System dtrace.dll . I manually copied the lib/dtrace.dll file over to the /cmd/ build output folder and only then did symbols start working (Since it was loading the correct dll)
From some simple grepping, it looks like these are the only 4 places that set dtst_exiting
:
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/uts/common/dtrace/dtrace.c#L18040-L18041
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/dev/dtrace/dtrace_ioctl.c#L792-L793
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/compat/win32/dtrace.c#L19894-L19895
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/compat/win32/dtrace.c#L18662-L18663
All four seem to rely on dts_activity
being DTRACE_ACTIVITY_DRAINING
I only see dts_acitivty being set to that value in two places:
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/uts/common/dtrace/dtrace.c#L15222
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/compat/win32/dtrace.c#L15788
Both of these are in functions called dtrace_state_stop
. At a glance, I would guess that function is either not being called when it should, or this early return that both functions have is being triggered
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/compat/win32/dtrace.c#L15779-L15781
That dtrace_state_stop
function seems to only be called here:
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/sys/compat/win32/dtrace.c#L19942
(There is another call here but it's in code that does not get compiled for windows because of #ifdefs)
That's in dtrace_ioctl
in what I think is the dtrace kernel driver (In /sys/), and within a DTRACEIOC_STOP
case in a switch
The only place where I see DTRACEIOC_STOP
used (Outside of cases and #defines) is here, in dtrace_stop
in the dtrace command line application code:
https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/lib/libdtrace/common/dt_work.c#L251-L252
That dtrace_stop
function seems to be called in 2 places: The loop we're stuck in mentioned in the earlier comment and in dtrace_status
Looking at the if in the loop in dtrace.c ((done || g_intr || (g_psc != 0 && g_pslive == 0))
), g_intr or (g_psc != 0 && g_pslive == 0)
seem interesting. Looking around, I think g_psc
is the number of processes that dtrace was tracing initially, and g_pslive
the number of those that are still alive (So basically, if we were tracing specific processes such that g_psc != 0, and they are all dead i.e g_pslive == 0 then we stop tracing). Debugging dtrace.exe, it looks like even after the target process exits g_pslive
remains 1, and g_psc
is correctly 1. I think that might be the bug, g_pslive is not being decremented. It's strange because dtrace: pid 23200 has exited
gets printed to the console, so some part of dtrace clearly knows the process exited
Grepping for that message, I get to this line: https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/cmd/dtrace.c#L876
I don't see g_pslive--
anywhere in the windows code for the proc handler, but I do see it in other ifdefs. So I think that might be the bug (Or at least related to it)
Replacing these lines: https://github.com/microsoft/DTrace-on-Windows/blob/b51963db66f1112e9876fd0417aa53f1ccaea9a6/cmd/dtrace.c#L872-L877
With these:
if (GetExitCodeProcess(h, &status) && (0 != status)) {
notice("pid %d exited with status %d\n",
pid, status);
} else {
notice("pid %d has exited\n", pid);
}
g_pslive--;
Seems to fix the problem, but I haven't tested it much
This is a duplicate of this feedback hub issue
I ran into this problem when trying to run cargo-flamegraph, this is the corresponding issue on it's repo with some context.
The following command showcases the problem:
Note: this command is meant for powershell Command for cmd.exe:
version: Published 1/23/2020 on https://www.microsoft.com/en-us/download/details.aspx?id=100441 .
dtrace -V
printsdtrace: Sun D 1.13.1
windows version: 20H2 19042.844