matu3ba / sandboxamples

Structured collection of sandbox programs including tests (fs, net access, permissions, process groups [if available]) and system setup programs. No VM stuff.
BSD Zero Clause License

windows design flaw: no reliable notification about process tree termination #10

Closed matu3ba closed 3 months ago

matu3ba commented 3 months ago

See https://stackoverflow.com/questions/28025869/reliable-waiting-for-process-tree-completion and https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_associate_completion_port?redirectedfrom=MSDN

Note that, except for limits set with the JobObjectNotificationLimitInformation information class, 
messages are intended only as notifications and their delivery to the completion port is 
not guaranteed. The failure of a message to arrive at the completion port does not necessarily 
mean that the event did not occur.

So there appears to be no fully reliable way to get process tree completion notifications on Windows. One has to poll, for example by calling QueryInformationJobObject (https://learn.microsoft.com/de-de/windows/win32/api/jobapi2/nf-jobapi2-queryinformationjobobject) with JOBOBJECT_BASIC_ACCOUNTING_INFORMATION and checking its ActiveProcesses field.
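A minimal polling sketch in Python with ctypes, assuming a job handle obtained elsewhere (for example from CreateJobObjectW). The names `JobBasicAccountingInfo`, `query_active_processes` and `wait_until_job_empty` are made up for illustration; only the struct layout and the QueryInformationJobObject call come from the Windows headers and docs. The polling loop takes the counting function as a parameter, so it can be tried without Windows.

```python
import ctypes
import sys
import time

# JOBOBJECTINFOCLASS value for JobObjectBasicAccountingInformation.
JOB_OBJECT_BASIC_ACCOUNTING_INFORMATION_CLASS = 1


class JobBasicAccountingInfo(ctypes.Structure):
    """Mirrors JOBOBJECT_BASIC_ACCOUNTING_INFORMATION from winnt.h."""
    _fields_ = [
        ("TotalUserTime", ctypes.c_int64),
        ("TotalKernelTime", ctypes.c_int64),
        ("ThisPeriodTotalUserTime", ctypes.c_int64),
        ("ThisPeriodTotalKernelTime", ctypes.c_int64),
        ("TotalPageFaultCount", ctypes.c_uint32),
        ("TotalProcesses", ctypes.c_uint32),
        ("ActiveProcesses", ctypes.c_uint32),
        ("TotalTerminatedProcesses", ctypes.c_uint32),
    ]


def query_active_processes(job_handle):
    """Return the number of processes still running in the job (Windows only)."""
    if sys.platform != "win32":
        raise OSError("QueryInformationJobObject is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.QueryInformationJobObject
    fn.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p,
                   ctypes.c_uint32, ctypes.POINTER(ctypes.c_uint32)]
    fn.restype = ctypes.c_int
    info = JobBasicAccountingInfo()
    ok = fn(job_handle, JOB_OBJECT_BASIC_ACCOUNTING_INFORMATION_CLASS,
            ctypes.byref(info), ctypes.sizeof(info), None)
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return info.ActiveProcesses


def wait_until_job_empty(count_fn, interval_s=0.05, timeout_s=None):
    """Poll count_fn() until it returns 0. Returns False if timeout_s elapses first."""
    start = time.monotonic()
    while True:
        if count_fn() == 0:
            return True
        if timeout_s is not None and time.monotonic() - start >= timeout_s:
            return False
        time.sleep(interval_s)
```

On Windows this would be used as `wait_until_job_empty(lambda: query_active_processes(job_handle))`. The polling interval trades latency against CPU wakeups; the 50 ms default is an arbitrary choice.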

With admin privileges, ETW offers another route: one can write a custom task manager that tracks process relations by process ID (see for example https://stackoverflow.com/questions/28882178/how-can-i-capture-process-names-using-the-traceevent-library and the examples in https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent.Samples/). But that is pretty much overkill.

A workaround when the process names are known is to wait for those processes to terminate. However, identically named processes may be spawned in other jobs that are also being waited on, which can lead to deadlocks if two waiters on the same system each wait for the other's processes.

matu3ba commented 3 months ago

The correct solution is to detect whether the I/O completion port's message buffer is full at any point and, when in doubt, to (temporarily) fall back to polling the process count via JOBOBJECT_BASIC_ACCOUNTING_INFORMATION in QueryInformationJobObject. However, QueryInformationJobObject only works if the job object was first waited on via WaitForSingleObject[Ex] or an entry was requested from its I/O completion port.

This suggests that pending actions on the job object are only executed (perhaps through a callback) in these cases, and that TerminateJobObject merely enqueues the actions to take on the job object. The documentation of job objects is vague about this.
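The fallback strategy from this comment could look roughly like the following Python sketch. It does not detect a full completion buffer (I am not aware of an API for that); instead it swaps in a simpler rule: wait on the port with a timeout and poll the process count after every wait that did not deliver the "no active processes" message. `next_message` and `active_count` are hypothetical callables; on Windows they would wrap GetQueuedCompletionStatus and QueryInformationJobObject respectively. Polling right after asking the port for an entry also fits the observation above about when QueryInformationJobObject works.

```python
# Message ID from winnt.h, delivered to the port when the job's
# active process count drops to zero.
JOB_OBJECT_MSG_ACTIVE_PROCESS_ZERO = 4


def wait_for_tree_exit(next_message, active_count, wait_ms=100, max_rounds=None):
    """Wait until the job has no active processes.

    next_message(wait_ms) -> message ID, or None if the wait timed out
                             (hypothetical wrapper around GetQueuedCompletionStatus)
    active_count()        -> number of active processes in the job
                             (hypothetical wrapper around QueryInformationJobObject)

    Returns "notified" if the port delivered the message, "polled" if the
    fallback poll saw zero processes first. Raises TimeoutError after
    max_rounds unsuccessful rounds (None means wait forever).
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        msg = next_message(wait_ms)
        if msg == JOB_OBJECT_MSG_ACTIVE_PROCESS_ZERO:
            return "notified"
        # Timeout or unrelated message: the notification may have been
        # dropped, so do not trust its absence and poll the count instead.
        if active_count() == 0:
            return "polled"
    raise TimeoutError("job still has active processes")
```

With this shape a lost notification costs at most one `wait_ms` of extra latency rather than a hang.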

matu3ba commented 3 months ago

workaround exists, so closing.