microsoft / vstest

Visual Studio Test Platform is the runner and engine that powers test explorer and vstest.console.
MIT License

Updating from .NET 8 SDK to .NET 9 SDK Preview 4 causes `dotnet test` to hang forever #5091

Open Youssef1313 opened 2 weeks ago

Youssef1313 commented 2 weeks ago

Description

So far, I don't have an isolated repro, but it happens for the Uno Platform Wasm UI tests that we run via `dotnet test`. The tests pass, but `dotnet test` never terminates.

Upon investigation, I found:


So it looks like the VSTestTask2 task somehow never terminates; it stays stuck there indefinitely. Setting the `MSBUILDENSURESTDOUTFORTASKPROCESSES` environment variable to `1` works around it for now.
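For CI scripts that drive `dotnet test` from Python, here is a minimal sketch of applying the workaround only to the test run rather than globally. The helper names (`env_with_workaround`, `run_dotnet_test`) are hypothetical; the only details taken from this thread are the variable name and the value `1`.

```python
import os
import subprocess

# Workaround variable reported in this issue.
WORKAROUND_VAR = "MSBUILDENSURESTDOUTFORTASKPROCESSES"


def env_with_workaround(base=None):
    """Return a copy of base (default: os.environ) with the workaround set.

    The input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env[WORKAROUND_VAR] = "1"
    return env


def run_dotnet_test(project_dir):
    """Run `dotnet test` in project_dir with the workaround applied.

    Assumes the .NET SDK is on PATH; returns the exit code.
    """
    result = subprocess.run(["dotnet", "test"], cwd=project_dir, env=env_with_workaround())
    return result.returncode
```

Scoping the variable to the child environment keeps the rest of the CI job on default MSBuild behavior.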

Steps to reproduce

We don't have a minimal repro yet.

Expected behavior

Actual behavior

Diagnostic logs

logs.zip

Environment

nohwnd commented 2 weeks ago

vstesttask2 task starts an exe and waits for it to exit, so it will sit there for as long as the exe keeps running. When you look at the process tree, do you see vstest.console / dotnet running under this process? And do you see testhost running under the vstest.console process?

Youssef1313 commented 2 weeks ago

vstesttask2 task starts an exe

Is that testhost.exe? That one terminates correctly.

Youssef1313 commented 2 weeks ago

I have a dump of the dotnet process; in case it helps, I'll send it to you.

nohwnd commented 2 weeks ago

I've got your dump. It looks like the tool task is simply waiting for a child process, vstest.console, to exit.

In the vstest.console logs I can see that it exited, but the process ID differs from the one in the dump file, so the two probably come from different runs. That's not a big problem, but could you double-check that vstest.console has stopped under the task? If you put a long wait in your test, you should see vstest.console running under some MSBuild node, and it should then exit.

You could also try using `-nodereuse:false`. That disables reusing "cached" MSBuild nodes and starts a new node for this run. If it still gets stuck, it is much easier to see which process is stuck, because they all run under the terminal process.
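A minimal Python sketch of launching the run this way, assuming the .NET SDK is on `PATH` and that `dotnet test` passes the `-nodereuse:false` switch through to MSBuild as suggested above. The helper names are hypothetical; printing the PID is only meant to make the resulting process tree (MSBuild node, vstest.console, testhost) easier to locate while the hang is observed.

```python
import subprocess


def dotnet_test_argv(extra_args=()):
    """Build a `dotnet test` command line with MSBuild node reuse disabled."""
    return ["dotnet", "test", "-nodereuse:false", *extra_args]


def start_dotnet_test(project_dir, extra_args=()):
    """Start the run without waiting and report the top-level dotnet PID."""
    proc = subprocess.Popen(dotnet_test_argv(extra_args), cwd=project_dir)
    print(f"dotnet test running as PID {proc.pid}")
    return proc
```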

Youssef1313 commented 2 weeks ago

@nohwnd Oh. I think the logs I sent to @Evangelink were from a different run than the one where I took the dump. However, when I saw that the task was waiting for a process, I couldn't find that process ID in Task Manager at all, so it was strange that it was waiting for a process that had apparently already exited.

I'll try to delay the test and see if I can find more information.

nohwnd commented 2 weeks ago

That is indeed weird, and in that case it would be an MSBuild bug (not that I'm trying to dodge responsibility, but we rely entirely on ToolTask for this). Let me know what you find, and I'll talk with the MSBuild team if there is a problem in ToolTask.

Youssef1313 commented 2 weeks ago

Great. I'll double-check my analysis, try to get more info, and get back to you.

tmds commented 1 week ago

We ran into this issue in https://github.com/dotnet/sdk/pull/41198 CI.

The CI test step timed out after 30 minutes on every attempt. After adding `MSBUILDENSURESTDOUTFORTASKPROCESSES=1`, the test step finishes in under 4 minutes.

cc @ViktorHofer @rainersigwald @Forgind

Forgind commented 1 week ago

On a hunch, can someone try setting MSBUILDNODEWINDOW to 1 to see if that also resolves the problem?

Forgind commented 1 week ago

Actually, I guess I can do that. I'll try to get that started later today.

nohwnd commented 1 week ago

This is now on the work list for the MSBuild team and me to fix. We still don't know where it is happening, though, so any additional info or a repro would be very welcome: especially confirmation of whether vstest.console is running while the hang is observed, and diagnostic logs of `dotnet test`.

MichalPavlik commented 5 days ago

There is a quirk with the `WaitForExit()` method when the parent process reads stdout asynchronously: if the child process starts a grandchild process, the parent's `WaitForExit()` also waits for the grandchild to exit. It keeps blocking even after the child process itself has exited.
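The behavior described above is not specific to .NET. A plausible mechanism (an assumption here; the thread does not confirm it) is that the grandchild inherits the child's redirected stdout pipe, so the reader never sees end-of-file until the grandchild has exited too. The Python sketch below reproduces the pattern with `subprocess` rather than .NET's `Process`, assuming a POSIX system: `proc.wait()` returns as soon as the child exits, but reading stdout to EOF blocks until the grandchild's sleep finishes. `CHILD_CODE`, `GRANDCHILD_CODE` and `demo_grandchild_holds_pipe` are hypothetical names for this illustration.

```python
import subprocess
import sys
import time

# Hypothetical grandchild: sleeps for argv[1] seconds and writes nothing.
GRANDCHILD_CODE = "import sys, time; time.sleep(float(sys.argv[1]))"

# Hypothetical child: starts the grandchild WITHOUT redirecting its stdout,
# so the grandchild inherits the child's stdout (the parent's pipe), then exits.
CHILD_CODE = (
    "import subprocess, sys\n"
    "grandchild = subprocess.Popen([sys.executable, '-c', sys.argv[1], sys.argv[2]])\n"
    "print('child done', flush=True)\n"
)


def demo_grandchild_holds_pipe(grandchild_seconds):
    """Return (seconds until child exit, seconds until stdout EOF, output)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, GRANDCHILD_CODE, str(grandchild_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    # The child writes one short line that fits in the pipe buffer,
    # so waiting before reading cannot deadlock here.
    proc.wait()  # returns once the direct child has exited
    exited_after = time.monotonic() - start
    # read() returns only at EOF, i.e. once every process holding the pipe's
    # write end (including the still-sleeping grandchild) has closed it.
    output = proc.stdout.read()
    drained_after = time.monotonic() - start
    proc.stdout.close()
    return exited_after, drained_after, output
```

Calling `demo_grandchild_holds_pipe(2.0)` should show the child exiting almost immediately while the stdout read only completes after roughly two seconds, which mirrors a task that keeps waiting after the process it started is gone.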

I'm not saying it's the root cause in this situation, but it's possible. Our ToolTask uses `WaitForExit()`, so I can try to avoid this situation on our side.

Youssef1313 commented 5 days ago

Not sure whether that is related to https://github.com/dotnet/runtime/issues/103384

MichalPavlik commented 4 days ago

What I described relates to the issue you mentioned. @Youssef1313, could you please check whether there is a process that was started by the testhost and terminate it? If that unblocks MSBuild, then the problem is in our codebase and should be fixed. The workaround is to use a different overload of the `WaitForExit` method.
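The comment does not spell out which `WaitForExit` overload the fix would use. As a language-neutral illustration of the idea (stop treating stdout EOF as the completion signal once the direct child has exited), here is a Python sketch for POSIX systems. It drains stdout with `selectors` while the child runs; once the child has exited, it keeps reading only for a bounded grace period instead of waiting indefinitely for EOF. `run_with_bounded_drain`, `grace_seconds` and the two helper scripts, which simulate a child leaving a sleeping grandchild that holds the pipe open, are hypothetical.

```python
import os
import selectors
import subprocess
import sys
import time

# Hypothetical helper scripts: the child leaves behind a sleeping grandchild
# that inherits (and keeps open) the child's stdout pipe.
GRANDCHILD_CODE = "import sys, time; time.sleep(float(sys.argv[1]))"
CHILD_CODE = (
    "import subprocess, sys\n"
    "grandchild = subprocess.Popen([sys.executable, '-c', sys.argv[1], sys.argv[2]])\n"
    "print('child done', flush=True)\n"
)


def run_with_bounded_drain(argv, grace_seconds):
    """Run argv and collect its stdout (POSIX-only sketch).

    Returns (exit_code, reached_eof, output). After the direct child exits,
    stdout is read for at most grace_seconds more instead of waiting for an
    EOF that never arrives while a descendant still holds the pipe open.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    chunks = []
    reached_eof = False
    deadline = None  # set when the direct child is first seen to have exited
    with selectors.DefaultSelector() as selector:
        selector.register(fd, selectors.EVENT_READ)
        while True:
            if deadline is None and proc.poll() is not None:
                deadline = time.monotonic() + grace_seconds
            if deadline is not None and time.monotonic() >= deadline:
                break  # give up on EOF; a descendant still holds the pipe
            # Keep draining while waiting so a chatty child cannot fill the pipe.
            if selector.select(timeout=0.05):
                data = os.read(fd, 65536)
                if not data:
                    reached_eof = True  # every writer has closed the pipe
                    break
                chunks.append(data)
    proc.stdout.close()
    # After EOF the child may still be running, so wait for it; after the
    # deadline it has already exited and wait() returns immediately.
    return proc.wait(), reached_eof, b"".join(chunks).decode()
```

The trade-off is that anything a lingering descendant writes after the grace period is lost, in exchange for the caller no longer hanging once the process it actually started has exited.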

Youssef1313 commented 4 days ago

I may not be able to retest this soon, but IIRC, when `WaitForExit` was stuck, I couldn't find any running process with a matching process ID. So it felt like that process had already terminated, but `WaitForExit` was still blocking and never returned.

MichalPavlik commented 4 days ago

Yes: if testhost started another process with redirected output, our `WaitForExit` will wait for that grandchild process to exit. So far I have no other idea what is happening.