microsoft / vstest

Visual Studio Test Platform is the runner and engine that powers Test Explorer and vstest.console.
MIT License

Tests hang from dotnet test #2080

Closed JamesNK closed 3 months ago

JamesNK commented 5 years ago

Steps to reproduce

Tests run on Travis CI via dotnet test now fail intermittently. The failures started after updating to a newer .NET Core SDK, which appears to have bumped the test tool from 16.0.1 to 16.1.1.

Source code: https://github.com/grpc/grpc-dotnet/commits/master

Expected behavior

Tests run and exit

Actual behavior

Tests hang and the build is terminated

Diagnostic logs

Failure (Microsoft (R) Test Execution Command Line Tool Version 16.1.1): https://travis-ci.org/grpc/grpc-dotnet/builds/551562232?utm_source=github_status&utm_medium=notification

Success (Microsoft (R) Test Execution Command Line Tool Version 16.0.1): https://travis-ci.org/grpc/grpc-dotnet/builds/551058627?utm_source=github_status&utm_medium=notification

mayankbansal018 commented 5 years ago

@JamesNK, if this is an intermittent issue, can you please enable diagnostic logs for the test platform and share them with us? https://github.com/Microsoft/vstest-docs/blob/master/docs/diagnose.md

JamesNK commented 5 years ago

I can't easily get logs off the build server, but I can reproduce the freeze on my dev machine with this. It freezes every time within 5 minutes:

while($true) { dotnet test --diag:log.txt }

testlogs.zip
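On a CI agent that runs a POSIX shell rather than PowerShell, the repro loop above can be sketched with a time limit, so a hang ends the loop instead of the whole job. This is a minimal sketch that assumes GNU coreutils timeout, which exits with status 124 when it kills the command. The function name repro_hang and the limits in the usage line are hypothetical.

```shell
# Hypothetical helper: run a command up to MAX_RUNS times and stop at the
# first run that exceeds SECONDS, treating that run as a hang.
# Assumes GNU coreutils `timeout`, which exits with status 124 on timeout.
# Other non-zero exit codes (e.g. failing tests) do not stop the loop,
# just like the PowerShell `while($true)` loop.
#
# usage: repro_hang MAX_RUNS SECONDS CMD [ARGS...]
repro_hang() {
  max_runs=$1
  limit=$2
  shift 2
  run=1
  while [ "$run" -le "$max_runs" ]; do
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
      echo "run $run timed out after ${limit}s"
      return 124
    fi
    run=$((run + 1))
  done
  echo "no hang in $max_runs runs"
  return 0
}

# Example (limits are illustrative):
# repro_hang 50 600 dotnet test --diag:log.txt
```

Because each run is bounded, the loop ends with a clear message rather than a job-level timeout, and the --diag logs from the hanging run are left on disk.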

JamesNK commented 5 years ago

This is an ongoing problem. I'd like to make some progress on fixing it.

Do you need any more information from me?

mayankbansal018 commented 5 years ago

@JamesNK I went through the logs and didn't see any hang state. The only interesting thing I observed is that we seem to start 10 different testhost processes in sequence, but you have shared logs for 30 testhost processes. Can you please share how many test DLLs you are running?

JamesNK commented 5 years ago

Because I'm doing it in a loop: while($true) { dotnet test --diag:log.txt }

In the logs I attached, it freezes after 3 runs.

Have you tried reproducing it?

mayankbansal018 commented 5 years ago

I'm looking into it. I've cloned the repo; all I need to do is run dotnet test on the .sln, right?

mayankbansal018 commented 5 years ago

I tried it locally multiple times, but it did not repro for me.

tasadar2 commented 5 years ago

We are currently experiencing the same issue. We have been using mcr.microsoft.com/dotnet/core/sdk:2.2.204 to avoid the performance degradation issue, which is now resolved. But with the latest mcr.microsoft.com/dotnet/core/sdk (currently b4c25c26dc73f498073fcdb4aefe167793eb3a8c79effa76df768006b5c345b8), only a couple of test runs finish while the rest seem to hang. As with the performance issue, it seems to be related to non-interactive hosts.

Situation

Replication

I condensed our project to share an example. Each test project has a single test that sleeps for 5 seconds.

Repo: https://github.com/tasadar2/vstest-issue-2080
CircleCI: https://circleci.com/gh/tasadar2/vstest-issue-2080/3

It doesn't always replicate the issue, but often does.

JamesNK commented 5 years ago

~Running test projects individually has fixed this problem for us - https://github.com/grpc/grpc-dotnet/commit/152255ec5419c1360819788d7911f4957c8e4e2c~

singhsarab commented 5 years ago

@tasadar2 @JamesNK SDK 2.2.301 has a fix. Can you please check whether you are still hitting the issue?

tasadar2 commented 5 years ago

Still appears to be an issue.
Image: mcr.microsoft.com/dotnet/core/sdk@sha256:a50e175acd618c3e90bc91dceb5194e6c3764c5b4d179390cef874a887476ba9
Example: https://circleci.com/gh/tasadar2/vstest-issue-2080/7

JamesNK commented 5 years ago

I've narrowed down my hanging issue. It is caused by something in how vstest writes to the console. If no tests fail, vstest completes without any problems. If a test fails, vstest hangs until the CI build times out (this is on Travis CI).

The workaround I am using is to write the test output to a text file and then write the text file to the console. If I do that, it never hangs.
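The workaround above can be sketched as a small shell function. It assumes the output is captured with a plain stdout/stderr redirect, because the comment does not say exactly how the file was written. The name run_buffered is hypothetical. The function preserves the command's exit status, so a failing test run still fails the CI step after the log is printed.

```shell
# Hypothetical wrapper: send all of the command's output to a file instead
# of the CI console, then print the file once the command has finished.
# The command's exit status is returned so failing tests still fail the build.
#
# usage: run_buffered LOGFILE CMD [ARGS...]
run_buffered() {
  log=$1
  shift
  status=0
  "$@" > "$log" 2>&1 || status=$?
  cat "$log"
  return "$status"
}

# Example (file name is illustrative):
# run_buffered test-output.txt dotnet test
```

The trade-off is that nothing appears in the CI log until the command exits, so there is no live progress.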

singhsarab commented 5 years ago

@JamesNK Which .NET SDK version are you using? Can you share the link to the CI build?

JamesNK commented 5 years ago

3.0.100-preview8-013532

https://travis-ci.org/grpc/grpc-dotnet/builds/565363470?utm_source=github_status&utm_medium=notification

tasadar2 commented 5 years ago

Looks like this is still an issue on the recent 3.0 SDK.
Image: mcr.microsoft.com/dotnet/core/sdk@sha256:3afea8958440231a77b3daea267951cc8ba9026fc1015bcbccc206d6f1d031f7
Example: https://app.circleci.com/jobs/github/tasadar2/vstest-issue-2080/10

singhsarab commented 5 years ago

@tasadar2 Can you please try the --logger:console;noprogress=true argument and check whether the issue still reproduces for you?

tasadar2 commented 5 years ago

That argument produces the same results: https://app.circleci.com/jobs/github/tasadar2/vstest-issue-2080/11

tasadar2 commented 5 years ago

Though when the argument value is quoted, it seems to work: --logger:"console;noprogress=true" https://app.circleci.com/jobs/github/tasadar2/vstest-issue-2080/14
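There is a likely shell-level explanation for why the quoted and unquoted forms behave differently, assuming the CI step runs in a POSIX shell such as bash. An unquoted ; is a command separator, so in the unquoted form noprogress=true never reaches dotnet test. The shell treats it as a separate variable assignment, and dotnet test only receives --logger:console. The sketch below uses a hypothetical show_args function in place of dotnet test to print exactly what the program receives.

```shell
# Print each argument a command receives on its own line, in brackets.
show_args() {
  printf '[%s]\n' "$@"
}

# Unquoted: ';' ends the command, so only "--logger:console" is passed on and
# "noprogress=true" runs as a separate shell variable assignment.
show_args --logger:console;noprogress=true
# prints: [--logger:console]

# Quoted: the whole value arrives as one argument.
show_args --logger:"console;noprogress=true"
# prints: [--logger:console;noprogress=true]
```

This would also explain why the unquoted attempt behaved like the default console logger.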

NicolasDorier commented 5 years ago

I had the same problem; two workarounds worked:

Adding --logger:"console;noprogress=true" like @tasadar2

However, I did not like it because I could not see the progress (--logger:"console" has the same issue), so instead I added < /dev/null:

dotnet test .... < /dev/null

This allows me to see the progress of the tests without hanging.

[screenshot]
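The thread does not establish why the redirect helps. The sketch below only shows what < /dev/null changes for the process it is applied to. Standard input is no longer a terminal, and any attempt to read from it returns end-of-file immediately instead of waiting. The helper stdin_kind is hypothetical.

```shell
# Report whether the standard input of this shell function is a terminal.
stdin_kind() {
  if [ -t 0 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# With < /dev/null, stdin is never a terminal, whatever the CI agent provides.
stdin_kind < /dev/null
# prints: not a terminal

# Reading from /dev/null hits end-of-file at once instead of waiting for input.
if read -r line < /dev/null; then echo "got input"; else echo "EOF"; fi
# prints: EOF
```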

NicolasDorier commented 5 years ago

Full logs of a successful build with progress feedback: https://circleci.com/gh/dgarage/NBXplorer/430
Before the hack, you can see things stalling: https://circleci.com/gh/dgarage/NBXplorer/409

Happening on mcr.microsoft.com/dotnet/core/sdk:3.0.100 with xUnit.net VSTest Adapter v2.4.1 (64-bit .NET Core 3.0.0).

lukebakken commented 4 years ago

@NicolasDorier Thank you for finding your workaround. I can confirm it works; please see rabbitmq/rabbitmq-dotnet-client#750

erick-thompson commented 4 years ago

Is there a longer term fix for this? I'm running into the same issue with a set of web integration tests.

NicolasDorier commented 4 years ago

Ping @mayankbansal018: please remove the label. There is enough information, and this issue and its workaround have been tested by numerous people. This is a well-identified bug that needs to be fixed.

nohwnd commented 4 years ago

@JamesNK So far it looks like it is caused by setting the color at the same time as setting the cursor position: https://github.com/microsoft/vstest/issues/2282 I'm exploring strategies to avoid that. Probably the best option is to write the progress in a way similar to @NicolasDorier's comment above, but ideally by writing dots at the end of the line, so the output is not affected that much.

I'm experimenting with it here, and also trying to get it to lock up by moving the cursor on the vstest-like branch. See the other thread for more info.

https://travis-ci.org/github/nohwnd/ProgressToy/branches
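The "dots at the end of the line" idea can be illustrated with a small shell sketch. It only demonstrates the output pattern, which is append-only writes with no cursor positioning and no color changes. It is not how the vstest console logger is implemented. The function dot_progress and its one-line-per-completed-test input format are assumptions made for illustration.

```shell
# Hypothetical progress writer: for each completed test (one line on stdin),
# append a single dot to the current line, then finish with a count.
# It never moves the cursor and never emits color escape sequences.
dot_progress() {
  count=0
  while IFS= read -r _test_name; do
    printf '.'
    count=$((count + 1))
  done
  printf ' %d tests\n' "$count"
}

printf 'TestA\nTestB\nTestC\n' | dot_progress
# prints: ... 3 tests
```

Because the writer only appends characters, its output looks the same whether it goes to a terminal, a pipe, or a CI log.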

tremblaysimon commented 4 years ago

Just to confirm: we get that problem in Concourse if and only if dotnet test is executed on a solution file.

keshavkaul commented 4 years ago

@NicolasDorier Any suggestions for Windows PowerShell?

NicolasDorier commented 4 years ago

I don't remember having the issue on PowerShell.

dhilst commented 3 years ago

I had this problem with a small VM (2 cores, 2 GB RAM); increasing it to 8 cores and 8 GB RAM made it work. I read somewhere that this was thread-related, so I came up with the more-cores idea. The < /dev/null workaround didn't help in my case.

diegosasw commented 3 years ago

Not sure if it's related. I'm also a GitLab user and found a very similar problem, so I suspect it's related to the SDK image, but in my case it happens with Microsoft (R) Test Execution Command Line Tool Version 16.9.1, and the workaround didn't work for me: https://forum.gitlab.com/t/dotnet-test-hangs-in-gitlab-gitlab-runner-13-9-0-rc2/50977

I have also checked whether it has anything to do with async calls in case there is a deadlock, but I seem to be awaiting properly everywhere.

diegosasw commented 3 years ago

Out of curiosity, is anybody experiencing this only when using IHostedService? In my case, that seems to be what causes the tests to freeze within a Linux image, and disabling parallelization avoids the deadlock.

tremblaysimon commented 3 years ago

@diegosasw, for me it's happening when using dotnet test on a solution file even if there is no IHostedService usage.

victor-fialkin-deltatre commented 2 years ago

This helped us in case of IHostedService: https://www.strathweb.com/2021/05/the-curious-case-of-asp-net-core-integration-test-deadlock/

gdoron commented 2 years ago

@Evangelink @davidfowl Even on our very powerful MacBook Pros, when we run one of our integration test projects that has hundreds of tests (many of them using WebApplicationFactory), we often get into deadlocks.

We set xUnit to use an unlimited number of threads (-1), and we have overridden a bunch of xUnit internals to implement a semaphore that limits the number of concurrent tests. And yet we often get into deadlocks after about 600 tests have completed. After verifying that we have no sync-over-async, we found this beauty in the ASP.NET Core code itself...

[screenshot]

https://github.com/dotnet/aspnetcore/blob/main/src/Hosting/TestHost/src/TestServer.cs#L101

Could this be the root cause of all our deadlocks that only happen in unit tests?

gdoron commented 2 years ago

Finally! I managed to reproduce it in a small test project, where I can run dotnet-dump and attach to the process without everything crashing.

I found this fella in one of the threads' call stacks.

[screenshot]

And going up the call stack, we can see it's inside an NLog lock block (or trying to get inside it, and the debugger is misleading).

[screenshot]

davidfowl commented 2 years ago

Can you file this issue on dotnet/aspnet?

gdoron commented 2 years ago

Your request is my command: https://github.com/dotnet/aspnetcore/issues/43353

JBoothUA commented 2 years ago

Just in case this helps someone: at some point, this line started randomly hanging our tests:

_task = Task.Factory.StartNew(async () =>
{
    await Task.Delay((int)cleanInterval.TotalMilliseconds, _cancellationTokenSource.Token);
    while (!_cancellationTokenSource.Token.IsCancellationRequested)   // <-- this line

Evangelink commented 1 year ago

@cvpoienaru Please investigate this issue.

Piedone commented 1 year ago

Did anything happen in the investigation by chance?

diegosasw commented 1 year ago

I am experiencing the same when using TestServer. It works well locally, but when running in a GitLab runner, it hangs. I've noticed it freezes when trying to dispose TestServer, because disposing the IWebHost hangs. No exceptions are thrown.

I've tried stopping all the IHostedService instances in case that's the problem, but I'm still unable to dispose TestServer.

How are you troubleshooting this? The only workaround I have is to set

using Xunit;

#if DEBUG
[assembly: CollectionBehavior(CollectionBehavior.CollectionPerClass, DisableTestParallelization = false)]
#else
[assembly: CollectionBehavior(CollectionBehavior.CollectionPerClass, DisableTestParallelization = true)]
#endif

in the test assembly, so that tests run sequentially in the CI/CD pipeline.

nohwnd commented 1 year ago

@cvpoienaru Please investigate this issue.

psoladoye commented 1 year ago

This issue is still unresolved.

nohwnd commented 3 months ago

This issue is a mix of different problems, some related to other products, but I did not find a clear repro. If someone is still experiencing this problem and has a simple repro, please file a new issue.