Closed rdghickman closed 5 years ago
We hope to reimplement timeout and other features that were dropped in the PCL when we add a netstandard version of the framework. They were dropped because timeout requires threads which aren't available in PCL.
Is there any timeline for enabling TimeAttribute and DelayedConstraint for the netstandard version? We're currently porting our libraries to target both .NET Framework and .NET Core but our test suites make quite heavy use of DelayedConstraint in particular such that porting them to support both platforms would be a pain.
The first step of porting to .NET Standard is done. Hopefully we can read the thread bases attributes Q1
Edit: ignore the below - I missed the ThreadUtility class, which can't be converted straight off.
I was contemplating having a look at reimplementing these things in a Taskbased, manner, but then found System.Threading.Thread is actually available for .NET Standard 1.3.
https://www.nuget.org/packages/System.Threading.Thread/
Actual cross-platform support is apparantley a little sketchy, but not in a way that I think would affect us - our UWP runner will be using the PCL build (or subsequent netstandard 1.0) for the forseeable.
Would you be happy with us pulling in this package, or was this something you specifically didn't do? I can PR, at some point.
I was planning on pulling in the threading package as we start to tenable functionality. What about Thread utility isn't convertible?
Thread.Abort() and such isn't available in .NET Standard. (I don't think?) This is used to kill the thread used for timeouts.
DelayedConstraint is works with System.Threading.Thread - I could push that as a separate PR, and then revisit something Task based for Timeouts?
The reason I didn't want to convert everything to be task based, is that they're not available pre-.NET4, so it would require two separate implementations for pre and post NET4. Hence why I think pulling in the package is good for the thread usages in DelayedConstraint, instead of converting that to use Task.Delay()'s instead of Thread.Sleep(). Thoughts?
https://github.com/abbotware/nunit-timeout <- sample code for creating a .Net Core Timeout attribute
btw, if there was a ExecuteAsync on the TestExecutionContext, this could easily be converted into Cooperative Cancellation variant.. but the [Test] method would need a new variant that accepts a CancellationToken
I'd have a preference for having our test method command inject a CancellationToken versus spinning up new threads or tasks for tests with timeouts and letting them leak. For example, what if a spin wait goes bad and pegs a CPU for the remainder of the test run?
Also, if we spin up a new thread and leave it running, how does this interact with [NonParallelizable]?
if a test is 'hung' - all bets are off - I think getting partial results is better than no results - worse case add a special overload to the timeout(100, AbortTestRun=true) <- that means if this timeouts - Program.Exit()
Co-op is the cleanest technique. I think we'd all be on the same page here: the I/O work is canceled, resources are disposed, etc.
The next cleanest is directly implementing the timeout yourself- for example, Assert.That(process.WaitForExit(10_000));
instead of process.WaitForExit()
, or the task.WithTimeout
extension method I seem to have written six times this year.
Beyond that it gets tricky, because the next cleanest thing to do is flush the current test results back to the runner and kill the process. This would require careful collaboration between the runner and the framework. So moving on from that, the next cleanest thing is to leave the thread running for the rest of the lifetime of the host process. I actually think that's preferable to aborting the thread, since global state can be corrupted to an arbitrary degree due to so much code having no need to be hardened against thread aborts, and you risk affecting the other tests.
We don't want you to lose your test results. Can we have some specific examples of code that hangs tests, where you can't work around it with using (var cancel = new CancellationTokenSource(10_000))
and passing the token, or Assert.That(foo.Wait(10_000))
?
I am not arguing that Co-op cancel is cleanest... Its just not realistic ... Not all code in the world is on the new Async model (nor does it follow it correctly even if it is.. ie 'Bad Actor' scenario)... Not all code in a unit test is under control by the writer of a test....
I am suggesting the following:
[Test]
[Timeout(1000)]
Task [async] TestMethod(Cancellation Task)
//use version 1, fallback to version 2
Caveat: you can't provide Cancelletation Token to these types of tests
[Test]
[Timeout(1000)]
Void TestMethod()
//use version 2
[Test]
[Timeout(1000)]
Task [async] TestMethod()
//use version 2
@abbotware I agree that tests need to opt in to cooperative cancellation. How they signal that is a matter of API design.
We can't however default to leaking test threads under NUnit control. If you mean threads started by the tests or the SUT, then yes, that would make sense.
So what happens when a test is stuck.... ? you lose all the results of the test run
Do you mean stuck in the sense that we can't even kill it? I think that's something to deal with if there is a use case where it can happen without the test writer explicitly trying to make it happen.
I know several tricks that will make a thread refuse to Abort. But if I use those tricks in my test, it seems like the failure to cancel is my own fault. Doctor says: "Stop doing that!"
Also... in the NUnit design, if we leak the hanging thread, we have lost one of the test worker threads. If it happens a few times, we lose all those threads and no more tests get executed. That's why I said we can't just "leak" our own threads but can leak threads created by our users or by the code they are testing.
For threads that are particularly at risk, you can make NUnit dedicate a thread to the test, separate from the worker threads. That thread could be safely leaked.
seriously? Doctor says: "Stop doing that!" -> what is wrong with this group (Nunit) - you are not the first person to make such an assumption that I have control over all code in question.. I am not going to repeat everything i wrote here; (https://gitter.im/nunit/nunit)
you should assume a test will hang (murphy's law) - not that all code will do this.. but maybe a 3rd party library gets updated... and some new behavior happens... I want to be able to run my tests and see what is impacted... isnt that the point of unit tests / integration tests?
@abbotware Sorry you are annoyed. I don't use Gitter, so didn't see your comment. However, I looked at it just now and it didn't seem to explain what you mean, just stated that you don't have control. Sometimes, people don't undertand one another because they simply come from different backgrounds, have different meaning for terms or hold different values. Getting mad doesn't make the communication work any better.
My own comment pertains to your test code and to nothing else. If a library has an issue, I think you should write your test code so as to deal with that issue. If this turns out to be a problem that many NUnit users need to deal with independently, then it's a candidate for a feature. But there is no open source (or other) project that entertains features for unexplained needs, at least not as far as I know. If we don't understand it, we try to understand first.
I don't know your background, of course. But the base assumption of NUnit (and pretty much all of the xUnit patterned frameworks) is that your test code is under your control, while the code you are testing may or may not be. In fact, there are three levels of code involved: test, SUT and third-party libraries. In the ideal situation, you have control over the first two, because you are writing the SUT as well. But sometimes that isn't so. By definition, of course, you don't have control over third-party libraries but you might have control over how you use them.
I'm spelling this out because we see many more people these days who say they do not have control over (even) the tests. This is a new audience for me, and I'm not sure I know how to deal with it. If that is your situation, my apologies. But you have posted enough here that I'm led to assume you are in fact a software developer with a fair amount of experience. If you are willing to explain why or how you don't have control over your tests, I want to hear more because it may help us deal with other people who are not able to articulate the problem as well as you.
@abbotware So...
That's a very specific issue. Thanks.
An approximate solution is to see what test was running when the crash occured. Depending on how you run tests (console, VS, etc.) there are ways to find that out. The solution is approximate, however, because a test can start a thread that impacts NUnit long after that test terminates.
We have always had this issue - i.e. for the 18 years of NUnit's existence - but it becomes more prominent as more people use multi-threading and as NUnit itself runs in parallel.
Existing solutions:
None of them are 100% satisfactory, so what would you like to see beyond them?
Finally, does this actually relate to the non-availability of TimeoutAttribute under .NET Core or is it a separate issue?
You can rest assured that if i see a [TestTimeout] result... I'll be doing a lot of digging around to find out WHY that occurred... but as it is now.. I can't even get that result if the testrunner 'hangs' on ,Net Core...
What is this 'thirdparty' that I am describing.. well.. this might blow your minds.. but its the .Net Core ITSELF Yes... it is buggy and not at all 'perfect' on linux/arm processors - so I REALLY do need this feature
So when i switch from net core 2.0 to 2.1 I need to be able to see if someone in the coreCLR team didnt 'sh*t the bed' :-)
Why is everything I am doing so critical: The software I write runs for 24hrs a day/weeks/months on end with 0 memory or thread leaks under "normal conditions" <- that quotes are there because lets face it... "things happen" - so just like how I implemented TestTimeout - you need to make a sacrifice over being 100 % purist (something i consider idealistic zealotry) vs real world reality.
If you ask any one of your users whether they want 'No Results' because 1 test out of 1000's caused a hang... or 'Partial Results' - anyone reasonable person is going to choose the later.
Even if it means there those partial results come with a big 'asterisk' - At least i know Test ABC = Timeout, and every test after ABC says - 'Passed - But you Need to Re-Run because Test ABC = timeout, and we can't make any guarantees about the validity of 'Pass' - accept this result at your own risk - remove or fix the offending test.. and then you'll be good'
A bit pedantic sure, but if you need to CYA then do that.. or make this functionality a feature for the 'advanced users that know what they are doing and have it off by default.
For the 'record' - I am perfectly content with my [Timeout] - I don't need the NUnit group to do anything - I just thought this little nuget of information would serve use to other users that seem to be in the same (or similar) situation I find myself in.
I was going to attempt to contribute the CoOp cancellation method version by forking, digging into the code and submitting a PR... but I am not so sure my views are inline/consistent with the project overall.
@abbotware "This feature" ?
I chimed in because it seemed you were off topic regarding this issue but were saying something interesting. If the feature you want is to have Timeout under .NET Core, I'm not here to argue against you. BUT it seemed you were saying that the existing semantics of Timeout are actually not useful for you, did I misunderstand?
Even if the current semantics are not what you would like, would it not make sense to first implement it for .NET Core and then discuss possible changes?
If i were to implement anything it would my suggestion above: (ie 1. coop cancel, 2. fall back to leak task)
@nunit/framework-team Here's a nice, simple, small issue that says we should implement TimeoutAttribute for .NET Core applications. There's even some code that has been suggested.
I wonder... can't we just fix the actual problem that was reported?
@nunit/framework-team In order to do the thread-leaking implementation, we'd need to decide whether we can treat all platforms the same (i.e. stop doing thread-abort on .NET Framework). Our .NET Standard builds can run on .NET Framework, so if we didn't want to change existing thread-abort behavior, the right thing to do would be feature detection via PlatformNotSupportedException. That would mean, before executing a test with [Timeout], we'd need to detect whether thread abort is possible and spin up a new leakable thread first if not.
Our .NET Standard builds also run on UWP which IIRC doesn't allow us to start new threads, so using Task.Run
we'd be borrowing thread pool threads and (on timeout) never returning them. That doesn't sound terrible, given how rare timeouts should be, though there is the risk of threadpool starvation. This just adds to the complexity of supporting [Timeout] in the .NET Standard builds.
@abbotware I honestly appreciate your feedback. I apologize that my approach was unsympathetic to you. I'm here because I want to make our users' lives easier. Any specific tips would be highly valued!
The dilemma for me is that on one side, this feature seems basic. We don't we just make it work? MSTest has it? Etc. This is a totally reasonable thing to think.
On the other side, I've come to believe (along with the maintainer of xUnit) that Timeout has always been a lie; furthermore, with only bad choices of implementation. I've been an advocate for removing its use in every codebase I've worked in. Besides the complexity of supporting it across all .NET Standard platforms, there is also the downside that folks will simply reach for [Timeout]
when they have much better options.
So to convince myself that adding this to NUnit could be a good thing, I was searching for a use case. I always assume that a request is not an XY problem, but I wasn't able to come up with example scenarios without asking for them, where co-op was impossible or less desirable.
Reading your comments above, you're looking for a way to guarantee that .NET Core itself isn't improperly implemented in a way that introduces a timeout. Is that correct?
@jnm2 This started out as implementing TImeout on .NET Core. The discussion rapidly changed to finding a common implementation across all platforms and very recently changed to implementing cooperative cancellation, which is less than the original request on one dimension and more than it on another.
So... I'm asking, what is wrong with doing the simple implementation of TimeoutAttribute being offered by @abbotware on the platforms that don't currently support it? Will be be worse off or better off than we are now if we do that? Will the requesting users be worse off or better off?
My own answer is that the users who now do not have access to Timeout will have it. It may work a bit differently from the timeout on other platforms, but that doesn't sound like a deal-breaker.
The only context in which I would not see an advantage in implementing Timeout for .NET Core is if the project intends to obsolete and remove TimeoutAttribute on all platforms. But that's absurd, since so many users depend on it in spite of its limitations.
Doing the small thing now does not preclude doing the bigger thing later.
@CharliePoole - I agree the CoOp cancellation is a 'feature request' and out of scope of the original request. What I was proposing addresses the current problem as a simple work around given this is is... 2 years old!
@jnm2 - I still think 'CoOp cancellation' is the way forward, but that definitely a bigger ask. (But I am willing to help!)
Regarding 'platform specific' implementations: I am fairly certain you don't need to detect this via an exception - Instead you know the build config (compile parameters) + runtime config (OS + Runtime) and you can create a feature matrix/lookup. I use System.PlatformID all the time to switch between windows and linux specific code variants at runtime. (you might even look at the reference CLR code to see how it even knows to throw an exception in the first place!)
regarding timeout being a lie - I disagree - if has its value and I think i described my scenario. I know you are saying a 'user' has options - I agree.. However, what is the big deal with adding a feature to the library that mimics what the user does? Its either add a level of 'indirection' via a task or thread in user code, or via framework code. Not all of my tests have a Timeout parameter.. but for those are that do, I know its because it really needs it and I would rather the wrapper task/thread be done consistently than leaving it to my 'potentially buggy user code! (and yes I know I can do extension methods, etc.. ) but still... the test attribute wins over on being the simplest and cleanest approach (from a user's perspective)
@jnm2 - in response to this: "you're looking for a way to guarantee that .NET Core itself isn't improperly implemented in a way that introduces a timeout. Is that correct?"
The fact that my problem is occurring in the CLR should be irrelevant. A dependency (direct or indirect) gets changed.. Code that has any asynchronic (is that a word?), threading functionality, or multi process, can potentially deadlock if it has synchronization. I am referring to lock statements, , waiting conditions, all forms of thread synchronization (mutex,semaphore,etc) ... and nowadays 'tasks' ... can potentially have different behavior if a library changed (or even code). This is the code that I want to test with Timeout attribute.
You can't assume this code is written perfectly (with timeouts) nor that a user that it has correct timeouts everywhere, or even chain cancellations correctly, or that I can even change the code with problems. (FYI I once pointed out a bug in 3rd party code to the library author - The async call did not have a cancellation token - there response was "I Charge for Support" - my only option was just use the sync version at least it had a timeout)
I would argue that CoOp cancellation "is a lie" - There is no way you can guarantee it. You can't. The fact you even said "if you are ignoring the token your doing it wrong" in the first place just proves my point. Everything has to be be in perfect alignment.
This makes a Timeout like I am describing even more valuable to help identify the code with problems.
@jnm2 - "Our .NET Standard builds also run on UWP which IIRC doesn't allow us to start new threads, so using Task.Run we'd be borrowing thread pool threads and (on timeout) never returning them." -
I am no familiar with UWP, but usually If a task is 'stuck' it doesn't 100% mean a lost thread.. At best its a resource / memory leak cause the user code is stuck in a wait somewhere: The whole point of using awaitable code in the first place is so that context switches can happen freeing up a limited number of threads to do more work.
At worst, the code is in userland and doing an infinite "busy loop" <- that is a lost thread.
@CharliePoole
Will be be worse off or better off than we are now if we do that? Will the requesting users be worse off or better off?
I'm willing to be convinced.
@abbotware
I appreciate the offer of help for the co-op issue! We might take you up on it!
I use System.PlatformID all the time to switch between windows and linux specific code variants at runtime. (you might even look at the reference CLR code to see how it even knows to throw an exception in the first place!)
I would try to avoid this in a similar vein as avoiding version-based detection, but on a case-by-case basis of course. If there's something more reliable than triggering the PNSE, let's do it! I'm pretty sure the reference code unconditionally throws.
The fact you even said "if you are ignoring the token your doing it wrong" in the first place just proves my point.
I'm not recognizing this as something I would say. Co-op cancellation is definitely not guaranteed termination. I do think guaranteed termination itself is a lie unless you kill the process.
You can't assume this code is written perfectly (with timeouts) nor that a user that it has correct timeouts everywhere, or even chain cancellations correctly, or that I can even change the code with problems.
I couldn't agree more! Yet, if you are in such a scenario while testing your code, you will also need to solve the same problems when your code is running in production without NUnit. Once you apply the production solution, wouldn't that immediately address the test-time timeout problem?
The larger point being, timeouts triggered by tests are vanishingly rare. You wouldn't allow the deadlock to remain in production code, and as soon as you find it and fix it, you no longer need the [Timeout] attribute. The value [Timeout] brings that I can see is:
This are nice enough benefits that I'd be happy for us to do whatever it takes on each platform to make this diagnostic stage easy. Does this sound right?
I'm not recognizing this as something I would say.
I paraphrased: your exact words were:
"Is that any different than shipping product code which doesn't listen to a cancellation token? If you're causing yourself grief by accepting a cancellation ttoken and doing nothing with it, then... stop doing that"
but lets just move on....
This are nice enough benefits that I'd be happy for us to do whatever it takes on each platform to make this diagnostic stage easy. Does this sound right?
YES 👍
Awesome, let's do this thing! 🎉 Recapping questions:
well - I don't think its desirable to change existing behavior.
As it is now, there is no attribute for Net Standard right? So you are sorta have a clean slate:
Another idea: [Timeout(1000, AbortIfSupported=true)] (but this implies a thread is created per test) - I dont know the internals of NUnit well enough to say whether or not this is even viable
We have a clean slate. After adding TimeoutAttribute
, this scenario becomes possible:
[Timeout]
, depends on common test utility library (net472), which depends on NUnit (pulls the net45 build)netstandard2.0
which pulls NUnit's netstandard2.0
build. [Timeout]
begins leaking instead of aborting, all while executing on .NET Framework.This doesn't seem like a problem to me if everyone is okay with it?
and what about unit test project (netcoreapp) ?
@abbotware ❔ Currently a netcoreapp
project can't compile using [Timeout]
, so there isn't a scenario analogous to the .NET Framework one.
OK - i guess i misunderstood your statement above - i thought you were intending to describe all the scenarios where the new (leaking) Timeout would be introduced.. .. instead you are just saying which current ones would be affected? (btw an exhaustive/explicit list of might need to be created (ie feature matrix)
I dont even know all the scenarios (and the following table might be wrong regarding net452 - I just picked a random 'framework') - but let me get the ball rolling:
NUnit Build | Timeout Abort | Timeout Leak |
---|---|---|
.net452 | x | n/a |
NetStandard2.0 | n/a | (new) |
Unit Test Runtime | Timeout Behavior | Linked Nunit Build |
---|---|---|
.net452 | abort | .net452 |
.net472 | leak (change?) | NetStandard2.0 |
NetCoreApp2.0 | leak (new) | NetStandard2.0 |
@jnm2 - you may probably know more about all the other nunit builds / run-time scenarios than me - so perhaps you can complete the above the tables? It would also help anyone else following this 'feature request' to understand the impact overall to NUnit
I guess though.. it is possible to link a runime of .net472 -> .net452 (or even .net2/35) - but, is that even a recommended? so the table might not be able to include all permutations, just the default (via nuget)
@abbotware Hey, I tried it out and I'm wrong! A whole category of changes that I've been worrying about isn't a thing! In this situation:
Unit test project (net461
–net472
)
└ References: test helper project/package, netstandard1.4
–netstandard2.0
.
└ References: NUnit package
NuGet will still reference the .NET Framework build of NUnit in the unit test project, not the .NET Standard build, even though the unit test project doesn't directly reference NUnit.
In order to see behavior change from aborting to leaking, then, you'd have to have no package references to NUnit and a manual reference a .NET Standard NUnit assembly. This is so rare I'm not concerned. You'll have bigger problems if you're doing that. 😜
@jnm2 - I'm happy to pick this up.
I already have a branch with a working solution, i.e. I use abbotware code as it works. I've added a couple of tests. I might go straight to PR to see where we are as I realise there is a bit of contention.
I've stuck to the basic's....
Great - thanks @DaiMichael, look forward to it!
Afraid GitHub won't allow us to assign the issue to you until you're part of the 'NUnit Organization' on GitHub, but I've removed the help-wanted label.
To fix that: @rprouse - would you be able to send invite Dai to the contributors team? Dai - our contributors team is made up of those who've made a few contributions now and then to the NUnit project - it grants a few very basic permissions, such as allowing us to put your name to the issues you pick up directly, and also allows you to make changes to the docs. Off the back of your last few contributions, you're more than welcome on there, if you'd accept!
uh.. so no attribution for the code I wrote? kinda looks like it was just copy and pasted there...
@abbotware There's no PR yet. What would you like to see if your code is used?
@DaiMichael I have sent an invite to the contributors team, welcome.
@abbotware we don't normally add attributes to individual authors in the code, we generally only provide recognition through the contribution graph for the project for people that contribute PRs. We could add a small comment in the code if you prefer, but please state that up front. Usually when people provide code samples for us to fix and issue or provide an enhancement, it is to satisfy their own needs.
Considering this issue has been around for how long?... if it were not for my sample code, this PR would not have been made.. and even more so, the sample code is/was just copy and pasted into someone else's PR. I don't care about the code being copy and pasted since I more than expect that to happen (although it did have a copyright header!), and am glad this is getting merged into NUnit proper... But a little recognition for my effort and or/contribution would be nice.
for example, i even attributed the code for 'Timeout After' to where I found it: https://github.com/abbotware/nunit-timeout/blob/master/Abbotware.Interop.NUnit/TaskExtensions.cs
@abbotware Again, no PR has been submitted. I can only guess, but it seems unlikely that anyone other than you will submit a PR using your code, given your comments.
That said, you did release your code on GitHub under the MIT license, which allows its use in other works without attribution.
NUnit itself is released under the same license. People take parts of it, even large parts, and incorporate it into other works all the time. It would be silly of us to complain since that's the license we chose.
I'm venting my own personal views here, BTW. They have no connection with the NUnit project's views.
@CharliePoole - Sorry thought the 'fixes #1638' was a PR request...
Any how - I just thought there was some form of minor recognition (although not required from the license) for helping resolve this old issue given that the method I proposed looked like it was being submitted.
It seems like this is just becoming a bigger deal than it needs to be: Please, just forget I asked. I'll submit a full PR next time for 'proper' recognition.
@abbotware No, that's just a fix to the user's own fork of nunit. It's noted in this issue because he referenced it in a comment.
Your point about attribution is well taken. It's just that you seemed to be addressing it to the NUnit Team, which hasn't actually used the code and isn't even considering it right now, since no PR has been submitted. Technically, we could go out and take the code from either repo and make use of it, but that's not something we normally do.
Curiosity question: did you consider submitting it yourself? You have been very active in this discussion and that would have seemed like a logical outcome.
It seems that TimeoutAttribute is not available in the portable version of NUnit 3 when targeting .NET core with a netcoreapp test. This is extremely useful for any sort of test involving threading or integration test, so it would be really useful if this was made available.
Using: NUnit 3.4.0, dotnet-test-nunit 3.4.0-beta1.