Open msironi opened 13 years ago
Even my suggested solution isn't perfect, as if the pop or decrement throw exceptions they would hide the real exception. A better solution would be:
// Keep track of the number of pending type requests
_counter.Increment(concreteType);
try
{
// Keep track of the sequence
// of requests on the stack
lock (_requests)
{
_requests.Push(concreteType);
}
try
{
// Boring method details ommitted.
}
finally
{
try
{
lock (_requests)
{
_requests.Pop();
}
}
catch { }
}
}
finally
{
try
{
_counter.Decrement(concreteType);
}
catch { }
}
Interesting. Do you have any test code that could reproduce the error so that we can throw it in a unit test and make sure that it doesn't happen again?
(Oops. Pressed the wrong button. Reopening the issue)
No I don't. I'm hesitant to blame LinFu for the actual exception we saw as I have no solid proof, only semi-wild guesses from staring at a bit of the code. The code was running somewhere between 3-4 weeks 24/7 before it started throwing this exception, and the exception was thrown only for this one type. But once is started it was then thrown on every request, we had to restart the process. And it's been fine (as far as I know anyway) since then.
The one thing that is different about this type (and that is why I put it in the issue description) is that it has a factory for an unnamed T service that then delegates actual construction to named implementations of T. Depending on how LinFu is written this might pass through AutoCreateInternal twice (once for the unnamed factory, once for the concrete class that has the named [Implements] on it). If that were true then our pattern of unnamed factory invoking named concretes would effectively cut the counter in half. Both classes that actually have the [Implements] on them only have default constuctors so aside from this one level of "recursion" it should never go any deeper.
But ignoring all of the above, if the counter gets to be 10 somehow, bumping it to 11 then throwing the RecursiveDependencyException w/o decrementing means that (at best) you're going to leak a counter value, and if that happens 10 times for whatever reason you'd never be able to create an instance of the type ever even in cases where you didn't have recursion issues (say the type has multiple constructors and only some of them exceed the depth restriction). Ignoring the exception we saw this seems like a bug.
Since this seems to be an issue that's hard to reproduce, try applying the patch to your local copy of LinFu's source code and run it for a few weeks. If it solves the problem and it doesn't break any tests, then I'l merge the changes back into the official branch.
Fair enough.
What scares me about this is that the IOC service that failed has been in production for ~6 months and we've never seen this before. What has changed that is relevant is that we're now calling it orders of magnitude more times than we were before.
But I'll make a local change, get it out in our next release and let it bake for at least a month before I update the this issue.
p.s. Thanks for being so responsive on what is a pretty out-there problem. It's greatly appreciated.
No problem. It's definitely an interesting corner case. Keep me posted. :)
I've been distracted by a few fires at work and haven't had time to make any changes to this code but this has bit us again with a totally different service. I've written a unit test (mstest but converting to nunit would be easy enough) that reproduces the problem:
[Implements(typeof(Barfer), LifecycleType.OncePerRequest)]
private class Barfer
{
public Barfer(bool barf)
{
if (barf) throw new InvalidOperationException("BARF!");
}
}
[TestMethod]
public void BarfTest()
{
// These should all fail with InvalidOperationException, except LinFu
// turns this into TargetInvocationException.
for (int i = 0; i < 10; i++)
{
try
{
Services.GetService<Barfer>(null, true);
}
catch (TargetInvocationException ex)
{
Assert.IsInstanceOfType(ex.InnerException, typeof(InvalidOperationException));
Assert.AreEqual("BARF!", ex.InnerException.Message);
}
}
// These should work but don't because AutoCreateInternal is broken, a
// RecursiveDependencyException is thrown.
for (int i = 0; i < 10; i++)
Services.GetService<Barfer>(null, false);
}
The test creates 10 service instances that throw exceptions then expects to be able to create 10 more after that. But because the counter leaked 10 times the thread can now never make service instances of that type, ever.
I don't have a test for it, but the stack that is used for the RecursiveDependencyException also has a bug in that it's not thread local, so unless there is a singleton lock for it somewhere the stack you get as part of the RecursiveDependencyException will possibly have types from other service requests mixed into it.
A bigger worry for me is that the latest occurance of RecursiveDependencyException was thrown by a class that can't throw exceptions in its constructor, so I have zero explanation for what is going on. But at least I can confirm this part of the problem.
I went ahead and wrapped the recursive dependency counters within a try/finally block, and the counters are now protected with a lock statement. Try it again and let me know if it fixes the issue.
We recently saw LinFu throw a RecursiveDependencyException for one of our services in one of our test labs. What makes this unusual is that the service in question is only using default constructors so there shouldn't be any dependency issues. A little background:
Our service has a factory that creates one of two named service instances, depending on a registry value, like this:
It's being used to allow an outside tier to use a WCF proxy and for an internal tier to talk directly to the db, depending on the registry value. Two classes implement the interface using named instances as above.
The code ran as part of a windows service for about a month w/o restarting (in our production environment this isn't uncommon). Once we started getting the RecursiveDependencyException for this service every request returned this exception.
I took a peek in the code and I'm not sure if it's the only problem, but there are some issues in ContainerExtensions.AutoCreateInternal. Part of the method is below:
The first obvious problem is that the recursive detection counter gets incremented and if it exceeds the limit of 10 an exception gets thrown but it doesn't get decremented. This means that once you get into the state of throwing this exception the counter will never go down and you'll be in it forever. The second problem is that the code is unguarded so if any exceptions occur in code between the increment/decrement the decrement never happens and you 'leak' a counter. I haven't followed through all the code so I'm not sure this is possible but that could explain why we saw this problem after a month or so.
I think that the pop of _requests and the decrement of the counter need to be in a finally block to ensure they always occur (made even more complicated because pop/decrement need to be guarded from each other as well). I think the method needs to look like this: