Closed migueldeicaza closed 1 year ago
I undid the band-aid and built against mono-2019-06, using the sample provided by https://github.com/xamarin/xamarin-macios/issues/5333. After ~15-20 different tries, I couldn't reproduce the problem. Each time after 10+ seconds, the threadpool threads would kick back up and resume normal execution (Http requests too).
Has anyone else been able to reproduce? If so, what device(s), versions, etc?
@steveisok did you put your app into background mode on the device?
The workaround effectively kills all background support from Xamarin.iOS.
@migueldeicaza could you be more specific on this^ ?
@marek-safar Yeah, I'd swipe up and wait at least 10 seconds (often longer). In addition, I opened other apps, did something, and then went back.
Yes, this means that the NSUrlSession which is an API that can perform downloads in the background, even if the application is stopped does not work. So the hack cancels the downloads.
@steveisok perhaps the heuristics for the OS have changed, and the app is not backgrounded, might be worth checking.
@mandel-macaque can provide more details on the side effects.
There's a similar issue on Android on at least one device (after a while, a background service can't schedule new Tasks to run). I've been looking at this piece of code:
https://github.com/mono/mono/blob/d371432a1030db86d1ed6afb9bc174fb1587191f/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs#L649-L654
My theory is that when the app is in the background we have requests backing up somehow and we end up with insufficient number of calls to ThreadPool.RequestWorkerThread
- that icall ends up here:
https://github.com/mono/mono/blob/da02f1330f1dd872f1e723a1f6e35997979df27c/mono/metadata/threadpool-worker-default.c#L340-L349
and the call to work_item_push
is critical because that's the code that signals a worker to unpark and to try and process a job from its queue.
If we somehow end up with numOutstandingThreadRequests >= ThreadPoolGlobals.processorCount
new jobs wouldn't run.
This EnsureThreadRequested
/MarkThreadRequestSatisfied
code is pretty critical - CoreCLR had to revert some changes (so they're back to matching referencesource) when they tried to optimize it: https://github.com/dotnet/coreclr/commit/b8dda0cbf7eae770fc685378ad7c542e2468a209
The android device where I saw this has processorCount = 2
. I'm trying to repro on desktop by hardcoding 2
for the count in a hacked up corlib.
To followup on my theory that ThreadPool.RequestWorkerThread
isn't getting called enough.
I changed this loop to while (true)
instead https://github.com/mono/mono/blob/b45d0a108474bafc56d853b74d89ac82800a38f8/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs#L648-L658
and also added some logging to note whether we ever call it when count >= ThreadPoolGlobals.processorCount
and:
count < ThreadPoolGlobals.processorCount
count >= ThreadPoolGlobals.processorCount
b. the task starts running.So on the particular Android device that I'm watching, it certainly seems like we get into a situation where the native threadpool machinery was flooded with requests (count >= processorCount) but it s workers didn't unpark or start up to process them, and so numOutstandingThreadRequests
never got decremented and as a result queueing new Tasks doesn't actually cause the threadpool to wake up and pick up work anymore.
I'm planning to look into a couple of things:
count < processorCount
protecting against?ThreadPool.RequestWorkerThread
? In Mono it seems pretty critical to call this periodically to nudge the unmanaged code to actually try to pick up work items. Maybe it doesn't do that in CoreCLR and the threadpool doesn't idle the same way. (Or maybe they just didn't consider the OS suspending the application since desktop OSes probably don't do that the same way).But maybe we should check in something like:
#if MONO && MOBILE
while (true) {
#else
while (count < ThreadPoolGlobals.processorCount) {
#endif
And just see what happens?
Just a brief update. My previous comment is not the direction we ended up taking for the possibly-related Android issue.
The fix we ultimately ended up going with is mono/mono#17927. Once backports of that PR are available for xamarin.ios, we could investigate removing the workaround in XI.
Although, given that we haven't been able to reproduce the issue recently, we will first need to identify a way to induce the bad threadpool behavior on an ios device.
@lambdageek - Git suggests that mono 2020-02 has this (via b6a351c3308). Could we look at removing the work around now?
@chamons the PR also landed in 2019-10 and 2019-12 so you should be able to already try it on your stable bits.
The Xamarin iOS and Mac team would like some customer testing with the this fix.
Here are the instructions:
var handler = new NSUrlSessionHandler () { BypassBackgroundSessionCheck = true };
var client = new HttpClient (handler);
We'd like to change the default in future versions (https://github.com/xamarin/xamarin-macios/pull/8296) and hearing from some customers would be very helpful.
Looks like we can remove the workaround, since we haven't gotten any feedback after turning it off by default.
Scheduling it for .NET 8.
Xamarin on iOS currently has a workaround for the Mono Thread Pool not working:
https://github.com/xamarin/xamarin-macios/pull/5463
That workaround is necessary because the Mono's ThreadPool stops working after the app is suspended for 10 seconds:
https://xamarin.github.io/bugzilla-archives/58/58633/bug.html#c7
The workaround effectively kills all background support from Xamarin.iOS.
There is a repro that shows this problem in action (but might require disabling the incorrect bandaid that was put in place):
https://github.com/xamarin/xamarin-macios/issues/5333