xamarin / xamarin-macios

.NET for iOS, Mac Catalyst, macOS, and tvOS provide open-source bindings of the Apple SDKs for use with .NET managed languages such as C#
Other
2.47k stars 514 forks source link

Remove workaround for ThreadPool not working #7080

Closed migueldeicaza closed 1 year ago

migueldeicaza commented 5 years ago

Xamarin on iOS currently has a workaround for the Mono Thread Pool not working:

https://github.com/xamarin/xamarin-macios/pull/5463

That workaround is necessary because the Mono's ThreadPool stops working after the app is suspended for 10 seconds:

https://xamarin.github.io/bugzilla-archives/58/58633/bug.html#c7

The workaround effectively kills all background support from Xamarin.iOS.

There is a repro that shows this problem in action (but might require disabling the incorrect bandaid that was put in place):

https://github.com/xamarin/xamarin-macios/issues/5333

steveisok commented 5 years ago

I undid the band-aid and built against mono-2019-06, using the sample provided by https://github.com/xamarin/xamarin-macios/issues/5333. After ~15-20 different tries, I couldn't reproduce the problem. Each time after 10+ seconds, the threadpool threads would kick back up and resume normal execution (Http requests too).

Has anyone else been able to reproduce? If so, what device(s), versions, etc?

marek-safar commented 5 years ago

@steveisok did you put your app into background mode on the device?

The workaround effectively kills all background support from Xamarin.iOS.

@migueldeicaza could you be more specific on this^ ?

steveisok commented 5 years ago

@marek-safar Yeah, I'd swipe up and wait at least 10 seconds (often longer). In addition, I opened other apps, did something, and then went back.

migueldeicaza commented 5 years ago

Yes, this means that the NSUrlSession which is an API that can perform downloads in the background, even if the application is stopped does not work. So the hack cancels the downloads.

@steveisok perhaps the heuristics for the OS have changed, and the app is not backgrounded, might be worth checking.

@mandel-macaque can provide more details on the side effects.

lambdageek commented 5 years ago

There's a similar issue on Android on at least one device (after a while, a background service can't schedule new Tasks to run). I've been looking at this piece of code: https://github.com/mono/mono/blob/d371432a1030db86d1ed6afb9bc174fb1587191f/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs#L649-L654 My theory is that when the app is in the background we have requests backing up somehow and we end up with insufficient number of calls to ThreadPool.RequestWorkerThread - that icall ends up here: https://github.com/mono/mono/blob/da02f1330f1dd872f1e723a1f6e35997979df27c/mono/metadata/threadpool-worker-default.c#L340-L349 and the call to work_item_push is critical because that's the code that signals a worker to unpark and to try and process a job from its queue. If we somehow end up with numOutstandingThreadRequests >= ThreadPoolGlobals.processorCount new jobs wouldn't run.

This EnsureThreadRequested/MarkThreadRequestSatisfied code is pretty critical - CoreCLR had to revert some changes (so they're back to matching referencesource) when they tried to optimize it: https://github.com/dotnet/coreclr/commit/b8dda0cbf7eae770fc685378ad7c542e2468a209

The android device where I saw this has processorCount = 2. I'm trying to repro on desktop by hardcoding 2 for the count in a hacked up corlib.

lambdageek commented 5 years ago

To followup on my theory that ThreadPool.RequestWorkerThread isn't getting called enough. I changed this loop to while (true) instead https://github.com/mono/mono/blob/b45d0a108474bafc56d853b74d89ac82800a38f8/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs#L648-L658

and also added some logging to note whether we ever call it when count >= ThreadPoolGlobals.processorCount and:

  1. before the app is backgrounded, count < ThreadPoolGlobals.processorCount
  2. after the app is backgrounded: a. I get my warning that count >= ThreadPoolGlobals.processorCount b. the task starts running.

So on the particular Android device that I'm watching, it certainly seems like we get into a situation where the native threadpool machinery was flooded with requests (count >= processorCount) but it s workers didn't unpark or start up to process them, and so numOutstandingThreadRequests never got decremented and as a result queueing new Tasks doesn't actually cause the threadpool to wake up and pick up work anymore.

I'm planning to look into a couple of things:

  1. What the heck is going in the unmanaged threadpool code
  2. What exactly is count < processorCount protecting against?
  3. Does CoreCLR do something different with ThreadPool.RequestWorkerThread? In Mono it seems pretty critical to call this periodically to nudge the unmanaged code to actually try to pick up work items. Maybe it doesn't do that in CoreCLR and the threadpool doesn't idle the same way. (Or maybe they just didn't consider the OS suspending the application since desktop OSes probably don't do that the same way).

But maybe we should check in something like:

#if MONO && MOBILE
    while (true) {
#else
   while (count < ThreadPoolGlobals.processorCount) {
#endif

And just see what happens?

lambdageek commented 4 years ago

Just a brief update. My previous comment is not the direction we ended up taking for the possibly-related Android issue.

The fix we ultimately ended up going with is mono/mono#17927. Once backports of that PR are available for xamarin.ios, we could investigate removing the workaround in XI.

Although, given that we haven't been able to reproduce the issue recently, we will first need to identify a way to induce the bad threadpool behavior on an ios device.

chamons commented 4 years ago

@lambdageek - Git suggests that mono 2020-02 has this (via b6a351c3308). Could we look at removing the work around now?

akoeplinger commented 4 years ago

@chamons the PR also landed in 2019-10 and 2019-12 so you should be able to already try it on your stable bits.

chamons commented 4 years ago

The Xamarin iOS and Mac team would like some customer testing with the this fix.

Here are the instructions:

  1. On latest stable (or preview) Xamarin.iOS / Mac modify your HttpClient creation to the following:
var handler = new NSUrlSessionHandler () { BypassBackgroundSessionCheck = true };
var client = new HttpClient (handler);
  1. Test your networking, in particular by back-grounding and for-grounding your application in various network conditions (wi-fi, cellular, etc).
  2. If you run into issues, change BypassBackgroundSessionCheck back to false (the default) and try to reproduce.
  3. Report any differences here in a comment. In theory, they should behave exactly the same now.

We'd like to change the default in future versions (https://github.com/xamarin/xamarin-macios/pull/8296) and hearing from some customers would be very helpful.

rolfbjarne commented 2 years ago

Looks like we can remove the workaround, since we haven't gotten any feedback after turning it off by default.

Scheduling it for .NET 8.