Closed alexbereznikov closed 3 months ago
@pictos As I described in related issue (#2446), breaking change was introduced after 5.0.17 - Task.Run (which uses default task scheduler) has been changed to Task.Factory.StartNew (that uses a current task scheduler by default). More precisely, breaking changes were made in this commit - https://github.com/mbdavid/LiteDB/commit/f21cd8405eda7c7b59b018c9a541b0c3605eb5da
So this PR aims to revert this breaking change, not to introduce one.
Regardless of whether our custom context is implemented correctly, I don't think it is a good idea to execute writer queue on a captured context (e g UI context). I would say, it rather highlights the issue.
So, looking more closely to the implementation I strongly believe that the issue is more than having it capturing the current task's TaskScheduler
.
The task
is being created with LongRunning
flag, and uses an async delegate
, and you may know that those two doesn't fit togheter.
So the first thing we should change here is to make the ExecuteQueue
a void
method, and block the created thread on the async operations or rethink all this engine, which I believe will not be done on this PR if we choose this path.
edit: Did a PoC using the main
implementation and it will not flow the UI context, adding the TaskScheduler or not will not change the behavior, will just go throw UI thread if you pass TaskScheduler.FromCurrentSynchronizationContext()
, so I would say that your PR will not fix the issue at the end
@pictos I saw this discussion (until I was removed from #moderators channel :)). Yes this could solve the issue as well, but what are real benefits of having a separate thread for this? I mean, this is not a like windows message loop thread, where we have to keep the same thread id. From my point of view, having a separate thread here have some drawbacks:
I agree that LongRunning probably is redundant here, we can just as well revert to Task.Run.
Regarding PoC - I couldn't reproduce the issue in a small PoC either, but I'm 100% sure that our integration tests started to fail because of this regression, and this fix fixed this for us (we have a fork with some fixes applied).
@alexbereznikov, let me answer your comments in pieces.
Yes this could solve the issue as well, but what are real benefits of having a separate thread for this? I mean, this is not a like windows message loop thread, where we have to keep the same thread id
First of all, we don't need to manage the dedicated thread, that BCL will do it for us, since we're using the Task.Factory.StartNew
. So, we don't need to manage the thread, we are handling with the Task
, which is a high-level abstraction.
Loosing ability to use async primitives - roughly half of the time the thread will do nothing while waiting for event
We don't need the async primitives in this specific case. We want to have a dedicated thread to handle the DiskWriter
work.
I agree that LongRunning probably is redundant here, we can just as well revert to Task.Run.
I didn't say the LongRunning
is redundant, I said that passing the TaskScheduler
as parameter is redundant. And we can't use Task.Run
here, because that way we will block a ThreadPool
thread to do a hard work, and that's wrong by design. The threads on the ThreadPool
aren't designed to do a heavy work like that. That said, the usage of the Task.Factory.StartNew
is a good approach here.
Regarding PoC - I couldn't reproduce the issue in a small PoC either, but I'm 100% sure that our integration tests started to fail because of this regression, and this fix fixed this for us (we have a fork with some fixes applied).
Friendly saying. And still there you're confident that the custom SyncronizationContext implemented on your tests doesn't have an issue?
@pictos Ok, we do use sync I/O operations here and thread pool is really not the best choice in this case.
But, what if we change WritePageToStream
to use WriteAsync
instead of Write
, and FlushToDisk
extension to use FlushAsync
instead of Flush
? In this case, we won't block thread pool thread to do a hard work - its responsibility will be pass pages from queue to _stream.WriteAsync
. Also, thread won't wait on WaitAsync
.
Regarding custom SyncronizationContext
- it works perfectly fine, it is not a "test" context, we do use it in the main app. The only issue it has is that it is disposable, and can perform only one operation at a time. So the issue we had initially was when continuation from ExecuteQueue
got posted to already disposed synchronization context.
@alexbereznikov, why are so against to use a dedicated thread for it? It's the right tool this job.
For this PR at this time we have plan to not change how it works in core, so I'll follow the Mauricio's recommendations. And feel free to open an issue proposing this change, if it's approved we can merge it in the next major release.
@pictos I just feel that it is a step in a backwards direction. I don't think that changing Write to WriteAsync can break anything.
Regarding custom sync context and PoC - please take a look at this - https://github.com/alexbereznikov/LiteDB/tree/2446-poc
Here I added a new console project called Poc, which uses a custom synchronization context and a task scheduler created from it. It outputs all the stack traces for each call to Send/Post, and it is clear that ExecuteQueue calls got redirected to this context.
Context itself does not flow, for sure
@alexbereznikov I've a feeling that you aren't listening to me and just want to make your point pass. What you did on your PoC just confirms what I said here, what in resume is "the only way that your custom SyncContext will flow to the custom Task is if you call TaskScheduler.FromSyncronizationContext()
. Passing TaskScheduler.Default or omitting the call will not touch on the SyncContext, no matter what's".
As peer discussion with @mbdavid, for now we will not change the approach. But feel free to open an issue where you can put more info and we can discuss another approach for this, but remember it must be ThreadSafe.
@pictos But in this case Task.Factory.StartNew uses a current task scheduler, which in turn uses that context. I don't think it is a correct behavior, because in this PoC I didn't change anything in LiteDB sources, i e haven't passed this TaskScheduler.FromSyncronizationContext() when starting ExecuteQueue. And passing TaskScheduler.Default fixes this issue.
I'm honestly surprised a lot that such a simple one-line regression fix spawned that much discussion. If you insist I'll do in the way you proposed because I have no other options :) But I'm still 100% sure that this is a step in a backwards direction - instead of fixing the issue and moving on we're rolling back to ways it was done a while ago.
@alexbereznikov, let's try one more time... Today that API is sensitive to the SyncContext, so can be devs that rely on that. Changing this behavior is a breaking change that we can't do in a SR.
So, for this PR, that will be released in a SR we need to fix the fact that right now the task is leaking to the ThreadPool. For changing what you want, we will need to do a new PR for that and also verify on the code place other places that are sensitive to SyncContexts and change that, hope that now you understand.
Thanks for your help and bringing this to light. I did the necessary changes for created #2505 to follow up the next changes. If you want to take a lead on that and change all places on our code base to be SyncContext agnostisc let us now and we will assign the issue to you.
This addresses #2446
Task.Factory.StartNew tries to use task scheduler from the current task, so TaskScheduler.Default needs to be passed here.