synopse / mORMot2

OpenSource RESTful ORM/SOA/MVC Framework for Delphi and FreePascal
https://synopse.info
Other
531 stars 132 forks source link

Race condition in THttpAsyncServer raw.pas #321

Open myonlylonely opened 4 days ago

myonlylonely commented 4 days ago

I have some problems with the techempower-bench example. I think there is a race condition between the DoRequest thread and the thread which execute the callback function. Please check if I am right or wrong.

function TRawAsyncServer.asyncdb(ctxt: THttpServerRequest): cardinal;
begin
  with fDbPool.Async.PrepareLocked(WORLD_READ_SQL, {res=}true, ASYNC_OPT) do
  try
    Bind(1, ComputeRandomWorld(Lecuyer));
    ExecuteAsync(ctxt, OnAsyncDb);
  finally
    UnLock;
  end;
  //a task is pushed to queue, which could be executed in another thread,
  //but here the task could be finshed, before 777 status code is set,
  //even if 777 status code is set before the task creation code,
  //there still could be a race condition, because after DoRequest function,
  //there still some code change the values in the ctxt object, 
  //which could cause a race condition with the callback function OnAsyncDb
  result := ctxt.SetAsyncResponse;
end;
synopse commented 4 days ago

I guess there is a WaitLock() within THttpAsyncServerConnection.AsyncResponse which would ensure that the main DoRequest thread is finished.

But perhaps the flags may indeed not be properly set. Try with my next commit.

myonlylonely commented 4 days ago

Thanks for quick reply. My question is how to write proper callback function? For example:

procedure TRawAsyncServer.OnAsyncDb(Statement: TSqlDBPostgresAsyncStatement;
  Context: TObject);
var
  ctxt: THttpServerRequest absolute Context;
begin
  if (Statement = nil) or
     not Statement.Step then
    ctxt.ErrorMessage := 'asyncdb failed'
  else
    ctxt.SetOutJson('{"id":%,"randomNumber":%}',
      [Statement.ColumnInt(0), Statement.ColumnInt(1)]);
  ctxt.OnAsyncResponse(ctxt);
end;
  1. OnAsyncResponse do have a lock to ensure it execute after DoRequest, but before I execute OnAsyncResponse, I would like to change the output of the request, like the example above. Is the ErrorMessage, SetOutJson could also cause race conditions?
  2. The lock has a timeout, could that cause the callback function never be triggered if the task finshed too quick but DoRequest lock release too late?
synopse commented 4 days ago
  1. Concurrent access to the THttpServerRequest could indeed be a problem. But in practice, result fields (ErrorMessage, or SetOutJson) are perfectly fine. Only changing the flags or the status code may be an issue. This is the reason why there is a RespStatus optional parameter to the OnAsyncResponse event handler. And I suspect we don't need to change the flags.

  2. The lock has a timeout only waiting for the current main connection thread to set its state. So the delay should be almost minimal, and certainly always < 1ms. So this part should be safe.

Anyway, the race condition is very unlikely to happen, because the main connection thread is just executing a few lines of non-blocking code with no syscall, whereas the async call is much complex, involving the network layer and a lot of syscalls. It may happen only on a really overloaded system, with numerous threads and all cores at 100%. In which case, the problems will appear elsewhere than in our web server part. Unless you did actually see such a race condition appear on production or during testing?

myonlylonely commented 4 days ago

I'm afraid it did happen during testing. These threads are executed on different cores, and it is not the only program running on that core. So it did happen, but like you said, very low chance. Is that possible we could add another callback that executed after calling OnAsyncResponse, make sure the user defined callback function it self is also protected by the locks?

synopse commented 4 days ago

Please try with my latest commits:

myonlylonely commented 4 days ago

If the task finshed too early, it could still cause problem. The client never get the response.

function TRawAsyncServer.async(ctxt: THttpServerRequest): cardinal;
begin
  ExecuteAsync(ctxt, OnAsync);
  result := ctxt.SetAsyncResponse;
  //If the Task finished here, after SetAsyncResponse but before the ResponseFlag is set, the response will never sent to the client, and the connection is kept waiting until client timeout because of the flag check.
end;
synopse commented 2 days ago

I don't think so. The TryLock should wait for ResponseFlag to be set.

myonlylonely commented 2 days ago

Oh, you are right. I'm looking at the old version, which is LTS 2.3. The flag check is moved to the lock protection in the mater branch.

myonlylonely commented 2 days ago

I have another question. If client has a timeout setting, which disconnect before the data is sent, the connection object would be freed. The callback function then called with the ctxt pointer which owns some garbage. How should I write safe code in something like this?

synopse commented 1 day ago

This should be handled by

  if IsDangling or
     (Sender <> fRequest) or
     (fClosed in fFlags) then
    exit;

Perhaps https://github.com/synopse/mORMot2/commit/1a03df57 would help a little more, if the request pointer is reused between connections. At least ConnectionID should be genuine per connection. Edit: no, this won't harden anything because both pointers are equal.

myonlylonely commented 1 day ago

How could IsDangling function really check for dangling pointers? It is a raw pointer just like in C++, the memory could be resued for other objects, even read only access TPollAsyncConnection.fHandle variable itself could possibly cause an access violation in that case. Or even worse, the memory was not reused yet and kept that value, the object is freed, and still could access because the connection id is the same.

synopse commented 1 day ago

The connection ID won't be the same with a new connection. There is a GC of closed connections, so the pointers are not freed instantly, but reused for another connection. In which case, I have added a fAsyncConnectionID32 field to ensure this is the actual request, not a recycled connection. And even if it has been released and reused for anything else after 2 seconds of GC, the recycled pointer will also be detected.

myonlylonely commented 1 day ago

If the async task process takes 10 seconds(just for example), and the request object is freed. The pointer to the request object which I saved earlier could point to a memory that could already been allocated for other code, not even by mORMot. In this case, the code access could cause access violation, could pass the check or not, anything could happen. Actually, there is no way to actually detect a dangling pointer with raw pointer. And even there is a way to detect danging pointer, I have to write logic(such as setOutJson) before OnAsyncResponse, that skipped your check(which is wrong any way). Actually, if you use any 3rd memory manager under debug mode (for example, FastMM under FullDebugMode), it will throw access violation for sure.

myonlylonely commented 1 day ago

The following example, Sleep simulate a task that took a long time. and before the task is finshed, the client could disconnect the connection, causing connection recycled, after 2 seconds, the object is freed. Then the callback has a dangling pointer ctxt. If you use FastMM5 and enable FullDebugMode, OnAsyncResponse will throw an access violation for sure. Because FastMM force the recycled memory to be written some garbage flag, cauing the acess violation to be triggered. In real cases, these memory could be allocated by other code or not immediately reused, so the crash is more likely to be appeared random.

function TServer.asyncdb(ctxt: THttpServerRequest): cardinal;
var
  aTask: ITask;
begin
  aTask := TTask.Create(
    procedure
    begin
      Sleep(10000);
      G_Server.OnAsyncDb(ctxt);
    end);
  aTask.Start;
  Result := ctxt.SetAsyncResponse;
end;

procedure TServer.OnAsyncDb(Context: TObject);
var
  ctxt: THttpServerRequest absolute Context;
begin
  ctxt.OnAsyncResponse(ctxt);
end;
synopse commented 1 day ago

The new code would detect such issue, and avoid any GPF. But you are right: the main point is that you could access ctxt out of scope for sure in end-user code, before mORMot code itself. So the GPF would occur before.

myonlylonely commented 1 day ago

Actually,if you enabled FastMM5 with FullDebugMode,you will see the access violation also happened inside AsyncResponse with no user code before that. Again, there is no way to detect dangling pointer with raw pointer. And please do some research about dangling pointers, there is no way to do proper detect that using raw pointers since the memory released could be reused for literally anything.

myonlylonely commented 6 hours ago

Perhaps a real demo would help. I have created a THttpAsyncServer in the demo, listen to 8080 port. Click the "Test Req" button on the form will make a request to the server, and has a timeout 1 second. The request will Sleep 8 seconds, then call the OnAsyncResponse function. It has no user code in the callback, just calling OnAsyncResponse itself. And I enabled FastMM5 under FullDebugMode. There will be a access violation inside OnAsyncResponse every time. I have uploaded the full test project, including all it's dependencies. Just Open the project in Delphi, and compile it. mORMot2 and FastMM5 are pulled from the latest commit in master branch. HTTPServerTest.zip