romange / helio

A modern framework for backend development based on io_uring Linux interface
Apache License 2.0
435 stars 49 forks source link

chore: Fix the crash in tls shutdown #246

Closed romange closed 5 months ago

romange commented 5 months ago

The root cause was due to a socket being closed and destroyed inside ListenerInterface::RunSingleConnection and in parallel the listener was performing the shutdown sequence inside ListenerInterface::RunAcceptLoop().

In fact we had a comment saying it's working because socket.Shutdown did not preempt. In one of my recent PRs I implemented the logic where TlsSocket informs the peer about the shutdown which made the call preemptible. As a result the whole management code around this became unsafe.

  1. I introduced an intrusive refcnt so that a connection would not be destroyed inside RunSingleConnection when RunAcceptLoop still uses it.
  2. Changed the iteration loop in the latter tu support preemption.
  3. Fixed the Shutdown code to support the case where the tcp socket is closed in the middle of shutdown.
  4. Found unrelated migration bug that where we failed to Unlink a connection because the listener was destroyed. Now we try to unlink only if the connection indeed is linked inside a connection list.
romange commented 5 months ago

This fix will warrant another patch

romange commented 5 months ago

I can explain the flow via VC if you want

On Tue, Apr 9, 2024, 10:18 PM Shahar Mike @.***> wrote:

@.**** approved this pull request.

🤓

— Reply to this email directly, view it on GitHub https://github.com/romange/helio/pull/246#pullrequestreview-1989985818, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4BFCHKC3PS62RNDOIDNWLY4Q5IPAVCNFSM6AAAAABF5LIIVOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSOBZHE4DKOBRHA . You are receiving this because you authored the thread.Message ID: @.***>

chakaz commented 5 months ago

No need, it makes sense, I approved this PR :)