There's an MT bug in the IOCP and libevent event loops in stdlib when they resolve `select` timeouts.
IOCP checks `event.timeout?` and `fiber.timeout_select_action` together and enqueues the fiber otherwise, while libevent only checks `fiber.timeout_select_action` and enqueues otherwise.
Neither enqueues the fiber when `timeout_select_action` is set but the atomic CAS fails.
Failing scenario:
1. thread A starts cancelling the timeout
2. thread B dequeues the timeout event
3. thread A sets `timeout_select_action` to nil
4. thread B checks `timeout_select_action` (nil)
5. thread A tries to dequeue the timeout event (no-op: already dequeued)
6. thread A enqueues the fiber
7. thread B enqueues the fiber :boom:
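
To make the invariant concrete, here is a minimal sketch of the kind of guard that would rule out the double enqueue: both the cancel path and the timeout path must win a CAS on a shared state before they may enqueue. `TimeoutRaceModel` and its methods are hypothetical names for illustration, not the stdlib types, and this is not necessarily how the event loops should be fixed.

```crystal
# Hypothetical model, not the stdlib implementation: one atomic word records
# whether the pending select timeout has already been resolved.
class TimeoutRaceModel
  PENDING  = 0
  RESOLVED = 1

  def initialize
    @state = Atomic(Int32).new(PENDING)
    @enqueues = Atomic(Int32).new(0) # stand-in for Fiber#enqueue calls
  end

  # Called by both the cancel path (thread A) and the timeout path (thread B).
  # Only the caller that wins the CAS enqueues; the loser does nothing.
  def resolve : Bool
    _, won = @state.compare_and_set(PENDING, RESOLVED)
    @enqueues.add(1) if won
    won
  end

  def enqueues : Int32
    @enqueues.get
  end
end

model = TimeoutRaceModel.new
cancel_won = model.resolve  # thread A's path in the scenario
timeout_won = model.resolve # thread B's path in the scenario
puts "cancel won: #{cancel_won}, timeout won: #{timeout_won}, enqueues: #{model.enqueues}"
```

The two calls run sequentially here, but the CAS gives the same guarantee when they run in parallel: exactly one caller observes `PENDING`, so the fiber is enqueued once. The point is that a nil `timeout_select_action` must not be read as "safe to enqueue" by whichever thread lost the race.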
I believe this only works in current Crystal releases because a fiber is tied to a thread and each thread has its own event loop. The same holds when the channel sender/receiver fiber is resumed: the two paths are executed concurrently, not in parallel, so they can't race.
But break either of these two assumptions (as we do in ExecutionContext) and :boom: