Open ysbaddaden opened 1 year ago
Reading the description of GenServer, the timeout will indeed never be reached, because of the regular calls to the GC. We must remove the default timeout handling with GenServer, and implement our own solution.
Solution A:
The handle_info({:collect_garbage}, state)
callback can detect when there is nothing to do (no active/queued contacts) and record the time it happened, then on subsequent calls if there is still nothing to do, we can compare the recorded time and eventually terminate the process when the timeout has been reached.
But what if we got messages handled in-between calls to :collect_garbage
? Maybe that's not a problem. It's not particularly expensive to start/stop an Erlang process.
Solution B:
An alternative (maybe simpler) solution could be to save the time at which we last saved the state (in ChannelBrokerAgent.save_state/1
), and have State.process_timeout/1
return a new timeout based on this recorded time. Even if the :collect_garbage
message is sent every 2 minutes, we'd eventually reach a point where the timeout is below those 2 minutes and will happen before the :collect_garbage
message is sent.
This requires handle_info({:collect_garbage})
to not save the state when there are no active/queued contacts (to not influence the timeout).
@ysbaddaden Can you help evaluating the severity of this bug. What are the risks involved on the broker not timeouting?
Low.
Unless we have lots of channels on a single instance, which might happen if we start grouping instances. It consumes a bit of memory and a few calls every N minutes for each channel.
It shouldn't be hard to fix.
With the GC revamp in #2263 the channel-broker processes will receive a message every some minutes at worst, so
GenServer
will never send a:timeout
and the channel-broker process will never be discontinued.We should replace the timeout mecanism. There should be a regular call (e.g. every 30 minutes) to verify whether the channel broker has nothing left to do (empty queue and no active calls) and decide to terminate the process, after making sure to delete the state from the channel-broker-agent.