yrashk / socket.io-erlang

Socket.IO server for Erlang
http://socket.io/

Requests getting lost when handle_request receives non-selectively #42

Open ferd opened 13 years ago

ferd commented 13 years ago

TL;DR: When 'handle_request/5' is used to deal with a request, control of the socketio_http process is handed over to the user's callback. Requests can then get lost if that callback does a non-selective receive while client requests arrive in quick succession.


I used the handle_request/5 callback and had it communicating with non-socket.io processes. As part of its task, the callback had to do a non-selective receive, digesting everything it received within a time window so it could handle the messages later (much like a gen_server would).

However, socketio_http runs the 'handle_request' callbacks within its own process. So whenever socketio_http_misultin (or some other frontend) calls the socketio_http server for the current connection more than once (this can be simulated by reloading many tabs of the same browser really fast), the 'handle_request' currently running is the one that receives the {request, 'GET', ...} message, bypassing the socketio_http process altogether. The process model used here should be modified to avoid such issues.

In the meantime (and for external readers), I worked around the issue by spawning a new process from handle_request, monitoring it, and waiting selectively for that monitor to fire. During that time the spawned process can send data to the socket as much as it wants, but keep-alive/pipelined calls on the same connection are forced to be serialized. This exact model should probably be moved from my own application into socket.io-erlang itself (to be discussed).

ferd commented 13 years ago

One question I meant to ask: if we go with my solution for the whole thing, do we accept duplicating the 'Req' value with all of its contents? I know misultin already does sandboxing that way -- we'd end up with 3 copies of the socket, but I see no other good fix. Is it worth punishing everyone for the corner cases where people do it my way, or should we punish the few people who hit this issue by making them read the documentation (which we will need to write at some point)?

sinnus commented 12 years ago

@ferd, could you copy-paste your solution? Thank you!

ferd commented 12 years ago

Here you go, @sinnus. The code is a bit customized for some closed-source app I had, and you can disregard some options such as SessionMods and SessionMethod. They were application-specific things.

%% How to start the connection and whatnot
start_link(Options) ->
    SessionMods = proplists:get_value(sessions, Options, []),
    SessionMethod = proplists:get_value(session_method, Options, [{http_cookie, "sessionid"}]),
    Port = proplists:get_value(port, Options),
    {ok, Pid} = socketio_listener:start([{http_port, Port},
            {default_http_handler, {?MODULE, [SessionMods, SessionMethod]}}]),
    EventMgr = socketio_listener:event_manager(Pid),
    ok = gen_event:add_handler(EventMgr, ?MODULE, [SessionMods, SessionMethod]).

%% Default HTTP Handler
handle_request(_Method, _Path, Req, SessionMods, SessionMethod) ->
    %% Call some function to handle this HTTP call!
    %% This depends on messy knowledge about misultin, sadly.
    %% This spawning business is necessary due to the following
    %% socket.io issue: https://github.com/yrashk/socket.io-erlang/issues/42
    Pid = spawn(fun() -> (my_module:loop(SessionMods, SessionMethod))(Req) end),
    Ref = erlang:monitor(process,Pid),
    receive
        {'DOWN', Ref, process, _, _} -> ok
    end.

You can see the trick starts at Pid = spawn(...) and finishes with the receive block. Let me know if you need more explanations.

sinnus commented 12 years ago

@ferd, I can't understand how messages can get lost. Could you explain?

ferd commented 12 years ago

New socket calls are received as messages by the process controlling the TCP socket. Misultin dispatches a process for that, and socket.io's code sits inside it to redirect the messages to the right spot.

So if the handle_request/N function is long-running (it could take a few seconds), sits in the same process as socket.io (because it's a callback) and thus in the misultin process, and uses non-selective receives, then when more data comes from the socket (say, pipelined calls), the long-running function will be consuming these messages instead of leaving them for the socket.io code that usually deals with them.

These messages were likely meant for the socketio process in charge of handling queries, but might have been consumed or discarded by your handler. That means that on the other end of the request (the browser), the query will never be replied to and it will time out.

This can also be true of messages sent to the socketio process in charge of handling queries that are unrelated to sockets (internal communications). Because of this, I had to split my work into two processes. On one hand, the handle_request function is guaranteed not to consume messages it wasn't meant to; on the other, by spawning a new process to handle the actual request, I am free to send and receive whatever I want in my_module:loop/N.
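The mechanics above can be demonstrated in isolation. The following is a minimal, self-contained sketch (a hypothetical module, not part of socket.io-erlang): a "dispatcher" process runs a callback inline, the callback does a catch-all receive, and a pipelined request message intended for the dispatcher gets consumed by the callback instead.

```erlang
-module(mailbox_steal_demo).
-export([run/0]).

%% Demonstrates the message-stealing problem: because the callback runs
%% in the same process as the dispatcher, its non-selective receive
%% consumes a {request, ...} message meant for the dispatcher loop.
run() ->
    Self = self(),
    Dispatcher = spawn(fun() -> dispatcher(Self) end),
    %% Simulate a second, pipelined request arriving while the
    %% inline callback below is still running.
    Dispatcher ! {request, 'GET', "/second"},
    receive Result -> Result end.

dispatcher(Parent) ->
    %% The "handle_request" callback runs inline, in this process.
    Stolen = callback_with_catch_all(),
    %% By the time the dispatcher loops, {request, ...} is gone.
    receive
        {request, 'GET', _Path} -> Parent ! dispatcher_got_request
    after 100 ->
        Parent ! {request_lost, stolen_by_callback, Stolen}
    end.

callback_with_catch_all() ->
    %% Non-selective receive: matches anything in the mailbox,
    %% including messages intended for the dispatcher.
    receive Any -> Any after 100 -> nothing end.
```

Running run() here yields {request_lost, stolen_by_callback, {request, 'GET', "/second"}}: the request message was swallowed by the callback and the dispatcher never sees it, which is exactly what causes the browser-side timeout described above.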

sinnus commented 12 years ago

@ferd, is it Misultin issue to bypass messages if a function takes a lot of time?

ferd commented 12 years ago

I don't think we could pinpoint the problem to a single place. The problem comes from the interaction of many logical actors (Misultin's side, socket.io-erlang's side, the callback's side) running with the same mailbox. You can fix the issue on any side. I don't know of a way to do it that will not require adding processes, though.

sinnus commented 12 years ago

@ferd, thank you for your answers! As I understand it, the problem is that the mailbox is "shared" between the different sides.

ferd commented 12 years ago

Exactly. Ownership of messages for each query or worker then becomes extremely vague. Misultin doesn't read messages itself, but you can have conflicts between your callbacks and socket.io-erlang. Using processes the way I did solves the problem.

sinnus commented 12 years ago

Hello, @ferd! I tested "long running" callback for longpolling and websocket using timer:sleep:

-module(socketio_transport_websocket).

    ...
%% Websockets
handle_call({websocket, Data, _Ws}, _From, #state{ heartbeat_interval = Interval, event_manager = EventManager } = State) ->
    error_logger:info_msg("Before timer~n", []),
    timer:sleep(1000),
    error_logger:info_msg("After timer~n", []),
    ...

-module(socketio_transport_polling).

    %% Incoming data
handle_call({_TransportType, data, Req}, From, #state{ server_module = ServerModule,
                                                       event_manager = EventManager,
                                                       polling_duration = Interval,
                                                       sup = Sup } = State) ->
    error_logger:info_msg("Before timer~n", []),
    timer:sleep(1000),
    error_logger:info_msg("After timer~n", []),
    ...

I used the flash socket.io client to send data to the server, and all messages were processed by the server side. I only hit the longpolling timeout when the timer:sleep duration was greater than the longpolling timeout. Can you help me write a test to reproduce the issue?

ferd commented 12 years ago

I'm not sure I understand. You're trying to reproduce the issue I had?

sinnus commented 12 years ago

Yes, I'm trying to reproduce the issue without "reloading many tabs of the same browser really fast".

sinnus commented 12 years ago

@ferd, I have reproduced your issue. My earlier test code was in the wrong place because of a misunderstanding, sorry.

My steps to reproduce:

  1. Open index.html
  2. Send data from the flash client
  3. No echo on the client side
  4. After 10 seconds (the timeout), the echo is received

handle_request('GET', [], Req) ->
    error_logger:info_msg("Before timer", []),
    timer:sleep(10000),
    error_logger:info_msg("after timer", []),
    Req:file(filename:join([filename:dirname(code:which(?MODULE)), "index.html"]));
sinnus commented 12 years ago

As we can see, handle_request('GET', [], Req) blocks the socket.io server. All clients will wait on the handle_request function.
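Applying the spawn+monitor workaround from earlier in this thread to this handler would look roughly like the sketch below (a hypothetical rework of the clause above, untested). Note that it does not remove the serialization -- the hosting process still waits for the child to finish -- but the wait becomes a receive that is selective on the monitor reference, so pipelined {request, ...} messages stay in the mailbox for socket.io instead of being consumed or lost.

```erlang
handle_request('GET', [], Req) ->
    %% Move the slow work into a throwaway process so this process
    %% never does a non-selective receive while requests are queued.
    Pid = spawn(fun() ->
        error_logger:info_msg("Before timer~n", []),
        timer:sleep(10000),
        error_logger:info_msg("After timer~n", []),
        Req:file(filename:join([filename:dirname(code:which(?MODULE)),
                                "index.html"]))
    end),
    Ref = erlang:monitor(process, Pid),
    %% Selective receive: only the 'DOWN' message for this exact
    %% monitor reference is consumed; everything else is left alone.
    receive
        {'DOWN', Ref, process, _Pid, _Reason} -> ok
    end;
```

Whether this is acceptable depends on the concern @ferd raised above about copying the 'Req' value into another process.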