Recovery after network failure

mdaley commented 8 years ago

Hello,

From my reading of the eredis documentation, it seems that if the connection to Redis fails, eredis itself should not fail and, once the connection recovers, requests should work again. Is that correct, or have I misread what is possible (highly likely, as I am very new to Erlang)?

My example: if I compile eredis and point it at a Redis server, I can successfully run commands, like this:

rebar compile
erl -pa /ebin
{ok,C} = eredis:start_link("my-redis-location", 6379, 0, "", 5000).
{ok,<<"PONG">>} = eredis:q(C, ["PING"]).

And, when I physically disconnect my network cable, I get an error (which I would expect):

eredis:q(C, ["PING"]).
** exception exit: {timeout,{gen_server,call,
                                        [<0.35.0>,
                                         {request,[[<<"*">>,"1",<<"\r\n">>],
                                                   [[<<"$">>,"4",<<"\r\n">>,<<"PING">>,<<"\r\n">>]]]},
                                         5000]}}
     in function  gen_server:call/3 (gen_server.erl, line 212)

When I reconnect my network cable (and wait for the connection to start working again), I was hoping that requests would start working again but, instead, I only get errors:

eredis:q(C, ["PING"]).
** exception exit: {noproc,{gen_server,call,
                                       [<0.35.0>,
                                        {request,[[<<"*">>,"1",<<"\r\n">>],
                                                  [[<<"$">>,"4",<<"\r\n">>,<<"PING">>,<<"\r\n">>]]]},
                                        5000]}}
     in function  gen_server:call/3 (gen_server.erl, line 212)

The difference between the second error and the first is that the second mentions noproc instead of timeout. I assume that the eredis process has died.

Is it possible to make it such that the process doesn't die and it starts working again once the connection is working again?

Of course, I am probably misunderstanding how processes should work in the Erlang world!

csbzy commented 8 years ago

Note that eredis_client is started with gen_server:start_link/3. The first error (the timeout) kills the Erlang shell process. Because you started eredis_client from the shell, the shell process is linked to the eredis_client process, so the eredis_client process dies too. That is why the second PING fails with the noproc error.

You can read the gen_server:start_link/3 documentation and about the Erlang link mechanism.
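
To make the link behaviour concrete, here is a minimal sketch that uses only plain Erlang processes and no eredis code. The module name link_demo and the worker/0 stand-in are hypothetical and only serve the illustration. The first function mimics the situation from the question: an owner process links to a worker and then exits abnormally, which takes the worker down with it. The second shows that the worker stays alive if the owner handles the failure instead of exiting.

```erlang
-module(link_demo).
-export([linked_dies/0, caught_survives/0]).

%% Stand-in for a long-lived client process (hypothetical, not eredis code).
worker() ->
    receive stop -> ok end.

%% The owner links to the worker and then exits abnormally, much like the
%% shell does after an uncaught gen_server:call timeout.
%% Returns true if the worker died together with its owner.
linked_dies() ->
    Parent = self(),
    Owner = spawn(fun() ->
        W = spawn_link(fun worker/0),
        Parent ! {worker, W},
        receive crash -> exit(timeout) end
    end),
    Worker = receive {worker, Pid} -> Pid end,
    Ref = erlang:monitor(process, Worker),
    Owner ! crash,
    receive
        {'DOWN', Ref, process, Worker, _Reason} -> true
    after 1000 ->
        false
    end.

%% Same setup, but the owner handles the failure instead of exiting, so no
%% exit signal reaches the worker. Returns true if the worker is still alive.
caught_survives() ->
    Parent = self(),
    Owner = spawn(fun() ->
        W = spawn_link(fun worker/0),
        Parent ! {worker, W},
        receive
            crash ->
                try exit(timeout) catch exit:timeout -> ok end
        end,
        Parent ! done,
        receive stop -> ok end
    end),
    Worker = receive {worker, Pid} -> Pid end,
    Owner ! crash,
    receive done -> ok end,
    Alive = is_process_alive(Worker),
    %% Clean up both processes.
    Owner ! stop,
    Worker ! stop,
    Alive.
```

In the shell session from the question, the shell plays the role of the owner and the eredis client plays the role of the worker.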

Also, eredis does have a reconnect mechanism.
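
For the reconnect mechanism to get a chance to run, the client process has to survive the failed call. One possible approach, sketched below, is to wrap each call so that an exit from gen_server:call is turned into an error tuple instead of crashing the caller. The module name safe_call and the function safe/1 are hypothetical helpers, not part of eredis.

```erlang
-module(safe_call).
-export([safe/1]).

%% Run Fun and convert the two exits seen in this thread (a gen_server:call
%% timeout and noproc) into {error, Reason} instead of crashing the caller.
%% Any other exception is deliberately left to propagate.
safe(Fun) when is_function(Fun, 0) ->
    try
        Fun()
    catch
        exit:{timeout, _} -> {error, timeout};
        exit:{noproc, _}  -> {error, noproc}
    end.
```

With the client from the question this could be used as safe_call:safe(fun() -> eredis:q(C, ["PING"]) end). Because the shell no longer dies on a timeout, the linked client process stays alive and can reconnect. One caveat: after a caught gen_server:call timeout, a late reply may still arrive in the caller's mailbox.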

My English is poor; I hope you can understand what I mean.

mdaley commented 8 years ago

Thanks for the reply. We think we have a way of dealing with the problem.

wooga / eredis

Recovery after network failure #84