sumo_pgsql gen_server <0.905.0> terminated with reason: sock_closed

asyncmind0 commented 4 years ago

The connection to Postgres times out after a period of inactivity.

How do I handle this?

01:31:35.869 [error] CRASH REPORT Process <0.897.0> with 10 neighbours exited with reason: sock_closed in gen_server:handle_common_reply/8 line 796
01:31:35.869 [error] Supervisor sumo_backend_sup had child sumo_backend_pgsql started with sumo_backend_pgsql:start_link(sumo_backend_pgsql, [{host,"127.0.0.1"},{port,6432},{database,"xxxxx"},{username,"xxxxx"},{password,"xxxx"}]) at <0.882.0> exit with reason sock_closed in context child_terminated
09:37:22.845 [error] gen_server 'wpool_pool-sumo_pgsql-1' terminated with reason: no case clause matching {noproc,{gen_server,call,[<0.889.0>,{command,epgsql_cmd_parse,{[],"SELECT * FROM \"app\" WHERE (\"app_id\" =  $1 ) LIMIT $2 OFFSET $3",[]}},infinity]}} in wpool_process:handle_call/3 line 189
09:37:22.845 [error] CRASH REPORT Process 'wpool_pool-sumo_pgsql-1' with 0 neighbours crashed with reason: no case clause matching {noproc,{gen_server,call,[<0.889.0>,{command,epgsql_cmd_parse,{[],"SELECT * FROM \"app\" WHERE (\"app_id\" =  $1 ) LIMIT $2 OFFSET $3",[]}},infinity]}} in wpool_process:handle_call/3 line 189
09:37:22.845 [error] Supervisor 'wpool_pool-sumo_pgsql-process-sup' had child 'wpool_pool-sumo_pgsql-1' started with wpool_process:start_link('wpool_pool-sumo_pgsql-1', sumo_store, [sumo_store_pgsql,[{storage_backend,sumo_backend_pgsql},{workers,10}]], [{queue_manager,'wpool_pool-sumo_pgsql-queue-manager'},{time_checker,'wpool_pool-sumo_pgsql-time-checker'},...]) at <0.888.0> exit with reason no case clause matching {noproc,{gen_server,call,[<0.889.0>,{command,epgsql_cmd_parse,{[],"SELECT * FROM \"app\" WHERE (\"app_id\" =  $1 ) LIMIT $2 OFFSET $3",[]}},infinity]}} in wpool_process:handle_call/3 line 189 in context child_terminated
09:37:27.845 [error] CRASH REPORT Process <0.1337.0> with 0 neighbours exited with reason: timeout in wpool_pool:call_available_worker/3 line 126 in wpool_pool:call_available_worker/3 line 126
09:37:27.845 [error] Cowboy stream 1 with ranch listener http and connection process <0.1336.0> had its request process exit with reason: timeout in wpool_pool:call_available_worker/3 line 126

cabol commented 4 years ago

As far as I can see, this is an issue in the Postgres adapter: it should handle connection errors, perhaps by forcing the pool to restart and/or reconnect at the moment the error occurs.

Perhaps we could handle this case in the handle_info and/or terminate callbacks of the sumo_db_pgsql backend, by trapping the exit signals related to the connection's PID.
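
To make the idea concrete, here is a minimal sketch of a gen_server that traps exits and reconnects when its connection process dies. It is not the actual sumo_db_pgsql code: the module name conn_keeper, the connection/1 helper and the ConnectFun argument are all hypothetical, and it assumes the connection process is linked to the process that opened it.

```erlang
-module(conn_keeper).
-behaviour(gen_server).

-export([start_link/1, connection/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% ConnectFun is a hypothetical zero-arity fun that returns {ok, Pid} for a
%% connection process linked to the caller,
%% e.g. fun() -> epgsql:connect(Opts) end.
start_link(ConnectFun) ->
    gen_server:start_link(?MODULE, ConnectFun, []).

%% Returns the pid of the current connection.
connection(Server) ->
    gen_server:call(Server, connection).

init(ConnectFun) ->
    %% Turn exit signals from linked processes into {'EXIT', Pid, Reason}
    %% messages instead of letting them kill this process.
    process_flag(trap_exit, true),
    {ok, Conn} = ConnectFun(),
    {ok, #{connect => ConnectFun, conn => Conn}}.

handle_call(connection, _From, #{conn := Conn} = State) ->
    {reply, Conn, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The current connection died (for example with reason sock_closed):
%% open a new one. If reconnecting fails, the badmatch crashes this
%% process and its supervisor takes over.
handle_info({'EXIT', Conn, _Reason},
            #{conn := Conn, connect := ConnectFun} = State) ->
    {ok, NewConn} = ConnectFun(),
    {noreply, State#{conn := NewConn}};
handle_info(_Info, State) ->
    {noreply, State}.
```

With something like this, a sock_closed exit from the connection would be absorbed by the backend process instead of propagating to the pool workers. A real fix would probably also want a retry delay or backoff before reconnecting.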

One question, just to understand a bit more about what is going on: after the crash, was the connection restored? I'm asking because the supervision tree for the pool should have been restarted after the crash, so the connections should have been restored. Is that the case, or were the connections never restored?

asyncmind0 commented 2 years ago

This is not happening anymore. I'm not entirely sure what resolved it :shrug:

inaka / sumo_db #336