jaracil / nexus

Distributed RPC system

Node got killed although the deadline had not expired #38

Open pho opened 6 years ago

This has happened only once, on a single RethinkDB instance with two Nexus nodes connected, which had been working fine for quite some time.

INFO[2018/06/01 03:07:52.472016 +0000] Killing node                                  deadline="2018-06-01 03:07:49.765 +0000 +00:00" killed=3d4d0702 node=581fdd32 now="2018-06-01 03:07:51.618 +0000 +00:00" type=system
ERRO[2018/06/01 03:07:52.527226 +0000] Ouch!, I've been killed                       node=581fdd32 stored deadline="2018-06-01T03:07:59.098Z" time of last deadline="2018-06-01 03:07:51.617512962 +0000 UTC m=+763555.467865866" type=system
ERRO[2018/06/01 03:07:52.531875 +0000] Daemon exit                                   cause="node tracker exit" node=581fdd32 type=system

INFO[2018/06/01 03:07:52.465033 +0000] Killing node                                  deadline="2018-06-01 03:07:59.098 +0000 +00:00" killed=581fdd32 node=3d4d0702 now="2018-06-01 03:07:48.958 +0000 +00:00" type=system
ERRO[2018/06/01 03:07:52.527224 +0000] Ouch!, I've been killed                       node=3d4d0702 stored deadline="2018-06-01T03:07:49.765Z" time of last deadline="2018-06-01 03:07:48.955943514 +0000 UTC m=+763545.252729261" type=system
ERRO[2018/06/01 03:07:52.533183 +0000] Daemon exit                                   cause="node tracker exit" node=3d4d0702 type=system

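Node 3d4d0702 decided to kill 581fdd32 even though the deadline had not expired: the logged deadline (03:07:59.098) was about 10 seconds later than the logged now (03:07:48.958). This check is made in a single RethinkDB update here: https://github.com/jaracil/nexus/blob/master/nodes.go#L99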
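
For reference, here is a minimal sketch of the comparison I would expect, applied to the values from the logs above. It is not the code from nodes.go: `deadlineExpired` and `mustParse` are hypothetical helpers, and I am assuming the intended rule is "kill only if the stored deadline is strictly before now".

```go
package main

import (
	"fmt"
	"time"
)

// deadlineExpired reports whether the stored deadline lies strictly before
// the given instant. Hypothetical helper, not taken from nodes.go.
func deadlineExpired(deadline, now time.Time) bool {
	return deadline.Before(now)
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339Nano, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Values logged by 3d4d0702 when it killed 581fdd32.
	fmt.Println(deadlineExpired(
		mustParse("2018-06-01T03:07:59.098Z"),
		mustParse("2018-06-01T03:07:48.958Z"),
	)) // false: this kill should not have happened

	// Values logged by 581fdd32 when it killed 3d4d0702.
	fmt.Println(deadlineExpired(
		mustParse("2018-06-01T03:07:49.765Z"),
		mustParse("2018-06-01T03:07:51.618Z"),
	)) // true: this deadline had really passed
}
```

Under that assumption only the kill of 3d4d0702 looks legitimate; the kill of 581fdd32 does not match the logged deadline and now values.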