sharafian / the-metamorphosis

ILP connector that's just, like, stupidly fast
1 stars 0 forks source link

spontaneous connection loss #11

Open michielbdejong opened 7 years ago

michielbdejong commented 7 years ago

Seeing this in my logs:

[2017-07-14 09:04:02,100] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
broadcast 4 routes to test.east.
broadcast 4 routes to test.west.
broadcast 4 routes to g.dns.land.connector.
got rpc 9f655f34-c2cf-4bf6-a5b8-09a10677a4a3 get_limit g.dns.land.connector. {}
broadcast 4 routes to test.east.
broadcast 4 routes to test.west.
broadcast 4 routes to g.dns.land.connector.
broadcast 4 routes to test.east.
broadcast 4 routes to test.west.
broadcast 4 routes to g.dns.land.connector.
[2017-07-14 09:05:22,098] INFO [GroupCoordinator 0]: Preparing to rebalance group incomingTransfers with old generation 1 (__consumer_offsets-2) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,106] INFO [GroupCoordinator 0]: Preparing to rebalance group authorizer with old generation 2 (__consumer_offsets-11) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,102] INFO [GroupCoordinator 0]: Preparing to rebalance group responseHandler with old generation 1 (__consumer_offsets-7) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,101] INFO [GroupCoordinator 0]: Preparing to rebalance group incomingRequests with old generation 1 (__consumer_offsets-30) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,120] INFO [GroupCoordinator 0]: Preparing to rebalance group kafka-node-group with old generation 2 (__consumer_offsets-37) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,121] INFO [GroupCoordinator 0]: Preparing to rebalance group rpcServer with old generation 1 (__consumer_offsets-46) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,122] INFO [GroupCoordinator 0]: Stabilized group incomingTransfers generation 2 (__consumer_offsets-2) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,124] INFO [GroupCoordinator 0]: Stabilized group incomingRequests generation 2 (__consumer_offsets-30) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,123] INFO [GroupCoordinator 0]: Stabilized group responseHandler generation 2 (__consumer_offsets-7) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,128] INFO [GroupCoordinator 0]: Stabilized group rpcServer generation 2 (__consumer_offsets-46) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,128] INFO [GroupCoordinator 0]: Stabilized group authorizer generation 3 (__consumer_offsets-11) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,132] INFO [GroupCoordinator 0]: Stabilized group kafka-node-group generation 3 (__consumer_offsets-37) (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,142] WARN Client session timed out, have not heard from server in 67146ms for sessionid 0x15d3cce4bd20014 (org.apache.zookeeper.ClientCnxn)
[2017-07-14 09:05:22,146] INFO Client session timed out, have not heard from server in 67146ms for sessionid 0x15d3cce4bd20014, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2017-07-14 09:05:22,235] INFO [GroupCoordinator 0]: Assignment received from leader for group incomingTransfers for generation 2 (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,237] INFO [GroupCoordinator 0]: Assignment received from leader for group responseHandler for generation 2 (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,239] INFO [GroupCoordinator 0]: Assignment received from leader for group rpcServer for generation 2 (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,248] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2017-07-14 09:05:22,293] INFO [GroupCoordinator 0]: Assignment received from leader for group incomingRequests for generation 2 (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,295] INFO [GroupCoordinator 0]: Assignment received from leader for group authorizer for generation 3 (kafka.coordinator.group.GroupCoordinator)
[2017-07-14 09:05:22,301] INFO [GroupCoordinator 0]: Assignment received from leader for group kafka-node-group for generation 3 (kafka.coordinator.group.GroupCoordinator)
{ Error
    at ConnectionManager.cleanupPendingQueue (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:189:46)
    at ConnectionManager.onSocketClosed (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:274:10)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:191:7)
    at TCP._handle.close [as _onclose] (net.js:504:12)
  code: -4,
  name: 'CONNECTION_LOSS',
  path: undefined,
  message: 'Exception: CONNECTION_LOSS[-4]' }
{ Error
    at ConnectionManager.cleanupPendingQueue (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:189:46)
    at ConnectionManager.onSocketClosed (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:274:10)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:191:7)
    at TCP._handle.close [as _onclose] (net.js:504:12)
  code: -4,
  name: 'CONNECTION_LOSS',
  path: undefined,
  message: 'Exception: CONNECTION_LOSS[-4]' }
{ Error
    at ConnectionManager.cleanupPendingQueue (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:189:46)
    at ConnectionManager.onSocketClosed (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:274:10)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:191:7)
    at TCP._handle.close [as _onclose] (net.js:504:12)
  code: -4,
  name: 'CONNECTION_LOSS',
  path: undefined,
  message: 'Exception: CONNECTION_LOSS[-4]' }

events.js:163
      throw er; // Unhandled 'error' event
      ^
Error
    at ConnectionManager.cleanupPendingQueue (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:189:46)
    at ConnectionManager.onSocketClosed (/app/node_modules/node-zookeeper-client/lib/ConnectionManager.js:274:10)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:191:7)
    at TCP._handle.close [as _onclose] (net.js:504:12)

I'll see if I can get more information about what happened, but seems like either a zookeeper error or a kafka error.

michielbdejong commented 7 years ago

This happened after the container had been up for roughly 24 hours. After I started the stopped container again, it started running again, although linked incorrectly from nginx. When I ran ILP_DOMAIN=gar.michielbdejong.com ILP_EMAIL=email@example.com docker-compose up -d instead, it worked. I'll see how long the container stays up this time.