turt2live / matrix-voyager-bot

A Matrix bot that attempts to travel the whole network, finding rooms along the way
GNU General Public License v3.0

Voyager struggling under large rooms #84

Closed: turt2live closed this issue 6 years ago

turt2live commented 7 years ago

It got asked to join #ubuntu on freenode, and now it's exhausting all available resources.

It stopped processing sync updates because of the large room join. Eventually the homeserver failed under the load, which caused cascading failures in voyager:

/sync error Unknown error code: Unknown message
{ [Unknown error code: Unknown message]
  errcode: undefined,
  name: 'Unknown error code',
  message: 'Unknown message',
  data: '<html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body bgcolor="white">\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n<hr><center>nginx/1.10.0 (Ubuntu)</center>\r\n</body>\r\n</html>\r\n',
  httpStatus: 504 }
Starting keep-alive
info VoyagerBot Sync state: SYNCING -> RECONNECTING
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Sync state: RECONNECTING -> ERROR
ERR! VoyagerBot { error: 
ERR! VoyagerBot    { [ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response]
ERR! VoyagerBot      errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      name: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      message: 'Locally timed out waiting for a response',
ERR! VoyagerBot      data: 
ERR! VoyagerBot       { error: 'Locally timed out waiting for a response',
ERR! VoyagerBot         errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot         timeout: 15000 } } }
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Sync state: ERROR -> ERROR
ERR! VoyagerBot { error: 
ERR! VoyagerBot    { [ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response]
ERR! VoyagerBot      errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      name: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      message: 'Locally timed out waiting for a response',
ERR! VoyagerBot      data: 
ERR! VoyagerBot       { error: 'Locally timed out waiting for a response',
ERR! VoyagerBot         errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot         timeout: 15000 } } }
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Processing 0 pending node updates. 0 remaining
info VoyagerBot Processed 0 node updates. 0 remaining
info VoyagerBot Sync state: ERROR -> ERROR
ERR! VoyagerBot { error: 
ERR! VoyagerBot    { [ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response]
ERR! VoyagerBot      errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      name: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot      message: 'Locally timed out waiting for a response',
ERR! VoyagerBot      data: 
ERR! VoyagerBot       { error: 'Locally timed out waiting for a response',
ERR! VoyagerBot         errcode: 'ORG.MATRIX.JSSDK_TIMEOUT',
ERR! VoyagerBot         timeout: 15000 } } }
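The log contains two different failure shapes. The first /sync request failed with HTTP 504 and an HTML error page generated by nginx, which is presumably why the SDK reports no errcode ("Unknown error code"). Later requests failed on the client side with ORG.MATRIX.JSSDK_TIMEOUT after the 15000 ms limit shown in the log. A minimal sketch that tells the two apart, assuming only the fields printed in the log (errcode, httpStatus, data); the SyncErrorLike type, the classifySyncError function, and the category names are hypothetical and not part of voyager or matrix-js-sdk:

```typescript
// Hypothetical helper, not part of voyager or matrix-js-sdk: classify the two
// /sync failure shapes that appear in the log. Field names mirror the log output.

interface SyncErrorLike {
  errcode?: string;    // Matrix error code, absent when the body was not Matrix JSON
  httpStatus?: number; // HTTP status, absent for purely client-side timeouts
  data?: unknown;      // raw response body or SDK-supplied details
}

type SyncFailure = "gateway-timeout" | "local-timeout" | "other";

function classifySyncError(err: SyncErrorLike): SyncFailure {
  // The client gave up waiting on its own (timeout: 15000 in the log).
  if (err.errcode === "ORG.MATRIX.JSSDK_TIMEOUT") {
    return "local-timeout";
  }
  // The reverse proxy answered with a 504 and an HTML page, so there is
  // no Matrix errcode to read.
  if (err.httpStatus === 504 && err.errcode === undefined) {
    return "gateway-timeout";
  }
  return "other";
}

// The two shapes from the log, trimmed to the relevant fields.
const fromProxy: SyncErrorLike = {
  httpStatus: 504,
  data: "<head><title>504 Gateway Time-out</title></head>", // excerpt of the nginx body
};
const fromClient: SyncErrorLike = {
  errcode: "ORG.MATRIX.JSSDK_TIMEOUT",
  data: { timeout: 15000 },
};

console.log(classifySyncError(fromProxy));  // gateway-timeout
console.log(classifySyncError(fromClient)); // local-timeout
```

Separating the two is useful when reading logs like this one: a 504 suggests the proxy could not get an answer from an overloaded homeserver, while repeated local timeouts suggest the client kept retrying against a server that still could not respond within the client's limit.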

This is voyager's monitoring: [image]

This is the homeserver's basic monitoring: [image]

Prometheus metrics are not available for this incident. However, voyager did run up against the resource limits of its Toronto node: [image]

After voyager recovered, it queued updates for ~105,359 nodes (because it tries to cache member information). Processing them took quite a while and generated significant database traffic, monopolizing the connection pool.
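The "Processing N pending node updates. M remaining" lines in the log suggest that voyager drains its node-update queue in batches. One general way to keep a backlog of this size from monopolizing a connection pool is to cap how many updates hit the database at once. A minimal sketch under stated assumptions: the NodeUpdate type, the applyUpdate callback (one database write per update), and the drainQueue function are hypothetical, and the limits are illustrative rather than voyager's actual settings:

```typescript
// Hypothetical sketch, not voyager's implementation: drain a backlog of node
// updates in fixed-size batches with a cap on concurrent database writes.

interface NodeUpdate {
  nodeId: string; // assumed shape; voyager's real update records may differ
}

async function drainQueue(
  queue: NodeUpdate[],
  applyUpdate: (update: NodeUpdate) => Promise<void>, // one DB write per update
  batchSize: number,
  maxConcurrent: number,
): Promise<number> {
  if (batchSize < 1 || maxConcurrent < 1) {
    throw new RangeError("batchSize and maxConcurrent must be at least 1");
  }
  let processed = 0;
  while (queue.length > 0) {
    // Remove the next batch from the front of the queue.
    const batch = queue.splice(0, batchSize);
    console.log(`Processing ${batch.length} pending node updates. ${queue.length} remaining`);

    // Each worker claims the next unclaimed update, so at most
    // `maxConcurrent` writes are in flight at any time.
    let next = 0;
    const worker = async (): Promise<void> => {
      while (next < batch.length) {
        const update = batch[next];
        next += 1;
        await applyUpdate(update);
        processed += 1;
      }
    };
    const workerCount = Math.min(maxConcurrent, batch.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));

    console.log(`Processed ${batch.length} node updates. ${queue.length} remaining`);
  }
  return processed;
}
```

For example, calling drainQueue(backlog, writeNode, 500, 4), with writeNode standing in for a hypothetical write function, keeps at most 4 connections busy with the backlog; if the pool is larger than that, the remaining connections stay free for other queries while the backlog drains.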

Voyager is currently stable (as of this writing), but it is still chugging through ~80k node updates.

turt2live commented 6 years ago

The improved performance of the TypeScript version should fix this.