vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection

Auto reconnect does not seem to work when DDP client disconnects from server #197

Closed: danielparas closed this issue 7 years ago

danielparas commented 7 years ago

Hi, not sure where to post this (meteor-job-collection or node-ddp-client), but I'm having trouble with my pure Node workers. If the connection with the job server is lost for whatever reason (e.g. a job server deploy or restart), the following error is thrown and the worker never reconnects, despite having the following config for node-ddp-client.

  autoReconnect:        true
  autoReconnectTimer:   600

Error thrown:

JobQueue: Received error from getWork():  Error: DDPClient: Disconnected from DDP server
ktworker-8   at Object.<anonymous> (/var/app/ktworker/node_modules/ddp/lib/ddp-client.js:58:17)
ktworker-8   at Module._compile (module.js:541:32)
ktworker-8   at Object.Module._extensions..js (module.js:550:10)
ktworker-8   at Module.load (/var/app/ktworker/node_modules/coffee-script/lib/coffee-script/register.js:45:36)
ktworker-8   at tryModuleLoad (module.js:417:12)
ktworker-8   at Function.Module._load (module.js:409:3)
ktworker-8   at Function._load (/usr/lib/node_modules/pm2/node_modules/pmx/lib/transaction.js:62:21)
ktworker-8   at Module.require (module.js:468:17)
ktworker-8   at require (internal/module.js:20:19)
ktworker-8   at Object.<anonymous> (/var/app/ktworker/src/main.coffee:16:7)
ktworker-8   at Object.<anonymous> (/var/app/ktworker/src/main.coffee:5:1)
ktworker-8   at Module._compile (module.js:541:32)
ktworker-8   at Object.loadFile (/usr/lib/node_modules/pm2/node_modules/coffee-script/lib/coffee-script/register.js:16:19)
ktworker-8   at Module.load (/usr/lib/node_modules/pm2/node_modules/coffee-script/lib/coffee-script/register.js:45:36)
ktworker-8   at tryModuleLoad (module.js:417:12)
ktworker-8   at Function.Module._load (module.js:409:3)
ktworker-8   at Function._load (/usr/lib/node_modules/pm2/node_modules/pmx/lib/transaction.js:62:21)
ktworker-8   at /usr/lib/node_modules/pm2/lib/ProcessContainer.js:217:23
ktworker-8   at /usr/lib/node_modules/pm2/node_modules/async/lib/async.js:52:16
ktworker-8   at /usr/lib/node_modules/pm2/node_modules/async/lib/async.js:1209:30
ktworker-8   at WriteStream.<anonymous> (/usr/lib/node_modules/pm2/lib/Utility.js:131:13)
ktworker-8   at emitOne (events.js:96:13)
ktworker-8   at WriteStream.emit (events.js:188:7)
ktworker-8   at WriteStream.<anonymous> (fs.js:1843:10)
ktworker-8   at FSReqWrap.oncomplete (fs.js:117:15)

Any ideas on what can be done to solve this? Maybe I'm missing something.

Thanks in advance! Daniel

vsivsi commented 7 years ago

That's a node DDP issue. There's no code in this repo that touches the DDP connection. You should look over there (IIRC, this issue has been raised there before, so check the open and closed issues to see what has been done previously). My recollection is that in some cases you are responsible for initiating new connections when they drop. The DDP package's autoReconnect isn't a cure-all (it doesn't retry endlessly or implement a backoff strategy, or...)

You can listen to various events and implement your own reconnection strategy, for example by implementing different behavior for the onError and onClose functions in this sample app: https://github.com/vsivsi/meteor-job-collection-playground-worker/blob/master/work.coffee#L105-L118

But if you want a more detailed discussion, please do so on the DDP repo.
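
As a rough illustration of the "implement your own reconnection strategy" idea above, here is a minimal sketch of a retry loop with exponential backoff. It is not code from this repo or from the DDP package. `superviseConnection` and `backoffDelay` are hypothetical helpers, and `client` is assumed to be any object exposing `connect(callback)` and `on(event, handler)`. The event names `socket-close` and `socket-error` are an assumption about what the DDP client emits, so check them (and how repeated `connect()` calls behave) against the version you have installed.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical helper: keeps calling client.connect() until it succeeds,
// and schedules a new attempt whenever one of the watched events fires.
// ASSUMPTION: `client` has connect(callback) and on(event, handler).
function superviseConnection(client, options) {
  const opts = Object.assign({
    baseMs: 600,                               // first retry delay
    maxMs: 30000,                              // upper bound on the delay
    events: ['socket-close', 'socket-error'],  // assumed event names
    onConnected: function () {}                // e.g. log in and resume the job queue
  }, options);

  let attempt = 0;
  let timer = null;

  function scheduleReconnect() {
    if (timer) return; // a retry is already pending; avoid stacking timers
    const delay = backoffDelay(attempt, opts.baseMs, opts.maxMs);
    attempt += 1;
    timer = setTimeout(function () {
      timer = null;
      tryConnect();
    }, delay);
  }

  function tryConnect() {
    client.connect(function (err) {
      if (err) {
        scheduleReconnect(); // connection failed: wait longer, then retry
        return;
      }
      attempt = 0; // success: reset the backoff
      opts.onConnected();
    });
  }

  opts.events.forEach(function (name) {
    client.on(name, scheduleReconnect);
  });
  tryConnect();
}
```

With a real DDP client you would probably set `autoReconnect: false` so that only one reconnection mechanism is active, and use `onConnected` to authenticate again and resume the job queue. Unlike a fixed retry timer, this loop never gives up, and it backs off so a restarting server is not hammered.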

danielparas commented 7 years ago

Thanks @vsivsi! I have implemented the code as suggested in the sample app, but it doesn't seem to catch the disconnect for some reason. I will see what I can find or ask on node DDP.