senecajs / seneca-mesh

Mesh your Seneca.js microservices together - no more service discovery!

DEATH LOOP: when using mesh with isbase:true and multiple pins! #97

Open jeromevalentin opened 7 years ago

jeromevalentin commented 7 years ago

With versions:

    "seneca": "3.4.2",
    "seneca-balance-client": "0.6.1",
    "seneca-mesh": "0.11.0"

The following code

// seneca standard init
// declaring the mesh plugin to join the party!
seneca.use('mesh', {
  isbase: true,
  listen: [
    { pin: 'role:A' },
    { pin: 'role:B' }
  ]
})

leads to the following error:

Error: [DEATH LOOP]
at Seneca.die
at Object.intern.handle_reply 
at Object.act_tm [as ontm] 
at Timeout.timeout_check [as _onTimeout] 
at ontimeout (timers.js:488:11)
at tryOnTimeout (timers.js:323:5)
at Timer.listOnTimeout (timers.js:283:5)

pola88 commented 7 years ago

Any luck with this? I have the same problem.

blanchma commented 7 years ago

@rjrodger we have the same problem but we don't have isBase and multiple pins in the same place. Our base is simpler:

seneca.use('mesh', {
  isbase: true,
  auto: true,
  port: process.env.SENECA_PORT || 7000,
  bases: seneca_bases
});

Our current packages are seneca: 3.4.2, seneca-balance-client: 0.6.1 and seneca-mesh: 0.11.0

rodmaz commented 7 years ago

Are you using Docker containers? This seems like a networking issue we are having here when using Docker containers.

jeromevalentin commented 7 years ago

I'm not using Docker. Simply starting Seneca with isbase: true and multiple pins in the listen option leads to the DEATH LOOP.

rodmaz commented 7 years ago

@jeromevalentin Do you have other bases listed in seneca_bases? Does the error occur when you remove bases: seneca_bases? Usually these DEATH_LOOPs occur when your client cannot communicate with your base due to networking limitations (no multicast support, ports blocked etc).

blanchma commented 7 years ago

I have the same problem. The base wasn't using the host option and the network didn't support multicast

jeromevalentin commented 7 years ago

@rodmaz No other bases, no other mesh process is running, just trying to start a 'base' mesh process supporting multiple microservices.

In testmesh.es6, write

import Seneca from 'seneca'

let seneca = Seneca({ tag: 'mesh' })

seneca.add({ role: 'A' }, (args, done) => done(null, { A: args }))
seneca.add({ role: 'B' }, (args, done) => done(null, { B: args }))

seneca.use('mesh', {
  isbase: true,
  listen: [
    { pin: 'role:A' },
    { pin: 'role:B' }
  ]
})
seneca.ready(() => {
  console.log('Ready')
  seneca.act({ role: 'A', param: 'foobar' }, (err, resp) => console.log(err, resp))
})

then transpile it with Babel and run it:

$> node testmesh.js
{"code":"EADDRINUSE","errno":"EADDRINUSE","syscall":"bind","address":"0.0.0.0","port":39999,"level":"warn","actid":"qoem7dimej4c/0bsvrvuozh8q","plugin_name":"mesh","pattern":"init:mesh","seneca":"l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","when":1505052018525}
{"notice":"seneca: Action name:mesh,plugin:define,role:seneca,seq:2,tag:undefined failed: [TIMEOUT].","code":"act_execute","err":{"eraro":true,"orig":{},"code":"act_execute","seneca":true,"package":"seneca","msg":"seneca: Action name:mesh,plugin:define,role:seneca,seq:2,tag:undefined failed: [TIMEOUT].","details":{"message":"[TIMEOUT]","pattern":"name:mesh,plugin:define,role:seneca,seq:2,tag:undefined","instance":"Seneca/l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","orig$":{},"message$":"[TIMEOUT]","plugin":{}},"callpoint":"at Object.act_tm [as ontm] (/home/jvalentin/project/node_modules/seneca/seneca.js:917:52)"},"actid":"qoem7dimej4c/0bsvrvuozh8q","msg":{"role":"seneca","plugin":"define","name":"mesh","seq":2,"default$":{},"fatal$":true,"local$":true,"plugin$":{"name":"transport","tag":"-","fullname":"transport"}},"meta":{"start":1505052009975,"end":1505052032227,"pattern":"name:mesh,plugin:define,role:seneca,seq:2,tag:undefined","action":"plugin_definition_10","mi":"qoem7dimej4c","tx":"0bsvrvuozh8q","id":"qoem7dimej4c/0bsvrvuozh8q","instance":"l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","tag":"mesh","seneca":"3.4.2","version":"0.1.0","gate":false,"fatal":true,"local":true,"timeout":22222,"dflt":{},"plugin":{"name":"transport","tag":"-","fullname":"transport"},"parents":[],"sync":false,"trace":[{"desc":["name:balance_client,plugin:define,role:seneca,seq:3,tag:mesh~mluctf","ejmlatdu9ll4/0bsvrvuozh8q","l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","mesh","0.1.0",1505052010001,1505052010005,false,"plugin_definition_23"],"trace":[{"desc":[null,"lyxzq82zjc5r/0bsvrvuozh8q","l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","mesh","0.1.0",1505052010004,1505052010005,true,null],"trace":[]}]}],"sub":null,"data":null,"err":null,"err_trace":null,"error":true,"empty":null},"actdef":{"plugin_name":"mesh","plugin_tag":"-","plugin_fullname":"mesh","raw":{"role":"seneca","plugin":"define","name":"mesh","seq":2},"plugin":{"name":"mesh","tag":"-","fullname":"mesh"},"sub":false,"client":fals
e,"rules":{},"id":"plugin_definition_10","name":"plugin_definition","pattern":"name:mesh,plugin:define,role:seneca,seq:2,tag:undefined","msgcanon":{"name":"mesh","plugin":"define","role":"seneca","seq":2,"tag":"undefined"},"priorpath":""},"client":false,"listen":false,"transport":{},"kind":"act","case":"ERR","duration":22252,"level":"error","plugin_name":"mesh","pattern":"name:mesh,plugin:define,role:seneca,seq:2,tag:undefined","seneca":"l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","when":1505052032230}
[]
{"notice":"seneca: Action init:mesh failed: [TIMEOUT].","code":"act_execute","err":{"eraro":true,"orig":{},"code":"act_execute","seneca":true,"package":"seneca","msg":"seneca: Action init:mesh failed: [TIMEOUT].","details":{"message":"[TIMEOUT]","pattern":"init:mesh","instance":"Seneca/l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","orig$":{},"message$":"[TIMEOUT]","plugin":{}},"callpoint":"at Object.act_tm [as ontm] (/home/jvalentin/project/node_modules/seneca/seneca.js:917:52)"},"actid":"qoem7dimej4c/0bsvrvuozh8q","msg":{"init":"mesh","default$":{},"fatal$":true,"local$":true,"plugin$":{"name":"transport","tag":"-","fullname":"transport"},"tx$":"0bsvrvuozh8q"},"meta":{"start":1505052010002,"end":1505052032351,"pattern":"init:mesh","action":"init_24","mi":"8qivehq1pvt2","tx":"0bsvrvuozh8q","id":"8qivehq1pvt2/0bsvrvuozh8q","instance":"l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","tag":"mesh","seneca":"3.4.2","version":"0.1.0","gate":false,"fatal":true,"local":true,"timeout":22222,"dflt":{},"plugin":{"name":"transport","tag":"-","fullname":"transport"},"parents":[["name:mesh,plugin:define,role:seneca,seq:2,tag:undefined","qoem7dimej4c/0bsvrvuozh8q","l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","mesh","0.1.0",1505052009975,null,false,"plugin_definition_10"]],"sync":true,"trace":[{"desc":["cmd:listen,role:transport","tnuj8c3xueul/0bsvrvuozh8q","l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","mesh","0.1.0",1505052010453,1505052010462,true,"_29"],"trace":[{"desc":["cmd:listen,role:transport","mjmlnzz4kxgd/0bsvrvuozh8q","l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","mesh","0.1.0",1505052010454,1505052010457,true,"_12"],"trace":[{"desc":["hook:listen,role:transport,type:web","5eptqk276xj0/0bsvrvuozh8q","l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","mesh","0.1.0",1505052010455,1505052010457,true,"_16"],"trace":[]},{"desc":["hook:listen,role:transport,type:web","5eptqk276xj0/0bsvrvuozh8q","l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","mesh","0.1.0",1505052010455,1505052010457,true,"_16
"],"trace":[]}]}]}],"sub":null,"data":null,"err":null,"err_trace":null,"error":true,"empty":null},"actdef":{"plugin_name":"mesh","plugin_tag":"-","plugin_fullname":"mesh","raw":{"init":"mesh"},"plugin":{"name":"mesh","tag":"-","fullname":"mesh"},"sub":false,"client":false,"rules":{},"id":"init_24","name":"init","pattern":"init:mesh","msgcanon":{"init":"mesh"},"priorpath":""},"client":false,"listen":false,"transport":{},"kind":"act","case":"ERR","duration":22349,"level":"error","plugin_name":"mesh","pattern":"init:mesh","seneca":"l1c2jbmuhpch/1505052009412/3380/3.4.2/mesh","when":1505052032352}
...

The error suggests that mesh is trying to start the base listener on port 39999 multiple times.

rodmaz commented 7 years ago

@jeromevalentin This error (EADDRINUSE) is clear: you have another process running on port 39999. Try rebooting your machine or investigating which process is using that port. A quick test is to use another port number.

jeromevalentin commented 7 years ago

@rodmaz The error is clear, but no other process is using that port on my computer:

$> netstat -a | grep 39999
$>

karn09 commented 7 years ago

I was able to get this working after seeing similar issues.

Below code throws death loop error:

    .use("mesh", {
        isbase: true,
        listen: [{ pin: 'users:list' }, { pin: 'users:create' }]
    });

Instead, I used the following, which worked:

    .use("mesh", {
        isbase: true,
        pin: "users:*"
    });

Also, I found myself camelCasing the option as isBase: true, when it should be isbase: true. Using isBase will cause the death loop error as well.
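Since a mis-cased key such as isBase is not reported as an error, a small pre-flight check can catch it before the options reach the plugin. The following is a minimal sketch: `findMiscasedOptions` and `KNOWN_MESH_OPTIONS` are hypothetical names, not part of seneca-mesh, and the list contains only the option names mentioned in this thread, not the plugin's full option set.

```javascript
// Hypothetical helper, not part of seneca-mesh. The list below is limited to
// option names that appear in this thread; it is not the plugin's full set.
const KNOWN_MESH_OPTIONS = [
  'isbase', 'auto', 'listen', 'pin', 'pins', 'bases',
  'host', 'port', 'monitor', 'discover', 'model'
]

// Returns the keys that match a known option only when case is ignored.
function findMiscasedOptions(options, known = KNOWN_MESH_OPTIONS) {
  const byLower = new Map(known.map((name) => [name.toLowerCase(), name]))
  return Object.keys(options)
    .filter((key) => !known.includes(key) && byLower.has(key.toLowerCase()))
    .map((key) => ({ given: key, expected: byLower.get(key.toLowerCase()) }))
}

const suspicious = findMiscasedOptions({ isBase: true, pin: 'users:*' })
console.log(suspicious) // [ { given: 'isBase', expected: 'isbase' } ]
```

Running a check like this before `seneca.use('mesh', options)` would turn a silent misconfiguration into an explicit warning.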

jeromevalentin commented 7 years ago

I agree that it works this way, but it is just a workaround and is not always applicable. In my previous example, it would require using:

{
  isbase: true,
  pin: 'role:*'
}

With such a configuration, that process will receive all role requests, while it is only able to handle A and B. So this is not a solution.
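The over-matching problem can be illustrated without Seneca at all. The sketch below uses a deliberately simplified matcher (`parsePin`, `matchesPin`, and `matchesAnyPin` are hypothetical helpers, not Seneca's actual pattern matching) to show that a `role:*` pin also accepts a `role:C` message, whereas two explicit pins do not.

```javascript
// Simplified stand-in for pin matching; this is NOT Seneca's real matcher.
// A pin such as 'role:A' or 'role:*' is split into key/value pairs.
function parsePin(pin) {
  return pin.split(',').map((pair) => {
    const [key, value] = pair.split(':').map((part) => part.trim())
    return [key, value]
  })
}

// A message matches when every pair is satisfied; '*' accepts any defined value.
function matchesPin(pin, msg) {
  return parsePin(pin).every(
    ([key, value]) =>
      msg[key] !== undefined && (value === '*' || String(msg[key]) === value)
  )
}

function matchesAnyPin(pins, msg) {
  return pins.some((pin) => matchesPin(pin, msg))
}

const wildcard = ['role:*']
const explicit = ['role:A', 'role:B']
const msgC = { role: 'C', param: 'foobar' }

console.log(matchesAnyPin(wildcard, msgC)) // true: the wildcard claims role:C
console.log(matchesAnyPin(explicit, msgC)) // false
console.log(matchesAnyPin(explicit, { role: 'A' })) // true
```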

karn09 commented 7 years ago

Good point. It appears the documentation may be wrong.

After reading over the code, I found this piece which iterates over the listen array, but does not appear to follow the shape of the object within the docs:

starting at line 171:

      function init() {
...
...
          _.each(listen, function(listen_opts) {
            if (options.host && null == listen_opts.host) {
              listen_opts.host = options.host
            }

            if ('@' === (listen_opts.host && listen_opts.host[0])) {
              listen_opts.host = rif(listen_opts.host.substring(1))
            }

            listen_opts.port = null != listen_opts.port
              ? listen_opts.port
              : function() {
                  return 50000 + Math.floor(10000 * Math.random())
                }

            listen_opts.model = listen_opts.model || 'consume'

            listen_opts.ismesh = true

            seneca.listen(listen_opts)
          })
...

But listen is defined within the outer scope on line 124 - which matches the docs:

    var listen = options.listen || [
      { pin: pin, model: options.model || 'consume' }
    ]

Fortunately, there is an undocumented pins option, which appears to work for the most part.

    .use("mesh", {
        isbase: true,
        pins: [{
          users: 'list'
        }, {
          users: 'create'
        }],
        monitor: true
    });

However, if I use this configuration and try to call a service that does not exist, the mesh client crashes. For example:

// client.js
const seneca = Seneca().use('mesh');
seneca.act({ users: 'nada' }, (err, result) => {
       // do stuff
})

// server.js
const seneca = Seneca().add('usersPlugin').use('mesh', { isbase: true })

results in:

[1] {"err":{},"level":"warn","seneca":"gqzge20hstfa/1505946828319/2923/3.4.2/-","when":1505946845630}
[1] Debug: internal, implementation, error
[1]     TypeError: Uncaught error: Cannot convert undefined or null to object
[1]     at toString (<anonymous>)
[1]     at objectToString (internal/util.js:18:36)
[1]     at Object.isError (internal/util.js:14:10)
[1]     at Object.act_fn [as fn] (/Users/johnnieves/workspace/winwin/foundation-node-postgres/node_modules/seneca/seneca.js:910:25)
[1]     at Immediate.processor [as _onImmediate] (/Users/johnnieves/workspace/winwin/foundation-node-postgres/node_modules/gate-executor/gate-executor.js:136:14)
[1]     at runCallback (timers.js:781:20)
[1]     at tryOnImmediate (timers.js:743:5)
[1]     at processImmediate [as _immediateCallback] (timers.js:714:5)
[1] 170920/223405.617, [response,api,users] http://jniev.home:5000: post /users {} 500 (6259ms)
[1] Error: async hook stack has become corrupted (actual: 1367, expected: 351)
[1]  1:
[1] node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) [/usr/local/bin/node]
[1]  2:
[1] node::(anonymous namespace)::TimerWrap::OnTimeout(uv_timer_s*) [/usr/local/bin/node]
[1]  3:
[1] uv__run_timers [/usr/local/bin/node]
[1]  4:
[1] uv_run [/usr/local/bin/node]
[1]  5:
[1] node::Start(v8::Isolate*, node::IsolateData*, int, char const* const*, int, char const* const*) [/usr/local/bin/node]
[1]  6:
[1] node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) [/usr/local/bin/node]
[1]  7:
[1] node::Start(int, char**) [/usr/local/bin/node]
[1]  8:
[1] start [/usr/local/bin/node]
[1] [nodemon] app crashed - waiting for file changes before starting...

Perhaps this is because I'm using this with Hapi/Chairo, but definitely did not expect this to take down my entire API server if a microservice is found to be unavailable.

beverlycodes commented 6 years ago

This might belong as a separate issue, but it seems like a DEATH LOOP should be something that can be voluntarily handled by a service. My services do more than just Seneca call-and-response, and should be allowed to remain running even if seneca-mesh is having a bad time. I'd rather be able to periodically retry finding and joining the mesh than have to constantly restart my service on a DEATH LOOP.
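The periodic retry described above could look roughly like the following, assuming the failure were surfaced as a catchable error rather than a fatal exit (which, per this thread, is not the current behaviour). `retryWithBackoff` is a generic helper and `joinMesh` is a hypothetical placeholder; neither is a Seneca or seneca-mesh API.

```javascript
// Generic retry loop with exponential backoff. Nothing here is a Seneca API.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

async function retryWithBackoff(
  attempt,
  { retries = 5, baseDelayMs = 100, maxDelayMs = 5000 } = {}
) {
  let lastError
  for (let i = 0; i <= retries; i++) {
    try {
      return await attempt(i)
    } catch (err) {
      lastError = err
      if (i < retries) {
        // Double the delay after each failure, up to maxDelayMs.
        await sleep(Math.min(maxDelayMs, baseDelayMs * 2 ** i))
      }
    }
  }
  throw lastError
}

// Hypothetical join function that fails twice before succeeding.
let calls = 0
async function joinMesh() {
  calls += 1
  if (calls < 3) throw new Error('mesh unavailable')
  return 'joined'
}

const joined = retryWithBackoff(joinMesh, { baseDelayMs: 10 })
joined.then((result) => console.log(result, 'after', calls, 'attempts'))
```

The rest of the service would keep running while the retry loop is pending, which is the behaviour requested in the comment above.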

blanchma commented 6 years ago

Agree with @ryanfields. A DEATH LOOP should have its own error code, distinct from act_execute, and should be handleable.

cwilso03 commented 6 years ago

Building off of @karn09's post above:

I was recently having this problem too in an app, and finally figured out that the seneca-mesh pins option only works correctly if you give it an array of jsonic strings, not objects. So, the following works (whether on base node or not):

    .use("mesh", {
        isbase: true,
        pins: ["users:list", "users:create"]
    });

but the following does not:

    .use("mesh", {
        isbase: true,
        pins: [{
          users: 'list'
        }, {
          users: 'create'
        }]
    });
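If existing code already holds pins as objects, they can be converted to the string form before being passed to the plugin. This is a minimal sketch: `pinToString` is a hypothetical helper that handles only flat objects with primitive values, and it produces plain `key:value` strings of the kind shown in the working example.

```javascript
// Hypothetical helper: converts a flat pin object into a 'key:value,...'
// string. Strings pass through unchanged; nested values are not handled.
function pinToString(pin) {
  if (typeof pin === 'string') return pin
  return Object.keys(pin)
    .map((key) => `${key}:${pin[key]}`)
    .join(',')
}

const pins = [
  { users: 'list' },
  { users: 'create' },
  'role:profile,command:*'
].map(pinToString)

console.log(pins) // [ 'users:list', 'users:create', 'role:profile,command:*' ]
```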

I put together a stripped down test project that uses seneca-web and seneca-mesh, with 4 stupid-simple microservices, two hosted on the base node and two hosted on a separate mesh node. The attached zip has the source and a short README.md that describes how to use it:

seneca-test.zip

danielo515 commented 6 years ago

For me, the error vanishes if I write isbase in lower case. If I remove isbase entirely, the error comes back, but that may be because I am running a single node without any other base around. In any case, I can confirm that defining the pin as a string and writing isbase in all lower case works:

const initialSenecaConfig = {
  auto: true,
  isbase:true,
  listen: [
    { pin: "role:profile,command:*", model: "consume" }
  ],
  discover: {
    rediscover: true,
    custom: {
      active: true,
      find: dnsSeed
    }
  }
};

danielo515 commented 6 years ago

The EADDRINUSE error reported by @jeromevalentin is real. It occurs if you try to listen on several pins using the format specified by the docs:

listen: [
  { pin: { role: 'stuff', cmd: 'a' } },
  { pin: { role: 'stuff', cmd: 'b' } }
]

In that case seneca-mesh tries to start several times, hence the EADDRINUSE error. When one of the "pins" is removed from the list, the error disappears. So it is not another process listening on that port; it is the very same process trying to listen several times on the same port.
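That a single process can trigger EADDRINUSE against itself is easy to reproduce with plain Node.js. The sketch below uses a TCP server from the `net` module purely as an illustration; it makes no claim about how seneca-mesh binds port 39999 internally, and `listenTwice` is a hypothetical name.

```javascript
const net = require('net')

// Resolves with the error code produced when one process tries to listen
// on a TCP port it is already listening on (or null if no error occurs).
function listenTwice() {
  return new Promise((resolve, reject) => {
    const first = net.createServer()
    first.once('error', reject)
    // Port 0 asks the OS for any free port, so the example does not
    // depend on a specific port being available.
    first.listen(0, '127.0.0.1', () => {
      const { port } = first.address()
      const second = net.createServer()
      second.once('error', (err) => {
        first.close(() => resolve(err.code))
      })
      second.listen(port, '127.0.0.1', () => {
        // Not expected: the second listen succeeded.
        second.close(() => first.close(() => resolve(null)))
      })
    })
  })
}

const result = listenTwice()
result.then((code) => console.log(code)) // expected: EADDRINUSE
```

This is also consistent with the empty netstat output earlier in the thread: no other process needs to hold the port for the error to appear.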