senecajs / seneca

A microservices toolkit for Node.js.
http://senecajs.org
MIT License

Transport V3 Roundtable #399

Open · mcdonnelldean opened this issue 8 years ago

mcdonnelldean commented 8 years ago

This issue is a place to hash out the message spec and other salient details of transport.

The proposed spec is here: https://github.com/senecajs/seneca/blob/master/doc/transport.md

Current Version Samples

Inbound messages

role:npm,cmd:get

{
  "role": "npm",
  "cmd": "get",
  "name": "hapi",
  "transport$": {
    "track": [],
    "origin": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36",
    "time": {
      "client_sent": 1461086855637
    }
  },
  "ungate$": true,
  "tx$": "72t2ysuc8rae",
  "meta$": {
    "id": "mk05ddwzcqex/72t2ysuc8rae",
    "tx": "72t2ysuc8rae",
    "start": 1461086855638,
    "pattern": "cmd:get,role:npm",
    "action": "(s5iy3du8vb40)",
    "entry": true,
    "chain": [],
    "sync": true,
    "sub": "role:npm"
  },
  "in$": true
}
mcdonnelldean commented 8 years ago

My initial comment on all of this is thus,

I don't want to get dragged down a specification bikeshed. We can iterate over time, let's keep it practical to implement but useful to consume before we start adding too much to it.

mcdonnelldean commented 8 years ago

Point 2. Is there any chance of dropping the $ and just dropping a level? This gives us tons more structure to play with and gives us the concept of header / payload.
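
For illustration, the kind of header / payload shape being suggested here might look like the sketch below (purely hypothetical field names, not anything from the spec):

```js
// Purely illustrative - an envelope / header-payload shape, as opposed to the
// $-namespaced properties in the sample message above.
const msg = {
  header: {
    track: [],
    time: { client_sent: 1461086855637 },
    id: 'mk05ddwzcqex/72t2ysuc8rae',
    pattern: 'cmd:get,role:npm'
  },
  payload: {
    role: 'npm',
    cmd: 'get',
    name: 'hapi'
  }
}
```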

rjrodger commented 8 years ago

@mcdonnelldean Au contraire Rodney!

Protocol design is one of those things to think hard about from day one (which I did not do! and we're paying for it now).

@mcollina One of the major drivers for this meta data is to enable context independent debugging. You should be able to determine the message flow leading up to any given message. This has been a big issue in practice.
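
To make the tracing point concrete: the sample above carries an empty transport$.track array, and the idea (as I read the draft spec; the exact behaviour is an assumption) is that each hop appends an identifier, so any message can be traced back through the services it passed through. A minimal sketch:

```js
// Minimal sketch, assuming each seneca instance appends its own id before
// forwarding a message - so track reads as the path the message has taken.
function trackHop (msg, instanceId) {
  msg.transport$ = msg.transport$ || {}
  msg.transport$.track = msg.transport$.track || []
  msg.transport$.track.push(instanceId)
  return msg
}

// after three hops the track might read: ['web-01', 'npm-svc-02', 'info-svc-01']
```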

In terms of message size, it's hard to say what the size distribution looks like. I'm going to suggest 4KB as the median - let's design against this. In that case, this level of meta data is < 1% of 4KB so not impactful.

This is a data structure expressed in JSON, not a JSON document format per se. Ultimately it is used by plugins as a js object data structure - the aim is to give plugins a common data structure.

Over the network, data may be encoded differently - e.g. HTTP POSTs submitting raw JSON, with the meta data in headers

mcdonnelldean commented 8 years ago

@rjrodger Let's just leave this here :D

I'm a fan no doubt but I would like to ensure we don't end up with the seneca-transport-spec-working-group is all.

mcollina commented 8 years ago

So, here are my notes:

  1. the metadata being moved is a lot. Most of it could be implied from the medium being used (direct TCP?). I would prefer to say that a seneca message generally has this metadata, rather than writing a JSON spec. Also, a more direct explanation of why each field is needed would be extremely helpful
  2. I am ok with the given JS representation of the message, but some of the stuff might be encoded in HTTP headers, or exchanged during a handshake. Also, we might not encode it as JSON in the first place. So, I propose we reword "JSON document" to "JS object": a string or a number is not a message.
  3. is act always true? what are the options?
  4. let's use a version number in the document, let's call it v1-dev, and then we can finalize it; we might even include it in the messages.
  5. we should talk more about the identifiers before choosing one: generating different types of ids has different performance penalties, and we do not want to lose time generating IDs. There is also a cost to sending long ids through the pipe. (See the sketch after this list.)
  6. the whole document assumes that all of this conversation between services happens over everything but memory. I think we should reverse the discussion: there is only one specification of messages, with some properties that may be omitted (for speed) within certain boundaries (in-process and the like).
  7. once this is finalized, we should implement dummy go-seneca or java-seneca interop layers, just to prove that our protocol works outside of our realm. If I can, I might even write an alternative implementation in node. This should prove the point that this is a specification; we should aim to standardize it and call it something well defined.
  8. out of scope for the current doc but important: specification of the HTTP and TCP over-the-wire protocols.
  9. regarding the trk data, is this already there? how is it exposed? can I intercept it?
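
A quick illustration of point 5: the ids in the sample message above look like short base36 strings ("mk05ddwzcqex", "72t2ysuc8rae"). Something in that shape can be generated cheaply, trading collision resistance for speed; whether this matches what Seneca actually does is an assumption.

```js
// Hypothetical sketch: a cheap short id in the same shape as the sample ids.
// Math.random() is fast but not collision-proof; crypto.randomBytes() would
// be safer but slower - exactly the trade-off raised in point 5.
function quickId (len) {
  len = len || 12
  let s = ''
  while (s.length < len) s += Math.random().toString(36).slice(2)
  return s.slice(0, len)
}

console.log(quickId()) // e.g. 'k3v0dqzp8m1a'
```
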
mcollina commented 8 years ago

In terms of message size, it's hard to say what the size distribution looks like. I'm going to suggest 4KB as the median - let's design against this. In that case, this level of meta data is < 1% of 4KB so not impactful.

@rjrodger I agree, we should design and measure. @mcdonnelldean can you post some message exchanges from NodeZoo/Vidi/* so we can be inspired and make decisions on top of this? We should really make examples for this stuff anyway.

The question regarding metadata is highly important, because currently we are using seneca.act() for interacting with the transports/entities as well. In theory, all of those are microservices, so the flow chain might get really big and the overhead grow with it. How does all of this play off against this metadata?

mcdonnelldean commented 8 years ago

@mcollina I can post some of these up later, great idea. On your other question, I think it might be useful for @rjrodger to spec out how correlation should happen; this is ultimately what most of the meta is for. Based on examples plus a flow we would be in better shape to discuss changes.

rjrodger commented 8 years ago

@mcollina those transport and entity actions are out-of-band setup messages - ie one time on init. For normal messages, the flow chain should not be that deep.

It does raise the question of having a TTL as a safeguard against loops or deep chains (both of which should be avoided in practice)
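
A minimal sketch of what such a TTL guard could look like on the receiving side; the field name (transport$.ttl), the default value and the drop behaviour are all assumptions - nothing like this is in the spec yet:

```js
// Hypothetical sketch only - field name, default and drop behaviour are assumptions.
const DEFAULT_TTL = 11

function checkTtl (msg) {
  const t = msg.transport$ = msg.transport$ || {}
  t.ttl = (typeof t.ttl === 'number' ? t.ttl : DEFAULT_TTL) - 1
  if (t.ttl < 0) {
    // drop (or dead-letter) rather than forward - exact behaviour to be specified
    throw new Error('seneca: message ttl exceeded, possible loop: ' + (msg.meta$ && msg.meta$.id))
  }
  return msg
}
```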

mcollina commented 8 years ago

@rjrodger I agree on TTL, that mechanism needs to be part of the spec.

Moving this metadata across messages should be done by the framework, but there should be an easy way to get around this too (if I want to restart the chain).

rjrodger commented 8 years ago

@mcdonnelldean ah that's actually the current structure - which is an envelope model - which is what we want to get away from. The problem with an envelope is that you're imposing structure, which makes interop harder. With a namespaced internal property you can accept pretty much any old JSON doc - much easier for integration work
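
As an illustration of the interop point: with a namespaced internal property, an arbitrary third-party JSON document can travel as a message unchanged, with the transport only adding its own $-keys alongside it (sketch only, example fields invented):

```js
// Sketch only - the external document and pattern fields are invented.
const externalDoc = { orderId: 12345, items: ['hapi', 'express'], total: 2 }

// It becomes a seneca message just by adding pattern fields and namespaced
// metadata, without forcing the original fields down into a payload envelope.
const msg = Object.assign({ role: 'order', cmd: 'import' }, externalDoc)
msg.transport$ = { track: [], time: { client_sent: Date.now() } }
```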

mcollina commented 8 years ago

This is a data structure expressed in JSON, not a JSON document format per se. Ultimately it is used by plugins as a js object data structure - the aim is to give plugins a common data structure.

You ok with me rephrasing this as a data structure which sits within the message rather than a JSON document? I'll send a PR then.

The problem with an envelope is that you're imposing structure, which makes interop harder. With a namespaced internal property you can accept pretty much any old JSON doc - much easier for integration work.

My point of view is that we shouldn't really care whether we use an envelope model or something else. That should be part of the protocol "on the wire": let's focus on the data structures and the interaction models. I might want to put the metadata in a header because of raw performance, using protobuf for the headers and msgpack for the content. I agree with Richard that putting it in a property makes things easier, and I think it's a must when the message enters the framework, but I think we might omit most of these for in-memory calls (because of performance).
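
For example, an HTTP binding along these lines would keep the body as the plain message while the metadata rides in headers (the header names and the split are purely illustrative, not part of any spec or existing plugin):

```js
// Purely illustrative header mapping - not part of any spec or existing plugin.
function toHttpRequest (msg) {
  const meta = msg.transport$ || {}
  return {
    method: 'POST',
    path: '/act',
    headers: {
      'content-type': 'application/json',
      'x-seneca-track': JSON.stringify(meta.track || []),
      'x-seneca-client-sent': String((meta.time && meta.time.client_sent) || Date.now())
    },
    // the body stays a plain JSON document: pattern fields plus payload only
    body: JSON.stringify({ role: msg.role, cmd: msg.cmd, name: msg.name })
  }
}
```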

mcdonnelldean commented 8 years ago

@rjrodger I hadn't considered it this way. Reading through more thoroughly, I had missed the portion explaining how it is constructed.

@mcollina I will defer to Richard and yourself on this one; so far I have no major opinion outside of disliking the use of $ for namespacing.

mcdonnelldean commented 8 years ago

@mcollina I'll be updating the initial posts with samples as I get them

rjrodger commented 8 years ago

@mcollina that's easy - this meta data is never used for calls inside process - it's for over the wire comms

@mcdonnelldean that's why $ is great for namespacing internal mechanics - no one uses it by choice because it's yuck :)

mcollina commented 8 years ago

@mcollina that's easy - this meta data is never used for calls inside process - it's for over the wire comms

I'm not 100% convinced: I think tracking all the relevant bits is important in microservice-to-microservice comms, not just across hosts. Moreover, we might lose important pieces of information if this is not present every time (like tracking ids).

mcdonnelldean commented 8 years ago

@rjrodger I can't really argue that point any further.

@mcollina I'll have intra-process metas up for you tomorrow so you have a clearer picture of what's there now


rjrodger commented 8 years ago

@mcdonnelldean dude! Just getting warmed up here https://m.youtube.com/watch?v=HjjDOdaFZg0

mcdonnelldean commented 8 years ago

Ha don't worry there will be plenty more to put heads on :D

pelger commented 8 years ago

@rjrodger @mcdonnelldean @mcollina Just catching up on this thread.

1). Need to add the protocol version number to the metadata. To be clear, this is the protocol version, not the seneca version, and it will vary independently.

2). A TTL field is an efficient way of implementing loop avoidance, so I tend to agree with this point. It requires careful implementation, and consideration needs to be given to the behaviour when a message is dropped and also to the default TTL value.

3). Rather than act / res, perhaps it would be better to specify a 'type' field, leaving things more open for extension.

4). @mcollina agree that creating interop implementations in other languages is a good idea. I would select Java and C# as the candidate languages as this will help provide support for legacy systems integration.

5). In terms of implementation we need to select a limited set of 'blessed' transports. Most likely these are http, RabbitMQ, SQS?

mcdonnelldean commented 8 years ago

@pelger Both @naomifeehanmoran and @AdrianRossouw are working on trimming down the volume of plugins we currently support. Let's codify the transport list today so we know what ones are considered 'blessed'.

rjrodger commented 8 years ago

@pelger

version number +1

also redis - that gives you pub/sub

"type" should be "kind", if you recall, or now "knd" :) - that said, there is the "usr" section for extensions, so not sure
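
For concreteness, a sketch of how a protocol version and kind indicator might sit alongside the existing metadata; the field names (vn, knd) are taken from this discussion, not from the current spec:

```js
// Hypothetical sketch only - field names and placement are not in the current spec.
const msg = {
  role: 'npm',
  cmd: 'get',
  name: 'hapi',
  transport$: {
    vn: 'v1-dev', // protocol version - varies independently of the seneca version
    knd: 'act',   // message kind: 'act' (request) vs 'res' (response)
    track: [],
    time: { client_sent: 1461086855637 }
  }
}
```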

mcollina commented 8 years ago

@rjrodger can you please articulate more about redis for pub/sub?

To the best of my knowledge, there is no (current) way of subscribing to a broker with a seneca pattern. So, pubsub is limited to broadcast, or fixed patterns. I would limit pub/sub to the "fire and forget" semantic only, because it can have zero to unlimited answers.

I think we should support four use cases:

  1. memory, I would consider this the "basic" transport
  2. request/response with http
  3. publish/subscribe with redis and rabbitmq (we might want to use https://github.com/mcollina/mqemitter for easy pub/sub portability)
  4. queue with redis, rabbitmq and sqs

Optionally, I think we should add streaming based on TCP + https://github.com/mcollina/tentacoli.

how does this discussion fit in with seneca-mesh?
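
For reference, the mqemitter suggestion in point 3 above boils down to a small broker-agnostic emitter API; a minimal standalone sketch (not wired into seneca, topic names invented):

```js
// Minimal standalone mqemitter usage - not wired into seneca; topics invented.
const mq = require('mqemitter')()

// subscribe with a topic pattern (fire and forget: zero or more listeners)
mq.on('role/npm/+', function (message, done) {
  console.log('got', message)
  done()
})

// publish; nobody replies, which is the fire-and-forget semantic discussed above
mq.emit({ topic: 'role/npm/get', name: 'hapi' }, function (err) {
  if (err) console.error(err)
})
```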

mcdonnelldean commented 8 years ago

@mcollina Request/response needs to work over TCP only too (no HTTP); we have community members using this. I'll root out the issue.

I would love to see streaming via tcp if it can be added in such a way that doesn't break the world.

Mesh shouldn't be impacted as such; it only uses the SWIM protocol to figure out what is where. It defers to seneca-balance-client, which defers to a given transport. It is not a transport in and of itself. Mesh should work with any transport (should).

In terms of breakage, I just want to be super clear here. We can't break the world. The red line is that Nodezoo needs to work with version 3 with no code changes outside of loading or not loading plugins. We basically need to stack changes so the last major version is always supported without breaking the world. Bear in mind we do 4 major releases a year so this support level is not hard to achieve.

We'll most likely use Nodezoo as our breaking sanity check and Nodezoo v.next as our proof of implementation to the community.

From May 3rd @mirceaalexandru will be taking over NodeZoo with a small team, he will be able to sanity check changes as you need them to keep ye all from having to go implement the work somewhere.

@mcollina On pubsub, you are correct, it is really a fully fledged fire and forget mechanic that people want. Right now the issue is you can only get this over particular transports and not locally. I don't see why pubsub can't be everywhere; sure, you can add rabbitmq and others, but Nodezoo is using this sort of functionality right now with no hard storage, so let's not lose that.

One functionality we have seen asked for (although I'm dubious about it) is one request, many responses using a single pattern. This would be request, multi-response. I'm not sure if this comes from the fact that pubsub is a second-class citizen or if it would add actual value. I personally wouldn't use it and would be more inclined to do classic fire and forget over two patterns instead. But I said I would raise it.

I'm half way through a map for nodezoo so you can see the direction and transports used. As soon as it is ready I will add it.

mcollina commented 8 years ago

In terms of breakage, I just want to be super clear here. We can't break the world. The red line is that Nodezoo needs to work with version 3 with no code changes outside of loading or not loading plugins. We basically need to stack changes so the last major version is always supported without breaking the world. Bear in mind we do 4 major releases a year so this support level is not hard to achieve.

Talking about breakage: breaking at the API level is one thing; breaking at the protocol level is another. So, is it expected that a Seneca v3 can talk with a Seneca v4? Changing the protocols will break things. I argue no, a Seneca v3 instance would not talk with a Seneca v4. After we break the protocol, we can support it through more releases. That's why we need a version number.

On pubsub, you are correct, it is really a fully fledged fire and forget mechanic that people want. Right now the issue is you can only get this over particular transports and not locally. I don't see why pubsub can't be everywhere; sure, you can add rabbitmq and others, but Nodezoo is using this sort of functionality right now with no hard storage, so let's not lose that.

I think pubsub needs to be a first class citizen both at API and transport levels. We might aim for supporting this in late 2016, but we need to design the data structures so that it is possible.

One functionality we have seen asked for (although I'm dubious about it) is one request, many responses using a single pattern. This would be request, multi-response. I'm not sure if this comes from the fact that pubsub is a second-class citizen or if it would add actual value. I personally wouldn't use it and would be more inclined to do classic fire and forget over two patterns instead. But I said I would raise it.

I'm really against this. You can't predict how many services will reply, making this kind of setup unreliable. We could probably support this with a "streaming api", but I think we have other priorities.

mcdonnelldean commented 8 years ago

@mcollina V3 and V4 need to talk, as do V4 and V5, but not V5 and V3. One major version backwards is the deal. This may mean we have to build in some negotiation logic, or you may need to add a plugin to support negotiation. What we did with Entities was to just wrap it all in a plugin and pull it out; if you need back compat you load the plugin. I'm fine with this as a 'code change' as it's a compositional concern.

From a customer point of view there is no way the community will stand for us rewriting transports over a single version with no back compat. This would draw the fires of fury and is a red line.

Having said that, let's not fret too much on this. It's an implementation concern like any other and doesn't change the work that can be done, just the order in which it's done.

Realistically, transport is supposed to be fully pluggable (which it's not); this might be the first item of work. Unhook transport internally and all the issues go away, since we can offer it as a transport swap.

We must support: http://senecajs.org/contribute/principles.html

Agreed on all your other comments.

pelger commented 8 years ago

@rjrodger - agreed on 'knd'

mcdonnelldean commented 8 years ago

Via an issue on the beanstalk transport


@mcdonnelldean I read the transport protocol spec (https://github.com/senecajs/seneca/blob/master/doc/transport.md). I understand the purpose and scope of document, greatly summarized in the header:

The protocol is a request/response model. However some requests do not require responses, so the protocol also supports actor or pub/sub message flows. I wish to see something about how messages are handled in case of errors/failures, because Seneca is distributed and, well, shit happens. Seneca should be clear about the expected behavior when things go wrong; a lot of assumptions are actually totally plugin dependent (timeouts and fatal$ handling, retry, persistence and so on). Here, my basic assumption is that every transport should support "volatile" messages first: a message is lost when the receiver is off, times out, or errors.

Queue-like behavior, retry, persistence, and special fatal$ handling should be optional and transport dependent, handled by options during the initial setup or in a way similar to native$ for entities.


https://github.com/rjrodger/seneca-beanstalk-transport/issues/8#issuecomment-218992489
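
The "errors handled by the caller" expectation maps onto the existing seneca.act() callback; a minimal sketch of that caller-side handling (the pattern and branches are illustrative only):

```js
// Minimal caller-side error handling sketch - pattern and branches illustrative.
const seneca = require('seneca')()

seneca.act({ role: 'npm', cmd: 'get', name: 'hapi' }, function (err, reply) {
  if (err) {
    // with "volatile" messages this is where timeouts, dead receivers and
    // transport errors surface; retry or queueing would be a transport option
    return console.error('act failed:', err.message)
  }
  console.log('got reply', reply)
})
```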

mcdonnelldean commented 8 years ago

Timeouts also need a think. Consider the code and output in issue #385

mcollina commented 8 years ago

I do agree 100%. However, fatal$ and timeouts are already 100% removed in #402. That PR needs to add back support for timeouts. I think those should be transport specific (currently there are two timeouts, one on the caller and one on the receiver).

boneskull commented 8 years ago

I'm running into an issue where I have a long-running action (paginating event data from GitHub in a non-evil way)--I'm attempting to use timeout when instantiating Seneca, and also timeout$ in the action arguments, as seen here.

I set this to a very large integer, because it's unclear how long the action will actually take. Infinity is unsupported, and 0 causes a fatal error (since the action will timeout near-immediately).

The result is that the server process stays up, but the gate executor first reports a warning, then an error, and my action never completes.

Is there some other way around this at current time?
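
For reference, the two knobs being described are roughly these; the values are illustrative only, and there is no documented "never time out" setting:

```js
// Illustrative only - values are arbitrary; there is no "never time out" option.
const Seneca = require('seneca')

// global action timeout (milliseconds) set at instantiation
const seneca = Seneca({ timeout: 10 * 60 * 1000 })

seneca.add('role:github,cmd:paginate', function (msg, done) {
  // ... long-running pagination work ...
  done(null, { ok: true })
})

// per-call override via the timeout$ directive in the action arguments
seneca.act({ role: 'github', cmd: 'paginate', timeout$: 10 * 60 * 1000 }, function (err, out) {
  if (err) return console.error('timed out or failed:', err)
  console.log(out)
})
```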

mcdonnelldean commented 8 years ago

@boneskull Are you looking for a way to basically say never timeout?

boneskull commented 8 years ago

@mcdonnelldean Yeah, basically. I think I have solved my issue for now, however.

tribou commented 7 years ago

This may need to be a separate issue, and I'm not familiar enough with Seneca yet to know how possible this is. However, I wanted to voice that I would love to see a formally documented Seneca transport plugin API (in addition to the protocol described above). Having a straightforward guide would help users like myself create a wide variety of custom transports... Some serious (gRPC), and some just for fun (Serverless Lambda + SNS). I would be excited to see all of the possibilities the community thinks up.
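
For context, today's transports hook in through a pair of transport actions; a very rough sketch is below. This is inferred from how existing transport plugins are structured, and the exact hook contract (especially what the client hook must provide) is precisely what a formal API document would need to pin down:

```js
// Very rough sketch, inferred from existing transport plugins; the exact
// hook contract is an assumption and would need to be formally documented.
module.exports = function mytransport (options) {
  const seneca = this

  // listen side: open the connection (queue, socket, ...), decode inbound
  // messages, run them with seneca.act, encode and send back the responses
  seneca.add('role:transport,hook:listen,type:mytransport', function (msg, done) {
    // ... set up the listener here ...
    done()
  })

  // client side: give the framework a way to send messages for the
  // configured patterns and to receive the replies
  seneca.add('role:transport,hook:client,type:mytransport', function (msg, done) {
    // ... set up the outbound connection here ...
    done()
  })
}
```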

StarpTech commented 7 years ago

@rjrodger the protobuf specification only allows "A" … "Z" | "a" … "z" | decimalDigits | "_" in field names. Field names containing $ will throw an error. This could be an issue, given the mention of protobuf encoding above.
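
If protobuf (or any encoding with the same identifier rules) were ever used on the wire, one workaround is a rename pass before encoding; the mapping below is purely illustrative:

```js
// Purely illustrative: rewrite $-suffixed keys into protobuf-safe names
// before encoding, and reverse the mapping on the way back in.
function toWireSafe (msg) {
  const out = {}
  for (const key of Object.keys(msg)) {
    out[key.endsWith('$') ? '__' + key.slice(0, -1) + '__' : key] = msg[key]
  }
  return out
}

// { "transport$": { ... } }  ->  { "__transport__": { ... } }
```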

rjrodger commented 7 years ago

@tribou this is the focus for 4.x - it will be much simpler to write transport plugins

rjrodger commented 7 years ago

@SharpTech hah!

__seneca__?

StarpTech commented 7 years ago

@rjrodger did you mean @starptech ? :D

rjrodger commented 7 years ago

:)