moleculerjs / moleculer

:rocket: Progressive microservices framework for Node.js
https://moleculer.services/
MIT License

Enable zero-configuration/decentralized networking like 'Cote' #46

Closed: demetriusnunes closed this issue 6 years ago

demetriusnunes commented 7 years ago

It would be a pretty big win if Moleculer could also work like this: http://cote.js.org/#multicast-address

The single point of failure that NATS represents would be gone.

icebob commented 7 years ago

Thanks, I will check it. I think it could be another official transporter module.

ivokoko commented 7 years ago

NATS can be configured as a cluster, so there won't be a single point of failure.

thelinuxlich commented 7 years ago

The problem is, where will we find a multicast-enabled architecture? Docker, for example, doesn't support it.

icebob commented 6 years ago

I'm working on a TCP+UDP transporter. It uses UDP broadcast messages to detect other Moleculer nodes on the network. If it finds them, it connects to the remote nodes via TCP.
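
To make that concrete, here is a rough, illustrative sketch (not Moleculer's actual code) of how UDP broadcast discovery can work with Node's built-in dgram module; the port number and node ID are made up for the example:

```js
const dgram = require("dgram");

const DISCOVERY_PORT = 4445; // hypothetical discovery port

const socket = dgram.createSocket({ type: "udp4", reuseAddr: true });

socket.on("message", (msg, rinfo) => {
  // A remote node announced itself; a transporter would now open a TCP
  // connection to rinfo.address on the port advertised in the message.
  // (A real implementation would also ignore its own announcements.)
  console.log(`Discovered node "${msg}" at ${rinfo.address}`);
});

socket.bind(DISCOVERY_PORT, () => {
  socket.setBroadcast(true);
  // Periodically announce this node's ID so that peers can find it.
  setInterval(() => {
    socket.send(Buffer.from("node-1"), DISCOVERY_PORT, "255.255.255.255");
  }, 5000);
});
```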

tinchoz49 commented 6 years ago

Maybe this library can be helpful for the service discovery: https://github.com/maxogden/discovery-channel

DeividasJackus commented 6 years ago

Or perhaps https://github.com/mafintosh/discovery-swarm, which is built on top of discovery-channel, could be of use?

icebob commented 6 years ago

Yes, I'm checking discovery-swarm. I'm trying to create a transporter based on it.

DeividasJackus commented 6 years ago

Cote itself (as referenced in the original issue) uses https://github.com/wankdanker/node-discover btw.

icebob commented 6 years ago

I'm testing the low-level components (utp-native, peer-network) of discovery-swarm and it seems they are not too stable :confused: E.g., if the server shuts down & restarts, the clients don't find it again and throw errors.

But I will keep playing with them.

WoLfulus commented 6 years ago

What about nssocket?

tinchoz49 commented 6 years ago

Wow, Moleculer is getting better every day; this is going to be an awesome feature. For the communication, this library can be helpful: https://github.com/nickdesaulniers/node-nanomsg

TomKaltz commented 6 years ago

+1

icebob commented 6 years ago

It is in progress. It will be a "zero configuration" TCP transporter with a Gossip inter-node protocol (similar to Cassandra's gossip protocol) to detect live/offline nodes & services without every node connecting to every other node, so it will be able to handle 100+ nodes as well. Plus UDP auto-discovery, of course.

It will be released in the next 0.12 release, in 1-2 weeks.

Branch: https://github.com/ice-services/moleculer/tree/gossip-transporter

Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about. The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a timestamp associated with it so that during a gossip exchange, older information is overwritten with the most current state for a particular node.
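
As a rough sketch of that merge rule (hypothetical state shape, not the actual Moleculer implementation):

```js
// Each node keeps a map of nodeID -> { when, info }. On every gossip
// exchange it keeps whichever entry carries the newer timestamp, so stale
// information is overwritten by the most current state for each node.
function mergeGossip(localState, remoteState) {
  for (const [nodeID, remote] of Object.entries(remoteState)) {
    const local = localState[nodeID];
    if (!local || remote.when > local.when) {
      localState[nodeID] = remote; // the peer knows fresher info; adopt it
    }
  }
  return localState;
}

// Example: the peer has newer knowledge about "node-3".
const local = { "node-2": { when: 100, info: { online: true } } };
const remote = { "node-3": { when: 120, info: { online: false } } };
console.log(mergeGossip(local, remote));
// -> contains both node-2 and node-3, with node-3 taken from the peer
```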

tinchoz49 commented 6 years ago

That's really awesome @icebob! Are you planning to make this transporter the default one? I think it would be a good starting point with Moleculer for scaling your nodes, without having to worry about running a NATS server, for example.

icebob commented 6 years ago

@tinchoz49: yes, that is my plan. It will be the default transporter, so no transporter config will be needed; just start the nodes and everything works :)

I'm doing performance tests. It will be the fastest transporter, with ~80k req/sec on my i7; the others do 10k-20k req/sec.

TomKaltz commented 6 years ago

Does broadcast messaging work via gossip or does the sending node connect to all nodes to send the broadcast?

icebob commented 6 years ago

No broadcast messages. INFO and HEARTBEAT are transferred via gossip. REQUEST, RESPONSE, and EVENT messages are sent to the target node directly.

icebob commented 6 years ago

But of course you can still send broadcast events with broker.broadcast; they are also sent directly.
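
To illustrate (a sketch with a made-up event name and payload, using the broker API mentioned above):

```js
const { ServiceBroker } = require("moleculer");
const broker = new ServiceBroker({ transporter: "TCP" });

// A service that handles a broadcast event on every node it runs on.
broker.createService({
  name: "mailer",
  events: {
    "user.created"(payload) {
      this.logger.info("New user:", payload.name);
    }
  }
});

// With the TCP transporter this still goes out over the direct connections
// to the known nodes, not via gossip.
broker.start().then(() => broker.broadcast("user.created", { name: "John" }));
```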

icebob commented 6 years ago

Gossip protocol in action in TCP transporter: https://www.youtube.com/watch?v=9yJ_Lfw-JCs

TomKaltz commented 6 years ago

!!!!!!!!!!!

icebob commented 6 years ago

??

TomKaltz commented 6 years ago

I couldn’t contain my excitement. This is tremendous work @icebob

icebob commented 6 years ago

Ohh, thanks :+1:. This transporter with gossip can easily handle 100+ nodes as well. I hope so, anyway, because I could only test on localhost.

tinchoz49 commented 6 years ago

niceeee! At the end of my presentation, I said, "the future is decentralized and Moleculer too". Of course I was talking about this amazing feature!

icebob commented 6 years ago

:wink:

ntgussoni commented 6 years ago

Awesome work!

brad-decker commented 6 years ago

@icebob you never cease to amaze me :heart:

icebob commented 6 years ago

Released in 0.12 as the TCP transporter. More info
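
For reference, starting a node with the TCP transporter enabled by name looks roughly like this (a minimal sketch against the 0.12 API; the nodeID and service are just examples):

```js
const { ServiceBroker } = require("moleculer");

// Enable the TCP transporter by name; nodes on the same network then
// discover each other via UDP and keep their registries in sync via gossip.
const broker = new ServiceBroker({
  nodeID: "node-1", // must be unique per process
  transporter: "TCP"
});

broker.createService({
  name: "greeter",
  actions: {
    hello: () => "Hello from node-1"
  }
});

broker.start();
```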

Wallacy commented 6 years ago

I did some tests on this feature... My two cents:

udpBroadcast should default to "255.255.255.255", because multicast-enabled networks are not that common, and "255.255.255.255" is a better choice for a default address to avoid manual config just to get every sample/example running on some machines (multicast has the same problem).

Besides that, my impression is that the overall performance of this solution is not good at all (for now). The gossip protocol is interesting but has an unpredictable nature. Almost 28 seconds for 50 nodes is an eternity! For comparison, I use a 5-10s connection timeout on my microservices. With Cote, for example, I can interconnect 50 nodes in 1.2 seconds (or less).

This gossip implementation is very nice as a TCP communication protocol to enable serverless environments, but a TCP service-discovery protocol needs a more direct approach, I think.

icebob commented 6 years ago

With "255.255.255.255" broadcast address has problem with routers. By default, TCP transporter detects all network interfaces and send broadcasts on all interfaces and it works well. But if you need, you can use 255.255.255.255 as broadcast address in transporter config:

udpBroadcast: "255.255.255.255"

The TCP transporter has the best performance for service communication, thanks to the direct connections. Yes, the initial discovery time can be long if you have many nodes. But my question is: which is more likely?

  1. You start & stop 50 nodes every minute and have to wait 28 seconds.
  2. You start 50 nodes once a month and your services communicate continuously & quickly.

If you belong to the first group, I suggest you use Cote or any other transporter (NATS, Redis, etc.).

Wallacy commented 6 years ago

By default, TCP transporter detects all network interfaces and send broadcasts on all interfaces and it works well.

That does not work on my machine, not even a simple client-server example (multicast at 192.168.1.123:4445, membership: 239.0.0.0). And if I only set udpBroadcast to true, it also does not work. I need to explicitly specify 192.168.1.255 or another interface.

EDIT: The problem with the default multicast and default broadcast config was solved after updating Node.js.

...Yes, the first discovery time can be big if you have many nodes...

I don't think you understand the problem here... If I start only one single node, not 50 or 2, just 1, I may need to wait the full 28 seconds to ensure that all the other nodes see my service. Also, 50 nodes is sometimes a small number; 1000 is a good target. But the real problem is when only 1 node is started into a pool of several already-running nodes.

One of the best parts of a microservice architecture is the reaction time to problems and changes, and my criticism of this approach is that it does not appear to scale well. Also, I only cited Cote because it also uses TCP for communication. And I think the idea is to do what is best for the project in the long run, not just for small projects.

EDIT2: I will run more tests and see what I can do to improve the situation. Thanks.