reactive-streams / reactive-streams-io

Reactive Streams Network Protocol
reactive-streams.io

ZeroMQ/ZMTP/Malamute vs Reactive Streams #8

Open benjchristensen opened 9 years ago

benjchristensen commented 9 years ago

If anyone has experience with ZeroMQ, can you provide insights into the work being done on Malamute (https://github.com/zeromq/malamute/blob/master/MALAMUTE.md)? Or perhaps ZMTP (http://zmtp.org)?

Malamute uses credit-based flow control (CBFC) to manage the buffering of data from broker to client.
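For readers unfamiliar with the idea, here is a minimal sketch of credit-based flow control; the class and method names below are invented for illustration and are not Malamute's actual API:

```java
// Minimal credit-based flow control sketch (hypothetical, not Malamute's API):
// the broker only sends while the client has outstanding credit, and the
// client replenishes credit as it consumes messages.
import java.util.ArrayDeque;
import java.util.Queue;

class CreditBasedSender {
    private final Queue<byte[]> pending = new ArrayDeque<>();
    private long credit = 0; // number of messages the client can still accept

    // Called when a CREDIT frame arrives from the client.
    synchronized void onCredit(long n) {
        credit += n;
        drain();
    }

    // Called when the broker has a new message for this client.
    synchronized void offer(byte[] message) {
        pending.add(message);
        drain();
    }

    private void drain() {
        while (credit > 0 && !pending.isEmpty()) {
            send(pending.poll());
            credit--;
        }
    }

    private void send(byte[] message) { /* actual network write elided */ }
}
```

The key property is that the broker can never overrun the client's buffers, because every send consumes credit the client explicitly granted.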

It seems to have overlapping goals and approaches, so I'd like to have a good understanding of it, both to learn from it and, if we proceed with RS.io, to explain why we need a new protocol as opposed to using ZeroMQ.

viktorklang commented 9 years ago

I have had terrible experiences with 0mq, unfortunately. :(

At least the 2.x series was horrible from a design standpoint.

I'd probably look into using Aeron.

tnn commented 9 years ago

@viktorklang Was it the protocol (ZMTP), the C/Java bindings, or the JeroMQ (pure-Java) implementation of that protocol you had bad experiences with? Would you care to specify what was wrong from your point of view?

I was wondering if it is a good idea to invent yet another protocol. The messaging space is fairly fragmented already: a user is faced with the choice of e.g. JMS, AMQP, Redis, STOMP, Kafka, ZMTP and now Aeron. From what I gather, the implementation is less important compared to the protocol and how it's defined. As a consumer of a given messaging product, having one common wire protocol and multiple vendors/implementations minimises my adoption risk, given that I can pick an alternative solution without rewriting my code.

Can't we just pick one protocol so we can actually build stuff that composes? :)

viktorklang commented 9 years ago

@tnn Multiple things:

- The Java bindings were not backwards compatible, so I ended up having to generate my own bindings with JNA.
- The C implementation had a really weird setup w.r.t. threading and assumed that a Socket had to be called from the same Thread that created it.
- If there were problems, the C implementation would just execute abort(), taking the entire JVM down without so much as a stack trace.
- The JeroMQ implementation was crippled compared to the native implementation (I think that's still true).

While agreeing on a wire protocol would be interesting and useful, I'd have reservations w.r.t. performance (latency, throughput and general wire overhead) as well as about automated compliance verification, of which none exists for 0MQ AFAIK.

Since a certain degree of reliability is required, and @tmontgomery is probably well tired of hand-rolling that yet again, I'd posit that using raw UDP as the transport is a no-go, which pushes us into Aeron space.

(Anything TCP-based would be a no-go from my PoV, since TCP is not at all designed for messaging.)

I'd love to hear more opinions on this!

danarmak commented 9 years ago

@viktorklang TCP has two advantages:

  1. Its ability to reliably communicate between any two public IPs on the Internet without worrying about some network equipment in the middle blocking your custom IP-based protocol or forgetting your UDP session NAT mapping.
  2. Implemented in every language and environment.

I'm not personally familiar with Aeron. From skimming its website, it might be able to run on top of TCP too (but then why use it?). However, it only has implementations for Java and C++, which is a really big issue. And it seems less well known and less tested than 0mq.

Some of the use cases people are talking about require support for various languages, and for JavaScript in browsers in particular - the easiest way of doing the latter is WebSockets. (Some people also want HTTP/2.) If so, we should support multiple duplex-stream-like transports, because there's no reason for anyone outside a browser to use WebSockets.

In this scenario we could also specify support for Aeron or other advanced transports, but I think we should start with the simple TCP-like core to get a first spec & implementation rolling, and add more later.

tnn commented 9 years ago

Viktor, thanks for taking the time to clarify.

The issues you mention seem pretty grave and I understand your reasoning. Spotify did actually roll their own ZMTP implementation on top of Netty. From reading the ZMTP specification, it just seems like there is no good justification for not picking it as the underlying protocol. Personally at least, I have a hard time coming up with areas I would have done differently for performance reasons or simplicity.

Your remark about a technology compatibility kit (TCK) is spot on, and I definitely agree that ZMTP/ZeroMQ is lacking this.

Please forgive me for jumping into discussions about implementation; the requirements for the protocol should of course be settled first and foremost.

CC: @hintjens @sustrik - This topic might be of interest to you.

viktorklang commented 9 years ago

@danarmak

> @viktorklang TCP has two advantages:

> Its ability to reliably communicate between any two public IPs on the Internet without worrying about some network equipment in the middle blocking your custom IP-based protocol or forgetting your UDP session NAT mapping.

I contest the first claim by stressing the fact that there's no reliability across TCP connections, so if you have an RST coming in you're basically forced to terminate the transmission.

> Implemented in every language and environment.

That is a very good point. But being widespread doesn't help if it doesn't live up to the other requirements. At least not IMO.

> I'm not personally familiar with Aeron. From skimming its website, it might be able to run on top of TCP too (but then why use it?). However, it only has implementations for Java and C++, which is a really big issue. And it seems less well known and less tested than 0mq.

That is a good point. Perhaps we ought to start out by speccing which platforms we are required to support?

> Some of the use cases people are talking about require support for various languages, and for JavaScript in browsers in particular - the easiest way of doing the latter is WebSockets. (Some people also want HTTP/2.) If so, we should support multiple duplex-stream-like transports, because there's no reason for anyone outside a browser to use WebSockets.

Perhaps we ought to split out requirements into different parts, transport-level and RS protocol level?

> In this scenario we could also specify support for Aeron or other advanced transports, but I think we should start with the simple TCP-like core to get a first spec & implementation rolling, and add more later.

What does a TCP-like core mean in this context? I'd want to avoid designing something around a protocol not designed for the problem at hand, for the simple reason that one risks painting oneself into a corner. But if you mean that we ought to design the protocol driven by the requirements and divorced from the actual transport, and then try to shim it on top of TCP first, then I could sympathize with that.

Let me know if I misinterpreted you in any way.
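To make the "divorced from the actual transport" idea concrete, one possible shape would be to define the RS protocol against a minimal duplex-transport abstraction and let TCP, WebSockets or Aeron each implement it. The interface below is a hypothetical illustration, not anything from a spec:

```java
// Hypothetical transport abstraction: the RS protocol would be specified
// against this interface, and TCP, WebSockets or Aeron would each provide
// an implementation underneath.
interface DuplexTransport {
    void send(byte[] frame);             // framed write toward the peer
    void onFrame(FrameHandler handler);  // register inbound frame callback
    void close();

    @FunctionalInterface
    interface FrameHandler {
        void handle(byte[] frame);
    }
}
```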

viktorklang commented 9 years ago

@tnn

> Viktor, thanks for taking the time to clarify.

No problem at all, thanks for asking for elaboration!

> The issues you mention seem pretty grave and I understand your reasoning. Spotify did actually roll their own ZMTP implementation on top of Netty. From reading the ZMTP specification, it just seems like there is no good justification for not picking it as the underlying protocol. Personally at least, I have a hard time coming up with areas I would have done differently for performance reasons or simplicity.

My issue with choosing 0mq for this is that it provides much more than we need (IMO) while also locking us into a specific technology. Also, is it available for all platforms we need to support? Browsers?

> Your remark about a technology compatibility kit (TCK) is spot on, and I definitely agree that ZMTP/ZeroMQ is lacking this.

Exactly! Designing a protocol without a TCK seems dubious to me after being involved with reactive-streams-jvm.

danarmak commented 9 years ago

@viktorklang

> I contest the first claim by stressing the fact that there's no reliability across TCP connections, so if you have an RST coming in you're basically forced to terminate the transmission.

This is true, but it's not what I was talking about. Non-TCP-based communications all have one problem: you can't trust that two public IPs on the Internet will be able to communicate. You have to take network middleware behavior into account, including components you have no control over, and in practice there are often problems and there are environments which you just can't support.

In TCP the server (i.e. the side that does 'listen' as opposed to 'connect') can always send packets back to the client. In UDP or plain IP, traffic that traverses a stateful firewall or NAT gateway relies on its implicit connection mapping, and if you don't send any packets for a while (or under other unspecified, implementation-dependent conditions) the mapping will be dropped and the 'server' won't be able to send data to the 'client'. And some networks don't support UDP (etc.) stateful mapping at all (e.g. low-grade NAT solutions people get at home from their ISPs, and maybe mobile networks and such).

Some network equipment even blocks traffic that has an unrecognized or unsupported IP protocol (not TCP, UDP or ICMP).
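The usual mitigation for the mapping-expiry problem is a periodic keepalive. A minimal sketch, assuming a plain DatagramSocket and an arbitrarily chosen 25-second interval (a guess at staying under typical UDP mapping timeouts):

```java
// Hypothetical UDP keepalive: send a tiny datagram periodically so stateful
// middleboxes (NAT gateways, firewalls) keep the mapping alive.
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class UdpKeepalive {
    static void start(DatagramSocket socket, InetSocketAddress peer) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        byte[] ping = {0}; // one-byte keepalive payload
        timer.scheduleAtFixedRate(() -> {
            try {
                socket.send(new DatagramPacket(ping, ping.length, peer));
            } catch (Exception e) {
                // log and keep trying; a lost keepalive is harmless
            }
        }, 0, 25, TimeUnit.SECONDS);
    }
}
```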

> I contest the first claim by stressing the fact that there's no reliability across TCP connections, so if you have an RST coming in you're basically forced to terminate the transmission.

Requirements come from use cases. I think we should explicitly specify use cases (#3) and the requirements will fall out.

> Perhaps we ought to split out requirements into different parts, transport-level and RS protocol level?

The question then is what features to require from the transport, what features to optionally implement ourselves if the transport doesn't provide them, and how far the specification can stay generic without devolving into incompatible variants for different transports.

> What does a TCP-like core mean in this context?

I meant a bidirectional, fully duplex, byte stream with exactly-once in-order delivery.

> But if you mean that we ought to design the protocol driven by the requirements and divorced from the actual transport, and then try to shim it on top of TCP first, then I could sympathize with that.

If TCP doesn't fit, then WebSockets and HTTP/2 also don't fit, and those are use cases some people explicitly want. So we should settle the core use cases first - maybe they will end up all but mentioning TCP by name.

viktorklang commented 9 years ago

> @viktorklang
>
> > I contest the first claim by stressing the fact that there's no reliability across TCP connections, so if you have an RST coming in you're basically forced to terminate the transmission.
>
> This is true, but it's not what I was talking about. Non-TCP-based communications all have one problem: you can't trust that two public IPs on the Internet will be able to communicate. You have to take network middleware behavior into account, including components you have no control over, and in practice there are often problems and there are environments which you just can't support.

You can't trust that two public IPs on the Internet will be able to communicate at all as a general rule, though?

> In TCP the server (i.e. the side that does 'listen' as opposed to 'connect') can always send packets back to the client.

Sending packets can always be done; they might get discarded, though.

> In UDP or plain IP, traffic that traverses a stateful firewall or NAT gateway relies on its implicit connection mapping, and if you don't send any packets for a while (or under other unspecified, implementation-dependent conditions) the mapping will be dropped and the 'server' won't be able to send data to the 'client'. And some networks don't support UDP (etc.) stateful mapping at all (e.g. low-grade NAT solutions people get at home from their ISPs, and maybe mobile networks and such).

@tmontgomery will correct me if I'm wrong here, but all connections will be dropped at some point if there is no traffic.

> Some network equipment even blocks traffic that has an unrecognized or unsupported IP protocol (not TCP, UDP or ICMP).

Sure, but how is this related?

> > I contest the first claim by stressing the fact that there's no reliability across TCP connections, so if you have an RST coming in you're basically forced to terminate the transmission.
>
> Requirements come from use cases. I think we should explicitly specify use cases (#3) and the requirements will fall out.

Completely agree with that!

> > Perhaps we ought to split out requirements into different parts, transport-level and RS protocol level?
>
> The question then is what features to require from the transport, what features to optionally implement ourselves if the transport doesn't provide them, and how far the specification can stay generic without devolving into incompatible variants for different transports.

Exactly.

> > What does a TCP-like core mean in this context?
>
> I meant a bidirectional, fully duplex, byte stream with exactly-once in-order delivery.

And I'm arguing that that's not only undesirable but also slightly incorrect (TCP is not exactly-once).

> > But if you mean that we ought to design the protocol driven by the requirements and divorced from the actual transport, and then try to shim it on top of TCP first, then I could sympathize with that.
>
> If TCP doesn't fit, then WebSockets and HTTP/2 also don't fit, and those are use cases some people explicitly want. So we should settle the core use cases first - maybe they will end up all but mentioning TCP by name.

I'm sure it'll be possible to tunnel over TCP. What I'm saying is that we shouldn't design for TCP, as it provides features and guarantees that are not worth paying for from a performance standpoint; case in point: Aeron.

So let's start working on requirements. I think the RS interfaces lend themselves very nicely to being encoded as a network protocol.
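As a rough illustration of that last point (the frame layout below is invented for this discussion, not any actual spec), the Reactive Streams signals map almost one-to-one onto wire frames:

```java
// Hypothetical framing: each Reactive Streams signal becomes a frame type,
// tagged with a stream id so many logical streams can share one connection.
enum FrameType { ON_NEXT, ON_ERROR, ON_COMPLETE, REQUEST_N, CANCEL }

final class Frame {
    final FrameType type;
    final long streamId;  // which logical stream this frame belongs to
    final long requestN;  // demand; only meaningful for REQUEST_N
    final byte[] payload; // element bytes; only meaningful for ON_NEXT/ON_ERROR

    Frame(FrameType type, long streamId, long requestN, byte[] payload) {
        this.type = type;
        this.streamId = streamId;
        this.requestN = requestN;
        this.payload = payload;
    }
}
```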

danarmak commented 9 years ago

> You can't trust that two public IPs on the Internet will be able to communicate at all as a general rule, though?
>
> Sending packets can always be done; they might get discarded, though.
>
> all connections will be dropped at some point if there is no traffic.

My wording was wrong again, sorry.

Most computers today are behind at least one NAT gateway. TCP is often used in client-server setups where the client doesn't have a public IP, or doesn't know what it is. In theory there's nothing special about TCP: NAT gateways need to maintain connection state for all IP protocols. But I get the impression most network equipment tends to drop UDP connection state much more quickly than TCP. And some network middleware may not do UDP connection state tracking at all.

For this reason, protocols that are neither TCP- nor UDP-based simply can't be used with home-ISP-grade connections. Sometimes firewalls are configured to filter them out as well. I think this impeded the adoption of SCTP a lot, see e.g. the answers here. Imagine if every new protocol from the last 10 years had had all the SCTP features for free! This story is the kind of problem I'm most afraid of when going with a non-TCP and especially non-UDP based transport.

> > I meant a bidirectional, fully duplex, byte stream with exactly-once in-order delivery.
>
> And I'm arguing that that's not only undesirable but also slightly incorrect (TCP is not exactly-once).

Do you mean that TCP is not exactly-once because when a connection is interrupted, each side can't know exactly which packets the other one has received but not acknowledged yet?

The reason I want exactly-once delivery (except when failing anyway) is that it makes demand calculations much easier. Publisher and subscriber must always agree precisely on the current value of outstanding demand on an RS stream. Of course this can be implemented on the application/protocol level, there's no reason the transport has to provide this guarantee, but why does it hurt if it does provide it? Efficiency reasons?

> I'm sure it'll be possible to tunnel over TCP. What I'm saying is that we shouldn't design for TCP, as it provides features and guarantees that are not worth paying for from a performance standpoint; case in point: Aeron.

I don't disagree in principle, but it does make the task much harder, so let's identify the use cases that need this.

hintjens commented 9 years ago

I'm happy to chime in on this thread if there are specific questions I can answer. However, vague discussions comparing apple seeds to mangoes can go on forever. Malamute is not TCP is not ZMTP. To compare these is... suboptimal.

BTW, JeroMQ is excellent, and as fast as the JNI-wrapped libzmq. Its drawbacks are lack of encryption and lack of PGM. Otherwise, it's plug-in compatible.

viktorklang commented 9 years ago

> Most computers today are behind at least one NAT gateway. TCP is often used in client-server setups where the client doesn't have a public IP, or doesn't know what it is. In theory there's nothing special about TCP: NAT gateways need to maintain connection state for all IP protocols. But I get the impression most network equipment tends to drop UDP connection state much more quickly than TCP. And some network middleware may not do UDP connection state tracking at all.

TBH, I think UDP is on the rise (IoT, QUIC, etc.). Also, keep in mind that most over-the-Internet games use UDP.

But let's stop guessing and start evaluating :)

> For this reason, protocols that are neither TCP- nor UDP-based simply can't be used with home-ISP-grade connections. Sometimes firewalls are configured to filter them out as well. I think this impeded the adoption of SCTP a lot, see e.g. the answers here. Imagine if every new protocol from the last 10 years had had all the SCTP features for free! This story is the kind of problem I'm most afraid of when going with a non-TCP and especially non-UDP based transport.

Absolutely. I think UDP is most likely to work out in practice. Remember that TCP suffers from head-of-line (HOL) blocking, which creates all kinds of headaches for multiplexing.

> Do you mean that TCP is not exactly-once because when a connection is interrupted, each side can't know exactly which packets the other one has received but not acknowledged yet?

I know what hasn't been acked, but not whether it was received. This means that reconnecting involves having to hand-roll idempotent receives (at-least-once), so you pay for deduplication and acking at the protocol level, but then you must pay again to deal with disconnects, which we all know happen very frequently for mobile devices.
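A minimal sketch of the deduplication cost being described here (the sequence-numbering scheme is invented for illustration): after a reconnect the sender retransmits anything unacked, so the receiver has to drop what it has already seen.

```java
// Idempotent receive over an at-least-once channel: track the highest
// sequence number delivered and drop retransmitted duplicates.
class DedupReceiver {
    private long highestSeen = -1; // assumes per-sender, in-order sequence numbers

    // Returns true if the message was new and delivered downstream.
    synchronized boolean receive(long sequence, byte[] payload) {
        if (sequence <= highestSeen) {
            return false; // duplicate from a retransmit; already processed
        }
        highestSeen = sequence;
        deliver(payload);
        return true;
    }

    private void deliver(byte[] payload) { /* hand off to the application */ }
}
```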

> The reason I want exactly-once delivery (except when failing anyway) is that it makes demand calculations much easier. Publisher and subscriber must always agree precisely on the current value of outstanding demand on an RS stream. Of course this can be implemented on the application/protocol level, there's no reason the transport has to provide this guarantee, but why does it hurt if it does provide it? Efficiency reasons?

But that is fairly simple to implement: since demand is additive, the backchannel can always aggregate locally until getting an ack for the previous demand request.
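A sketch of that aggregation trick (the frame name and ack mechanism are hypothetical): locally accumulated request(n) calls are flushed as one aggregated frame once the previous one has been acknowledged.

```java
// Because request(n) is additive, demand can be accumulated locally and sent
// as a single aggregated REQUEST_N frame once the previous one is acked.
import java.util.concurrent.atomic.AtomicLong;

class DemandBackchannel {
    private final AtomicLong pending = new AtomicLong();
    private boolean inFlight = false; // is a REQUEST_N frame awaiting its ack?

    // Called by the local subscriber, possibly many times between acks.
    void request(long n) {
        pending.addAndGet(n);
        maybeFlush();
    }

    // Called when the remote side acknowledges the last REQUEST_N frame.
    synchronized void onAck() {
        inFlight = false;
        maybeFlush();
    }

    private synchronized void maybeFlush() {
        long n = pending.get();
        if (!inFlight && n > 0) {
            pending.addAndGet(-n); // demand added concurrently stays queued
            inFlight = true;
            sendRequestFrame(n);   // one aggregated frame instead of many
        }
    }

    private void sendRequestFrame(long n) { /* write to the wire */ }
}
```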

> I don't disagree in principle, but it does make the task much harder, so let's identify the use cases that need this.

+1!

benjchristensen commented 9 years ago

Related to this discussion is the draft proposal for QUIC: https://tools.ietf.org/html/draft-tsvwg-quic-protocol-00

viktorklang commented 9 years ago

@benjchristensen Interesting!

benjchristensen commented 9 years ago

FYI that @tmontgomery and I have been working on a protocol that uses Reactive Streams semantics.

cc others involved in the design and development: @stevegury @robertroeser @niteshkant

viktorklang commented 9 years ago

Thanks for the update, @benjchristensen!

I've been swamped lately, but I'm looking forward to us (Typesafe) being involved. Ping @rkuhn, @ktoso, @drewhk