zeromq / zeromq.js

:zap: Node.js bindings to the ØMQ library
http://zeromq.github.io/zeromq.js/
MIT License

My work on a next-generation proof of concept for ZMQ bindings for Node.js #189

Closed rolftimmermans closed 4 years ago

rolftimmermans commented 7 years ago

TL;DR: I'm opening this issue to start a discussion about the future of zeromq.js and my work on a proof of concept for a next-generation version. Please let me know your opinion!

Background

We have recently taken a system into production that uses zeromq.js. This works well, but I encountered a number of issues with this library that prompted me to think about solutions. Unfortunately, some problems did not appear to be fixable without a massive overhaul of the internals. I will briefly describe the main problems.

Problem 1: Sent messages are queued, but not with ZMQ

Zeromq.js uses an internal queuing mechanism for sent messages. Essentially it pushes all messages onto a JavaScript array and sends them out as soon as the socket is ready for writing. This works nicely except when it doesn't:

Related issues: #152 #185 #124

From https://github.com/zeromq/zeromq.js/issues/152#issuecomment-326894190 I understand that this queueing was built to improve performance. Which means that matching the performance of the current library should be a goal of any new version.
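To make the behaviour described above concrete, here is a minimal sketch using the classic zeromq.js API (the socket type and address are just examples): send() always succeeds immediately because the message is pushed onto a JavaScript array, regardless of the ZMQ high water mark.

const zmq = require("zeromq")

const push = zmq.socket("push")
push.bindSync("tcp://127.0.0.1:3000")

/* send() returns immediately: the message is queued in a JavaScript array
   inside the binding and flushed once the socket becomes writable. The ZMQ
   high water mark is never observed by the caller, so a slow or absent peer
   lets this queue grow without bound. */
push.send("some work")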

Problem 2: Messages that are received are read automatically

Readable sockets follow the event emitter pattern. This is common in Node.js APIs, so it makes sense for most people. There is one important drawback, though: messages are read automatically.

It appears that a call to pause()/resume() offers a suitable workaround, however:

Related issues: #139
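As a sketch of the current behaviour (again using the classic zeromq.js API with an example address; handle() is a placeholder for the application's own processing): the "message" event fires for every incoming message as soon as it arrives, and pause()/resume() are the only way to apply backpressure.

const zmq = require("zeromq")

const pull = zmq.socket("pull")
pull.connect("tcp://127.0.0.1:3000")

pull.on("message", msg => {
  /* Messages are read from the socket as soon as they arrive, whether or
     not the application is ready to process them. */
  pull.pause()                                // workaround: stop reading while busy
  handle(msg).then(() => pull.resume())       // resume once processing is done
})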

Other potential improvements

There were some other things we encountered while working on the library that we believed could be improved:

My work on a proof of concept

Given the nature of problems 1 & 2 I set out to rewrite the internals and design a better API that is suitable for honouring high water marks and timeouts for both send & receive operations.

Because I was now effectively rewriting the entire library I also decided to address the additional improvements mentioned above. Especially the move to N-API is non-trivial and influences most of the native code.

To summarize, the goals of my experiment were:

What is included and working

What is still missing

The following features have not yet been added. My plan is to implement them over the next ~weeks~ years:

So, how do I use it?

~To build the proof of concept version you need a local installation of libzmq with development headers. For example, on Debian/Ubuntu you can apt-get install libzmq3-dev, or on macOS you can brew install zeromq.~

~Then clone the repository and run the tests with yarn test or npm test.~

Install with npm install zeromq-ng.

Next, have a look at the examples in the README. To give you a taste of the API, this is how you can use it:

const zmq = require("zeromq-ng")

const push = new zmq.Push()
const pull = new zmq.Pull()

async function produce() {
  while (!push.closed) {
    /* Queues a message and resolves the promise on success.

       This is equivalent to zmq_msg_send() in the sense that it can "block",
       but it does not actually block the internal Node.js event queue.

       If the message cannot be queued or times out the promise is rejected. */

    await push.send("some work")
    await new Promise(resolve => setTimeout(resolve, 500))
  }
}

async function consume() {
  while (!pull.closed) {
    /* Waits for a new message to be received. Any timeout will cause the
       promise to be rejected. */

    const [msg] = await pull.receive()
    console.log("work: %s", msg.toString())
  }

  /* An alternative to this loop is to use async iterators:

     for await (const [msg] of pull) {
       console.log("work: %s", msg.toString())
     }
  */
}

async function run() {
  await push.bind("tcp://127.0.0.1:3000")
  pull.connect("tcp://127.0.0.1:3000")

  await Promise.all([produce(), consume()])
}

run().catch(console.error)

If you've made it this far reading this post, thank you for your attention!

What next?

I'd like to invite you to have a look at https://github.com/rolftimmermans/zeromq-ng and give your feedback.

Specifically, I'd like to discuss two scenarios:

  1. I publish this as a new library that exists separately from zeromq.js
  2. This can be the beginning of a next major (API incompatible) release of zeromq.js

I'd be completely happy with either.

In addition to the above I am looking for feedback from users who have experience with one or more of the problems mentioned above (or even other problems!). It would be nice if you could give your opinion on the API or even test the proof of concept.

I am also in the process of writing and collecting benchmarks and will post the results soon. Based on what I have tested so far I'm expecting similar performance in general.

rgbkrk commented 7 years ago

This is super cool. I'm generally pretty happy with taking this direction. I'll give this new API a spin with our current code to see how it behaves and see how easy it is to adapt.

The way I'd end up wrapping this (like we do now in nteract) is to turn these async iterators into observables, which I don't think is too bad. I can see some areas where I'd prefer the error flow I have with observables (noting the timeout causing promise rejection), though I think I see how I'd handle these generally. All that being said, async iterators are stage 3 (further along), while Observables are stage 1 (and ready for the next stage). So I guess that's a yay for async iterators from me.
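For what it's worth, a rough sketch of that kind of wrapper (not nteract's actual code; assumes RxJS 6-style imports and the NG async-iterator receive API, with the messages() helper being hypothetical):

const zmq = require("zeromq-ng")
const { Observable } = require("rxjs")

/* Wraps the async-iterator receive loop of an NG socket in an Observable.
   A receive timeout rejects the promise, which surfaces here as an error
   notification. Teardown of the loop on unsubscribe is omitted for brevity. */
function messages(socket) {
  return new Observable(subscriber => {
    (async () => {
      try {
        for await (const [msg] of socket) {
          subscriber.next(msg)
        }
        subscriber.complete()
      } catch (err) {
        subscriber.error(err)
      }
    })()
  })
}

const pull = new zmq.Pull()
pull.connect("tcp://127.0.0.1:3000")
messages(pull).subscribe(msg => console.log("work: %s", msg.toString()))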

I'm interested to hear what @kkoopa thinks as someone so heavily involved in this library and in node.js addons.

ronkorving commented 7 years ago

Hi @rolftimmermans, good stuff! Please steal what you need from https://github.com/JustinTulloss/zeromq.node/pull/516 which is also a full rewrite. I think there are a few decent ideas in there.

kkoopa commented 7 years ago

Looks good. This library has been in need of an overhaul.

rolftimmermans commented 7 years ago

An update: I have added API documentation, support for socket events and prebuilt binaries for Node.js 8.6 on Linux/macOS. It is available as zeromq-ng. Looking forward to your feedback!

lgeiger commented 7 years ago

@rolftimmermans Wow thanks for digging into this. I'm excited to take a look soon.

rgbkrk commented 7 years ago

I think for us at least we'll need to figure out how well N-API works with Electron, so we're going to have to link up our current setup with the new bindings. Thanks for publishing on zeromq-ng.

rolftimmermans commented 7 years ago

Electron compatibility depends on one of the following two things happening:

rolftimmermans commented 7 years ago

I have some initial benchmarks comparing zeromq-ng vs zeromq.js. There are three types of benchmarks; all are executed with both zeromq-ng (zmq=ng) and zeromq.js (zmq=cur), with various message sizes (msgsize=...) and tcp/inproc transports (proto=...):

All benchmarks can be found here.

The benchmarks were executed on my Macbook. It would be nice to get some Linux benchmarks too!

queue proto=tcp msgsize=1 zmq=cur x 51.79 ops/sec ±1.43% (56 runs sampled)
queue proto=tcp msgsize=1 zmq=ng x 55.11 ops/sec ±1.88% (64 runs sampled)
queue proto=tcp msgsize=16 zmq=cur x 53.26 ops/sec ±1.24% (63 runs sampled)
queue proto=tcp msgsize=16 zmq=ng x 54.54 ops/sec ±1.00% (64 runs sampled)
queue proto=tcp msgsize=256 zmq=cur x 47.60 ops/sec ±1.29% (61 runs sampled)
queue proto=tcp msgsize=256 zmq=ng x 39.29 ops/sec ±1.17% (57 runs sampled)
queue proto=tcp msgsize=4096 zmq=cur x 46.38 ops/sec ±1.93% (60 runs sampled)
queue proto=tcp msgsize=4096 zmq=ng x 38.45 ops/sec ±0.88% (56 runs sampled)
queue proto=tcp msgsize=65536 zmq=cur x 22.20 ops/sec ±2.84% (54 runs sampled)
queue proto=tcp msgsize=65536 zmq=ng x 19.13 ops/sec ±1.94% (50 runs sampled)
queue proto=tcp msgsize=1048576 zmq=cur x 7.74 ops/sec ±0.91% (46 runs sampled)
queue proto=tcp msgsize=1048576 zmq=ng x 7.23 ops/sec ±1.30% (45 runs sampled)
queue proto=inproc msgsize=1 zmq=cur x 30.62 ops/sec ±6.90% (48 runs sampled)
queue proto=inproc msgsize=1 zmq=ng x 20.05 ops/sec ±2.86% (51 runs sampled)
queue proto=inproc msgsize=16 zmq=cur x 17.53 ops/sec ±2.87% (47 runs sampled)
queue proto=inproc msgsize=16 zmq=ng x 15.28 ops/sec ±1.45% (60 runs sampled)
queue proto=inproc msgsize=256 zmq=cur x 12.99 ops/sec ±3.18% (56 runs sampled)
queue proto=inproc msgsize=256 zmq=ng x 10.66 ops/sec ±2.71% (52 runs sampled)
queue proto=inproc msgsize=4096 zmq=cur x 9.57 ops/sec ±1.75% (50 runs sampled)
queue proto=inproc msgsize=4096 zmq=ng x 7.73 ops/sec ±2.46% (46 runs sampled)
queue proto=inproc msgsize=65536 zmq=cur x 3.63 ops/sec ±5.38% (34 runs sampled)
queue proto=inproc msgsize=65536 zmq=ng x 2.97 ops/sec ±4.68% (32 runs sampled)
queue proto=inproc msgsize=1048576 zmq=cur x 3.00 ops/sec ±5.86% (31 runs sampled)
queue proto=inproc msgsize=1048576 zmq=ng x 3.09 ops/sec ±2.05% (33 runs sampled)
deliver proto=tcp msgsize=1 zmq=cur x 3.14 ops/sec ±3.76% (33 runs sampled)
deliver proto=tcp msgsize=1 zmq=ng x 2.80 ops/sec ±4.53% (32 runs sampled)
deliver proto=tcp msgsize=16 zmq=cur x 2.92 ops/sec ±5.53% (31 runs sampled)
deliver proto=tcp msgsize=16 zmq=ng x 2.49 ops/sec ±6.40% (31 runs sampled)
deliver proto=tcp msgsize=256 zmq=cur x 3.24 ops/sec ±1.94% (33 runs sampled)
deliver proto=tcp msgsize=256 zmq=ng x 2.93 ops/sec ±1.81% (32 runs sampled)
deliver proto=tcp msgsize=4096 zmq=cur x 3.16 ops/sec ±2.68% (32 runs sampled)
deliver proto=tcp msgsize=4096 zmq=ng x 2.73 ops/sec ±3.67% (31 runs sampled)
deliver proto=tcp msgsize=65536 zmq=cur x 1.80 ops/sec ±2.47% (28 runs sampled)
deliver proto=tcp msgsize=65536 zmq=ng x 1.56 ops/sec ±3.77% (27 runs sampled)
deliver proto=tcp msgsize=1048576 zmq=cur x 0.43 ops/sec ±5.24% (22 runs sampled)
deliver proto=tcp msgsize=1048576 zmq=ng x 0.48 ops/sec ±5.65% (22 runs sampled)
deliver proto=inproc msgsize=1 zmq=cur x 3.15 ops/sec ±1.73% (33 runs sampled)
deliver proto=inproc msgsize=1 zmq=ng x 3.57 ops/sec ±4.28% (34 runs sampled)
deliver proto=inproc msgsize=16 zmq=cur x 3.15 ops/sec ±1.53% (33 runs sampled)
deliver proto=inproc msgsize=16 zmq=ng x 3.80 ops/sec ±1.32% (35 runs sampled)
deliver proto=inproc msgsize=256 zmq=cur x 2.81 ops/sec ±6.51% (30 runs sampled)
deliver proto=inproc msgsize=256 zmq=ng x 3.19 ops/sec ±6.19% (33 runs sampled)
deliver proto=inproc msgsize=4096 zmq=cur x 3.12 ops/sec ±2.00% (33 runs sampled)
deliver proto=inproc msgsize=4096 zmq=ng x 3.67 ops/sec ±2.82% (34 runs sampled)
deliver proto=inproc msgsize=65536 zmq=cur x 2.23 ops/sec ±3.17% (30 runs sampled)
deliver proto=inproc msgsize=65536 zmq=ng x 2.48 ops/sec ±1.72% (30 runs sampled)
deliver proto=inproc msgsize=1048576 zmq=cur x 1.63 ops/sec ±4.84% (27 runs sampled)
deliver proto=inproc msgsize=1048576 zmq=ng x 2.13 ops/sec ±8.45% (29 runs sampled)
deliver multipart proto=tcp msgsize=1 zmq=cur x 2.94 ops/sec ±2.59% (32 runs sampled)
deliver multipart proto=tcp msgsize=1 zmq=ng x 2.86 ops/sec ±4.72% (33 runs sampled)
deliver multipart proto=tcp msgsize=16 zmq=cur x 3.00 ops/sec ±2.43% (33 runs sampled)
deliver multipart proto=tcp msgsize=16 zmq=ng x 2.65 ops/sec ±4.05% (32 runs sampled)
deliver multipart proto=tcp msgsize=256 zmq=cur x 2.99 ops/sec ±3.40% (33 runs sampled)
deliver multipart proto=tcp msgsize=256 zmq=ng x 3.04 ops/sec ±3.42% (33 runs sampled)
deliver multipart proto=tcp msgsize=4096 zmq=cur x 2.67 ops/sec ±5.66% (31 runs sampled)
deliver multipart proto=tcp msgsize=4096 zmq=ng x 1.94 ops/sec ±6.08% (27 runs sampled)
deliver multipart proto=tcp msgsize=65536 zmq=cur x 1.18 ops/sec ±6.29% (25 runs sampled)
deliver multipart proto=tcp msgsize=65536 zmq=ng x 1.32 ops/sec ±6.06% (26 runs sampled)
deliver multipart proto=tcp msgsize=1048576 zmq=cur x 0.16 ops/sec ±4.03% (20 runs sampled)
deliver multipart proto=tcp msgsize=1048576 zmq=ng x 0.19 ops/sec ±1.03% (20 runs sampled)
deliver multipart proto=inproc msgsize=1 zmq=cur x 2.90 ops/sec ±2.65% (32 runs sampled)
deliver multipart proto=inproc msgsize=1 zmq=ng x 3.64 ops/sec ±1.20% (34 runs sampled)
deliver multipart proto=inproc msgsize=16 zmq=cur x 2.92 ops/sec ±2.05% (32 runs sampled)
deliver multipart proto=inproc msgsize=16 zmq=ng x 3.68 ops/sec ±2.37% (35 runs sampled)
deliver multipart proto=inproc msgsize=256 zmq=cur x 3.01 ops/sec ±1.45% (32 runs sampled)
deliver multipart proto=inproc msgsize=256 zmq=ng x 3.42 ops/sec ±1.10% (34 runs sampled)
deliver multipart proto=inproc msgsize=4096 zmq=cur x 2.82 ops/sec ±3.97% (31 runs sampled)
deliver multipart proto=inproc msgsize=4096 zmq=ng x 3.42 ops/sec ±1.16% (34 runs sampled)
deliver multipart proto=inproc msgsize=65536 zmq=cur x 1.57 ops/sec ±4.97% (27 runs sampled)
deliver multipart proto=inproc msgsize=65536 zmq=ng x 1.57 ops/sec ±4.48% (27 runs sampled)
deliver multipart proto=inproc msgsize=1048576 zmq=cur x 1.05 ops/sec ±1.93% (24 runs sampled)
deliver multipart proto=inproc msgsize=1048576 zmq=ng x 1.23 ops/sec ±1.52% (25 runs sampled)

Some casual observations:

Overall I find the differences pretty small. It would be nice to have some "real world" benchmarks. Can anyone suggest a good use case to test?

rgbkrk commented 6 years ago

How much of the old API (the "typical" API that is) can we keep around, marking for deprecation, even if it wraps the new implementation?

jez9999 commented 6 years ago

The only issue I have is, why does this have to be in yet another npm namespace? We had zmq, I don't know why but you guys moved it to zeromq, and now you're moving it again to zeromq-ng? Can't we just keep one namespace? It's so much easier to upgrade when the namespace doesn't change.

rgbkrk commented 6 years ago

We're not moving it to zeromq-ng. @rolftimmermans wanted to publish a package that made it easy to try a completely different interface, without breaking current usage.

rolftimmermans commented 6 years ago

As mentioned I much prefer to keep everything in zeromq.js where possible, and zeromq-ng is just a way to publish my work so far to get feedback.

@rgbkrk I like your idea and will try to create a wrapper that behaves like the current API. It would sacrifice all the semantic improvements, but should make it easier to use both APIs side by side.
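For illustration only, a rough sketch of the direction such a wrapper could take (CompatSocket and its behaviour are hypothetical, not the actual compatibility layer): an EventEmitter drives a receive loop over the promise-based API and re-emits each incoming message as a "message" event.

const { EventEmitter } = require("events")

class CompatSocket extends EventEmitter {
  constructor(socket) {
    super()
    this.socket = socket          // an NG socket, e.g. new zmq.Pull()
  }

  connect(address) {
    this.socket.connect(address)
    this._receiveLoop()
    return this
  }

  async _receiveLoop() {
    while (!this.socket.closed) {
      try {
        const parts = await this.socket.receive()
        this.emit("message", ...parts)
      } catch (err) {
        this.emit("error", err)   // e.g. the socket was closed mid-receive
        return
      }
    }
  }
}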

rolftimmermans commented 6 years ago

I have added a compatibility layer that covers 95% of the existing API. Missing (and currently impossible to add on top of the new API) are (un)bindSync and read().

kkoopa commented 6 years ago

That is great. Should hopefully make the transition much smoother.

JamesRamm commented 6 years ago

Hi there, this looks like some great improvements. Dumb question – is work still ongoing on this? A quick glance at the repository(ies) seems to suggest it has stalled...

rolftimmermans commented 6 years ago

As far as I'm concerned – absolutely! More real world usage means more feedback and more fixes.

rgbkrk commented 6 years ago

Here's the direction I'd like to go:

How's that sound @rolftimmermans?

rolftimmermans commented 6 years ago

That sounds absolutely excellent. I will continue work on zeromq-ng; there are two important things to resolve before we can expose the compat version as zeromq. I'm confident we should be able to resolve them over the next weeks/months or so:

  1. https://github.com/rolftimmermans/zeromq-ng/issues/32
  2. https://github.com/rolftimmermans/zeromq-ng/issues/29

For the last one I'm dependent on changes to node-pre-gyp (or prebuild – same issues apply there).

I also wonder what we should do about Electron builds; I am not quite aware what the N-API story is there.

I'll gather all details over the next week and open a new issue to discuss the practicalities of how this can be accomplished.

Any feedback regarding the NG API is more than welcome in the meantime.

ronkorving commented 6 years ago

This is really great stuff. @rolftimmermans, have you been able to do this without copying the buffers you send? In the old implementation, every buffer gets duplicated, which really ruins part of what makes 0MQ fast in the first place.

lgeiger commented 6 years ago

@rolftimmermans It's been quite some time since you started this effort and it looks like zeromq-ng is coming along really great. 🎉 What's your take on the current status of zeromq-ng?

I would be 👍 for moving your efforts into this repo and starting to make a few beta releases to get more feedback from the community. @ronkorving @reqshark @rgbkrk @kkoopa What do you think?

rgbkrk commented 6 years ago

I see the tests are written in typescript now @rolftimmermans, would you consider doing the base library in typescript too? 🤤

rolftimmermans commented 6 years ago

I think it's sort of ready for prime time.

rolftimmermans commented 6 years ago

I see the tests are written in typescript now @rolftimmermans, would you consider doing the base library in typescript too? 🤤

Most of it is C++ so that won't give us any TypeScript bindings. I have hand-written TypeScript definitions published as part of the library, and the test suite itself uses them (so they should be tested, too).

Potentially the getters and setters for sockets and contexts (currently implemented with meta-programming) could be written in TypeScript by using code generation. That is a fine approach, but it requires some more effort.

I'm not sure how easy it is to combine an automatically generated .d.ts file (for getters/setters) with a hand-written .d.ts file (for native C++ code) into a single class definition. Has anyone ever done this before?

lgeiger commented 6 years ago

I think it's sort of ready for prime time.

That's awesome! 🎉 🚀

  • I am not sure what the story for Electron is. With N-API it should work, I guess? I don't develop for Electron and I have no idea how to check. Might be worth investigating by someone?

I tried it out at https://github.com/nteract/nteract/pull/3388 and it worked great with Electron 3 so far. There are a few annoying issues related to the publishing process, since node-pre-gyp, N-API and electron-builder don't play very well together. These pain points would likely be solved by https://github.com/rolftimmermans/zeromq-ng/pull/46. But there are no real blockers on the zeromq-ng side.

  • Node 10 is soon becoming LTS and Node 6 is EOL in April next year. Because of N-API, zeromq-ng requires at least Node 8. So dropping Node 6 support would not be too horrible I guess? There's always the 4.x branch if someone needs it.

I think technically one might even be able to use N-API together with Node 6, but I think it's totally reasonable to drop support for Node 6. Personally I'd even be OK with only supporting Node 10+ 😇

  • There is probably some approach to be discussed for how to move the project. Should we create a new branch for it, move repositories, ...? Not sure what the best approach is here.

If nobody raises objections to zeromq-ng being the future of zeromq.js, I'd propose the following next steps:

rgbkrk commented 6 years ago

I'm cool with that plan.

ronkorving commented 6 years ago

Great work, looks very clean. I would be very worried about the segfault, though. You'll probably want to analyze a core dump. Maybe this group can help with the analysis?

rgbkrk commented 5 years ago

Extra note: N-API is now stable as of Node 10 which is the most recent LTS. I think it's time for us to bring this in.

@rolftimmermans can you take on making the PR to master with zeromq-ng?

NOTE: I added an extra item to the plan @lgeiger outlined since we'll also want to deprecate the typings on DefinitelyTyped.

rgbkrk commented 5 years ago

@crazybits please post to stack overflow or raise a new issue, as your support needs are unrelated to this issue.

rvion commented 5 years ago

Is this still planned soon?

rgbkrk commented 5 years ago

Haven't heard back from @rolftimmermans in a bit and the rest of us have been busy with other projects (or in my case a baby), so I don't think you'll be seeing it soon.

rolftimmermans commented 5 years ago

The main blocker is a bunch of segfaults with Node 10. I haven't had the chance to work on it as much as I'd like – feel free to audit the code base and try it out for yourself! Contributions very much welcome!

Once the Node 10 issues are resolved I'd say it's ready for production (we're using it in production with Node 8 successfully so far).

ralphtheninja commented 5 years ago

:+1: for implementing bindings using N-API. It would help a lot with maintenance and also with support for Electron without needing to rebuild the bindings.

rolftimmermans commented 5 years ago

It's almost 2020 and I have come to the conclusion that all problems seem to vanish when compiling against Node 10. I have ripped out all compatibility layers dealing with earlier, experimental N-API versions in Node 8.7+. I have tested and verified that these binaries now work correctly with Node 10, 12 and 13 RC 1 without issues.

Since Node 8 is nearing the end of its lifetime I am ready to declare the implementation of ZeroMQ-NG stable. We have deployed the latest beta to production without issues so far (on Alpine Linux).

Additionally, I have been able to create all prebuilds with just Travis, ditching AppVeyor, and to build libzmq from source on Windows. Prebuilds are now included in the published package with prebuildify, so there is no download step if a build exists for the user's system.
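For anyone unfamiliar with that setup, this is roughly how the prebuildify/node-gyp-build pairing works in general (a generic sketch, not necessarily the exact loading code used here):

/* prebuildify compiles binaries for each target into ./prebuilds, which are
   shipped inside the npm package itself. At require time, node-gyp-build
   picks a prebuilt binary matching the user's platform and only falls back
   to a local compile if none is found. */
const binding = require("node-gyp-build")(__dirname)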

I have some ideas for improvements and additional functionality but there is nothing that would stop me from proposing to merge it.

All eyes on the code are welcome. I'd be happy to take ideas for improvements, either as blockers for a merge or as nice to haves for later.

This has taken me much longer than expected but I hope things will be easier now that everything seems to just work!

sarumjanuch commented 5 years ago

Very glad to see you back in town. It was very calm for some time in the ng repo; I already thought you had abandoned it. Last time I checked ng, there were constant segfaults in certain circumstances under any version of Node (8+). I will check tomorrow how it's behaving now and reply here. I will be more than happy to move to ng. One more time, thanks for all your work!

sarumjanuch commented 5 years ago

I actually got to the tests today. I can't reproduce the segfaults I was able to reproduce consistently earlier. So far so good. Will continue testing.

rolftimmermans commented 4 years ago

ZeroMQ-NG has been merged! The tracking issue is #347.

rgbkrk commented 4 years ago

Great culling of the issues @rolftimmermans! I'm finally trying to catch myself up on everything for the new zeromq.js. 😄