Closed rolftimmermans closed 4 years ago
This is super cool. I'm generally pretty happy with taking this direction. I'll give this new API a spin with our current code to see how it behaves and see how easy it is to adapt.
The way I'd end up wrapping this (like we do now in nteract) is to turn these async iterators into observables which I don't think is too bad. I can see some areas where I'd prefer the error flow I have with observables (noting the timeout causing promise rejection), though I think I see how I'd handle these generally. All that being said, async iterators are stage 3 (further along), Observables stage 1 (and ready for next stage). I guess that's a yay for async iterators from me I suppose.
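For what it's worth, wrapping an async iterator in an observable-style interface is only a few lines. Here is a minimal sketch; the `Observer` shape below is a simplified stand-in, not the full TC39 Observable proposal, and the generator merely simulates a socket's message stream:

```typescript
// Minimal observer shape; a simplified stand-in, not the TC39 proposal.
interface Observer<T> {
  next(value: T): void;
  complete(): void;
  error(err: unknown): void;
}

function fromAsyncIterable<T>(iterable: AsyncIterable<T>) {
  return {
    subscribe(observer: Observer<T>) {
      let cancelled = false;
      (async () => {
        try {
          for await (const value of iterable) {
            if (cancelled) return;
            observer.next(value);
          }
          observer.complete();
        } catch (err) {
          observer.error(err);
        }
      })();
      return { unsubscribe: () => { cancelled = true; } };
    },
  };
}

// A plain async generator stands in for a socket's message stream.
async function* messages() {
  yield "hello";
  yield "world";
}

const received: string[] = [];
fromAsyncIterable(messages()).subscribe({
  next: (value) => received.push(value),
  complete: () => received.push("<done>"),
  error: () => {},
});
```

Note that errors from the iterator (such as a receive timeout rejection) surface through the observer's `error` callback, which is roughly the error flow described above.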
I'm interested to hear what @kkoopa thinks as someone so heavily involved in this library and in node.js addons.
Hi @rolftimmermans, good stuff! Please steal what you need from https://github.com/JustinTulloss/zeromq.node/pull/516 which is also a full rewrite. I think there are a few decent ideas in there.
Looks good. This library has been in need of an overhaul.
An update: I have added API documentation, support for socket events and prebuilt binaries for Node.js 8.6 on Linux/macOS. It is available as zeromq-ng. Looking forward to your feedback!
@rolftimmermans Wow thanks for digging into this. I'm excited to take a look soon.
I think for us at least we'll need to figure out how well N-API works with Electron, so we're going to have to link up our current setup with the new bindings. Thanks for publishing on zeromq-ng.
Electron compatibility depends on one of the following two things happening:
I have some initial benchmarks comparing zeromq-ng vs zeromq.js. There are three types of benchmarks; all are executed with both zeromq-ng (zmq=ng) and zeromq.js (zmq=cur), with various message sizes (msgsize=...) and tcp/inproc transports (proto=...):
All benchmarks can be found here.
The benchmarks were executed on my Macbook. It would be nice to get some Linux benchmarks too!
```
queue proto=tcp msgsize=1 zmq=cur x 51.79 ops/sec ±1.43% (56 runs sampled)
queue proto=tcp msgsize=1 zmq=ng x 55.11 ops/sec ±1.88% (64 runs sampled)
queue proto=tcp msgsize=16 zmq=cur x 53.26 ops/sec ±1.24% (63 runs sampled)
queue proto=tcp msgsize=16 zmq=ng x 54.54 ops/sec ±1.00% (64 runs sampled)
queue proto=tcp msgsize=256 zmq=cur x 47.60 ops/sec ±1.29% (61 runs sampled)
queue proto=tcp msgsize=256 zmq=ng x 39.29 ops/sec ±1.17% (57 runs sampled)
queue proto=tcp msgsize=4096 zmq=cur x 46.38 ops/sec ±1.93% (60 runs sampled)
queue proto=tcp msgsize=4096 zmq=ng x 38.45 ops/sec ±0.88% (56 runs sampled)
queue proto=tcp msgsize=65536 zmq=cur x 22.20 ops/sec ±2.84% (54 runs sampled)
queue proto=tcp msgsize=65536 zmq=ng x 19.13 ops/sec ±1.94% (50 runs sampled)
queue proto=tcp msgsize=1048576 zmq=cur x 7.74 ops/sec ±0.91% (46 runs sampled)
queue proto=tcp msgsize=1048576 zmq=ng x 7.23 ops/sec ±1.30% (45 runs sampled)
queue proto=inproc msgsize=1 zmq=cur x 30.62 ops/sec ±6.90% (48 runs sampled)
queue proto=inproc msgsize=1 zmq=ng x 20.05 ops/sec ±2.86% (51 runs sampled)
queue proto=inproc msgsize=16 zmq=cur x 17.53 ops/sec ±2.87% (47 runs sampled)
queue proto=inproc msgsize=16 zmq=ng x 15.28 ops/sec ±1.45% (60 runs sampled)
queue proto=inproc msgsize=256 zmq=cur x 12.99 ops/sec ±3.18% (56 runs sampled)
queue proto=inproc msgsize=256 zmq=ng x 10.66 ops/sec ±2.71% (52 runs sampled)
queue proto=inproc msgsize=4096 zmq=cur x 9.57 ops/sec ±1.75% (50 runs sampled)
queue proto=inproc msgsize=4096 zmq=ng x 7.73 ops/sec ±2.46% (46 runs sampled)
queue proto=inproc msgsize=65536 zmq=cur x 3.63 ops/sec ±5.38% (34 runs sampled)
queue proto=inproc msgsize=65536 zmq=ng x 2.97 ops/sec ±4.68% (32 runs sampled)
queue proto=inproc msgsize=1048576 zmq=cur x 3.00 ops/sec ±5.86% (31 runs sampled)
queue proto=inproc msgsize=1048576 zmq=ng x 3.09 ops/sec ±2.05% (33 runs sampled)
deliver proto=tcp msgsize=1 zmq=cur x 3.14 ops/sec ±3.76% (33 runs sampled)
deliver proto=tcp msgsize=1 zmq=ng x 2.80 ops/sec ±4.53% (32 runs sampled)
deliver proto=tcp msgsize=16 zmq=cur x 2.92 ops/sec ±5.53% (31 runs sampled)
deliver proto=tcp msgsize=16 zmq=ng x 2.49 ops/sec ±6.40% (31 runs sampled)
deliver proto=tcp msgsize=256 zmq=cur x 3.24 ops/sec ±1.94% (33 runs sampled)
deliver proto=tcp msgsize=256 zmq=ng x 2.93 ops/sec ±1.81% (32 runs sampled)
deliver proto=tcp msgsize=4096 zmq=cur x 3.16 ops/sec ±2.68% (32 runs sampled)
deliver proto=tcp msgsize=4096 zmq=ng x 2.73 ops/sec ±3.67% (31 runs sampled)
deliver proto=tcp msgsize=65536 zmq=cur x 1.80 ops/sec ±2.47% (28 runs sampled)
deliver proto=tcp msgsize=65536 zmq=ng x 1.56 ops/sec ±3.77% (27 runs sampled)
deliver proto=tcp msgsize=1048576 zmq=cur x 0.43 ops/sec ±5.24% (22 runs sampled)
deliver proto=tcp msgsize=1048576 zmq=ng x 0.48 ops/sec ±5.65% (22 runs sampled)
deliver proto=inproc msgsize=1 zmq=cur x 3.15 ops/sec ±1.73% (33 runs sampled)
deliver proto=inproc msgsize=1 zmq=ng x 3.57 ops/sec ±4.28% (34 runs sampled)
deliver proto=inproc msgsize=16 zmq=cur x 3.15 ops/sec ±1.53% (33 runs sampled)
deliver proto=inproc msgsize=16 zmq=ng x 3.80 ops/sec ±1.32% (35 runs sampled)
deliver proto=inproc msgsize=256 zmq=cur x 2.81 ops/sec ±6.51% (30 runs sampled)
deliver proto=inproc msgsize=256 zmq=ng x 3.19 ops/sec ±6.19% (33 runs sampled)
deliver proto=inproc msgsize=4096 zmq=cur x 3.12 ops/sec ±2.00% (33 runs sampled)
deliver proto=inproc msgsize=4096 zmq=ng x 3.67 ops/sec ±2.82% (34 runs sampled)
deliver proto=inproc msgsize=65536 zmq=cur x 2.23 ops/sec ±3.17% (30 runs sampled)
deliver proto=inproc msgsize=65536 zmq=ng x 2.48 ops/sec ±1.72% (30 runs sampled)
deliver proto=inproc msgsize=1048576 zmq=cur x 1.63 ops/sec ±4.84% (27 runs sampled)
deliver proto=inproc msgsize=1048576 zmq=ng x 2.13 ops/sec ±8.45% (29 runs sampled)
deliver multipart proto=tcp msgsize=1 zmq=cur x 2.94 ops/sec ±2.59% (32 runs sampled)
deliver multipart proto=tcp msgsize=1 zmq=ng x 2.86 ops/sec ±4.72% (33 runs sampled)
deliver multipart proto=tcp msgsize=16 zmq=cur x 3.00 ops/sec ±2.43% (33 runs sampled)
deliver multipart proto=tcp msgsize=16 zmq=ng x 2.65 ops/sec ±4.05% (32 runs sampled)
deliver multipart proto=tcp msgsize=256 zmq=cur x 2.99 ops/sec ±3.40% (33 runs sampled)
deliver multipart proto=tcp msgsize=256 zmq=ng x 3.04 ops/sec ±3.42% (33 runs sampled)
deliver multipart proto=tcp msgsize=4096 zmq=cur x 2.67 ops/sec ±5.66% (31 runs sampled)
deliver multipart proto=tcp msgsize=4096 zmq=ng x 1.94 ops/sec ±6.08% (27 runs sampled)
deliver multipart proto=tcp msgsize=65536 zmq=cur x 1.18 ops/sec ±6.29% (25 runs sampled)
deliver multipart proto=tcp msgsize=65536 zmq=ng x 1.32 ops/sec ±6.06% (26 runs sampled)
deliver multipart proto=tcp msgsize=1048576 zmq=cur x 0.16 ops/sec ±4.03% (20 runs sampled)
deliver multipart proto=tcp msgsize=1048576 zmq=ng x 0.19 ops/sec ±1.03% (20 runs sampled)
deliver multipart proto=inproc msgsize=1 zmq=cur x 2.90 ops/sec ±2.65% (32 runs sampled)
deliver multipart proto=inproc msgsize=1 zmq=ng x 3.64 ops/sec ±1.20% (34 runs sampled)
deliver multipart proto=inproc msgsize=16 zmq=cur x 2.92 ops/sec ±2.05% (32 runs sampled)
deliver multipart proto=inproc msgsize=16 zmq=ng x 3.68 ops/sec ±2.37% (35 runs sampled)
deliver multipart proto=inproc msgsize=256 zmq=cur x 3.01 ops/sec ±1.45% (32 runs sampled)
deliver multipart proto=inproc msgsize=256 zmq=ng x 3.42 ops/sec ±1.10% (34 runs sampled)
deliver multipart proto=inproc msgsize=4096 zmq=cur x 2.82 ops/sec ±3.97% (31 runs sampled)
deliver multipart proto=inproc msgsize=4096 zmq=ng x 3.42 ops/sec ±1.16% (34 runs sampled)
deliver multipart proto=inproc msgsize=65536 zmq=cur x 1.57 ops/sec ±4.97% (27 runs sampled)
deliver multipart proto=inproc msgsize=65536 zmq=ng x 1.57 ops/sec ±4.48% (27 runs sampled)
deliver multipart proto=inproc msgsize=1048576 zmq=cur x 1.05 ops/sec ±1.93% (24 runs sampled)
deliver multipart proto=inproc msgsize=1048576 zmq=ng x 1.23 ops/sec ±1.52% (25 runs sampled)
```
Some casual observations:
Overall I find the differences pretty small. It would be nice to have some "real world" benchmarks. Can anyone suggest a good use case to test?
How much of the old API (the "typical" API that is) can we keep around, marking for deprecation, even if it wraps the new implementation?
The only issue I have is, why does this have to be in yet another npm namespace? We had zmq, I don't know why but you guys moved it to zeromq, and now you're moving it again to zeromq-ng? Can't we just keep one namespace? It's so much easier to upgrade when the namespace doesn't change.
We're not moving it to zeromq-ng. @rolftimmermans wanted to publish a package that made it easy to try a completely different interface, without breaking current usage.
As mentioned I much prefer to keep everything in zeromq.js where possible, and zeromq-ng is just a way to publish my work so far to get feedback.
@rgbkrk I like your idea and will try to create a wrapper that behaves like the current API. It would sacrifice all the semantic improvements, but should make it easier to use both APIs side by side.
I have added a compatibility layer that covers 95% of the existing API. Missing (and currently impossible to add with the new API) are: (un)bindSync, and read().
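For anyone curious what such a layer looks like, here is a rough sketch of the adaptation pattern: the promise-based send is mapped back to a callback, and the async iterator is re-emitted as "message" events. `NgSocket` below is a hypothetical stand-in, not the real zeromq-ng class:

```typescript
import { EventEmitter } from "events";

// Hypothetical stand-in for the promise-based NG socket (not the real class).
class NgSocket {
  sent: string[] = [];
  async send(msg: string): Promise<void> {
    this.sent.push(msg); // the real socket would hand this to libzmq
  }
  async *[Symbol.asyncIterator]() {
    yield "ping"; // the real socket would read from libzmq
  }
}

// Old-style socket: callback-based send, "message" events for receives.
class CompatSocket extends EventEmitter {
  constructor(private ng: NgSocket) {
    super();
  }

  send(msg: string, cb?: (err?: Error) => void): this {
    this.ng.send(msg).then(() => cb?.(), (err) => cb?.(err));
    return this;
  }

  // Read loop that re-emits incoming messages as events.
  start(): this {
    (async () => {
      for await (const msg of this.ng) this.emit("message", msg);
    })().catch((err) => this.emit("error", err));
    return this;
  }
}
```

The inherent limitation this illustrates is that synchronous calls like `bindSync` or `read()` cannot be recovered, since the underlying operations are now genuinely asynchronous.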
That is great. Should hopefully make the transition much smoother.
Hi there looks like some great improvements. Dumb question - is work still ongoing on this? A quick glance at the repository(ies) seems to suggest it has stalled...
As far as I'm concerned – absolutely! More real world usage means more feedback and more fixes.
Here's the direction I'd like to go:

- `zeromq-ng/compat` as the primary interface for a single major version bump
- `zeromq-ng`

How's that sound @rolftimmermans?
That sounds absolutely excellent. I will continue work on zeromq-ng; there are two important things to resolve before we can expose the compat version as zeromq. I'm confident we should be able to resolve them over the next weeks/months or so.
For the last one I'm dependent on changes to node-pre-gyp (or prebuild – same issues apply there).
I also wonder what we should do about Electron builds; I am not quite aware what the N-API story is there.
I'll gather all details over the next week and open a new issue to discuss the practicalities of how this can be accomplished.
Any feedback regarding the NG API is more than welcome in the meantime.
This is really great stuff. @rolftimmermans, have you been able to do this without copying the buffers you send? In the old implementation, every buffer gets duplicated, which really ruins part of what makes 0MQ fast in the first place.
@rolftimmermans It's been quite some time since you started this effort and it looks like zeromq-ng is coming along really great. 🎉 What's your take on the current status of zeromq-ng?
I would be 👍for moving your efforts into this repo and start making a few beta releases to get more feedback from the community. @ronkorving @reqshark @rgbkrk @kkoopa What do you think?
I see the tests are written in typescript now @rolftimmermans, would you consider doing the base library in typescript too? 🤤
I think it's sort of ready for prime time.
We have started using it in production, no issues that I am aware of. I think other projects have started depending on zeromq-ng as well; I have had a few questions here and there, but I'm not sure how widely it is used in production.
I am personally really satisfied with the API, but there is one issue opened regarding it, see https://github.com/rolftimmermans/zeromq-ng/issues/32. I think the API is sufficient to express all use cases without encouraging bad implementations, but more opinions might be welcome.
I am not sure what the story for Electron is. With N-API it should work, I guess? I don't develop for Electron and I have no idea how to check. Might be worth investigating by someone?
The C++ code could use some review. I think I have checked and double-checked everything, but C++ bugs can be subtle.
The test suite very very occasionally segfaults – which is rather annoying and potentially worrying. I have been unable to reliably reproduce though. Not sure if this is a libzmq or a zeromq-ng issue, either.
Node 10 is soon becoming LTS and Node 6 is EOL in April next year. Because of N-API, zeromq-ng requires at least Node 8. So dropping Node 6 support would not be too horrible I guess? There's always the 4.x branch if someone needs it.
There is probably some approach to be discussed how to move the project. Should we create a new branch for it, move repositories, ...? Not sure what the best approach is here.
- I see the tests are written in typescript now @rolftimmermans, would you consider doing the base library in typescript too? 🤤
Most of it is C++ so that won't give us any TypeScript bindings. I have hand written TypeScript definitions published as part of the library and the test suite itself uses them (and therefore they should be tested, too).
Potentially the getters and setters for sockets and context (currently implemented with meta-programming) can be written in TypeScript by using code generation. That is a fine approach, but it requires some more effort.
I'm not sure how easy it is to combine an automatically generated `.d.ts` file (for getters/setters) with a hand-written `.d.ts` file (for native C++ code) into a single class definition. Has anyone ever done this before?
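For what it's worth, TypeScript's declaration merging may cover this: a class declaration and an interface with the same name merge into one type, so a generated interface carrying the option getters/setters could be merged onto the hand-declared native class. A minimal sketch (the names here are illustrative, not the real zeromq-ng declarations):

```typescript
// Hand-written part, standing in for the declaration of the native C++ class.
class Socket {
  bind(_address: string): Promise<void> {
    return Promise.resolve();
  }
}

// Generated part: an interface with the same name merges into the class type,
// so instances carry both the hand-written methods and the generated options.
interface Socket {
  sendTimeout: number;
  receiveTimeout: number;
}

const sock = new Socket();
sock.sendTimeout = 100; // type-checks thanks to declaration merging
```

Whether this works across two separate `.d.ts` files depends on both declarations landing in the same module scope, which seems worth prototyping.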
- I think it's sort of ready for prime time.
That's awesome! 🎉 🚀
- I am not sure what the story for Electron is. With N-API it should work, I guess? I don't develop for Electron and I have no idea how to check. Might be worth investigating by someone?
I tried it out at https://github.com/nteract/nteract/pull/3388 and it worked great with Electron 3 so far. There are a few annoying issues related to the publishing process, since node-pre-gyp, N-API and electron-builder don't play very well together. These pain points would likely be solved by https://github.com/rolftimmermans/zeromq-ng/pull/46. But no real blockers on the zeromq-ng side.
- Node 10 is soon becoming LTS and Node 6 is EOL in April Next year. Because of N-API zeromq-ng requires at least 8+. So dropping Node 6 support would not be too horrible I guess? There's always the 4.x branch if someone needs it.
I think technically one might even be able to use N-API together with Node 6, but I think it's totally reasonable to drop support for Node 6. Personally I'd even be OK with only supporting Node 10+ 😇
- There is probably some approach to be discussed how to move the project. Should we create a new branch for it, move repositories, ...? Not sure what the best approach is here.
If nobody raises objections to zeromq-ng being the future of zeromq.js, I'd propose the following next steps:

- `zeromq.js@5.x` branch for bugfixes
- PR to `zeromq.js@master` that brings in `zeromq-ng`
- `zeromq@6.0.0-beta0` to npm
- `zeromq-ng@5.x`
- `zeromq@6.0.0` to npm once we feel ready 📦

I'm cool with that plan.
Great work, looks very clean. I would be very worried about the segfault, though. You'll probably want to analyze a core dump. Maybe this group can help with the analysis?
Extra note: N-API is now stable as of Node 10 which is the most recent LTS. I think it's time for us to bring this in.
@rolftimmermans can you take on making the PR to master with zeromq-ng?
NOTE: I added an extra item to the plan @lgeiger outlined since we'll also want to deprecate the typings on DefinitelyTyped.
@crazybits please post to stack overflow or raise a new issue, as your support needs are unrelated to this issue.
Is this still planned soon?
Haven't heard back from @rolftimmermans in a bit and the rest of us have been busy with other projects (or in my case a baby), so I don't think you'll be seeing it soon.
The main blocker is a bunch of segfaults with Node 10. I haven't had the chance to work on it as much as I'd like – feel free to audit the code base and try it out for yourself! Contributions very much welcome!
Once the Node 10 issues are resolved I'd say it's ready for production (we're using it in production with Node 8 successfully so far).
:+1: for implementing bindings using N-API. It would help a lot with maintenance and also with support for Electron without needing to rebuild the bindings.
It's almost 2020 and I have come to the conclusion that all problems seem to vanish when compiling against Node 10. I have ripped out all compatibility layers dealing with earlier, experimental N-API versions in Node 8.7+. I have tested and verified that these binaries now work correctly with Node 10, 12 and 13 RC 1 without issues.
Since Node 8 is nearing the end of its lifetime I am ready to declare the implementation of ZeroMQ-NG stable. We have deployed the latest beta to production without issues so far (on Alpine Linux).
Additionally I have been able to create all prebuilds with just Travis, ditching AppVeyor, and to build libzmq from source on Windows. Prebuilds are now included in the published package with prebuildify, so there is no download step if a build exists for the user's system.
I have some ideas for improvements and additional functionality but there is nothing that would stop me from proposing to merge it.
All eyes on the code are welcome. I'd be happy to take ideas for improvements, either as blockers for a merge or as nice to haves for later.
This has taken me much longer than expected but I hope things will be easier now that everything seems to just work!
Very glad to see you back in town. It was very calm for some time in the ng repo; I already thought you had abandoned it. Last time I checked ng, there were constant segfaults in certain circumstances under any version of Node (8+). Will check tomorrow how it's behaving now, and reply here. I will be more than happy to move to ng. One more time, thanks for all your work!
I actually got to the tests today; I can't reproduce the segfaults I was able to reproduce consistently earlier. So far so good. Will continue testing.
ZeroMQ-NG has been merged! The tracking issue is #347.
Great culling of the issues @rolftimmermans! I'm finally trying to catch myself on everything for the new zeromq.js. 😄
TL;DR: I'm opening this issue to start a discussion about the future of zeromq.js and my work on a proof of concept for a next-generation version. Please let me know your opinion!
Background
We have recently taken a system into production that uses zeromq.js. This works well, but I encountered a number of issues with this library that triggered me to think about solutions. Unfortunately some problems did not appear to be fixable without a massive overhaul of the internals. I will briefly describe the main problems.
Problem 1: Sent messages are queued, but not with ZMQ
Zeromq.js uses an internal queuing mechanism for sent messages. Essentially it pushes all messages onto a JavaScript array and sends them out as soon as the socket is ready for writing. This works nicely except when it doesn't:
Related issues: #152 #185 #124
From https://github.com/zeromq/zeromq.js/issues/152#issuecomment-326894190 I understand that this queueing was built to improve performance. Which means that matching the performance of the current library should be a goal of any new version.
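For illustration, the alternative to a JS-side array is to let `send()` return a promise that only resolves once the message is accepted, so the ZMQ high water mark itself provides the backpressure. A toy sketch of that shape, where `BoundedSender` is an in-memory stand-in for the native socket (not real zeromq code):

```typescript
// Toy stand-in for a socket send queue bounded by a high water mark.
class BoundedSender {
  private queue: string[] = [];
  private waiters: (() => void)[] = [];

  constructor(private highWaterMark: number) {}

  // Resolves only once the message fits under the high water mark, so a
  // caller that awaits send() is throttled by the socket itself.
  async send(msg: string): Promise<void> {
    while (this.queue.length >= this.highWaterMark) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.queue.push(msg);
  }

  // Simulates the I/O thread shipping one queued message out.
  drain(): string | undefined {
    const msg = this.queue.shift();
    this.waiters.shift()?.(); // wake one blocked sender, if any
    return msg;
  }
}
```

With this shape, honouring send timeouts becomes a matter of racing the returned promise against a timer instead of inspecting an unbounded internal array.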
Problem 2: Messages that are received are read automatically
Readable sockets follow the event emitter pattern. This is common in Node.js APIs, so it makes sense for most people. There is one important drawback, though: messages are read automatically.
It appears that a call to `pause()`/`resume()` offers a suitable workaround. However:

- `pause()` also disables writes, which may not be intentional.
- `pause()` will always dispatch all messages that can be read from the ZMQ queue without blocking before it actually stops reading from the socket.
- Users have to rely on `pause()`/`resume()` to approximate features that are offered by libzmq by default, which is not a very friendly developer experience.

Related issues: #139
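By contrast, a pull-based API reads nothing until the consumer asks for the next message. A small sketch, with an async generator standing in for the socket's receive side:

```typescript
// Async generator standing in for a socket's receive side. Nothing is read
// until the consumer asks for the next message.
let reads = 0;
async function* socketMessages() {
  for (const msg of ["a", "b", "c"]) {
    reads++;
    yield msg;
  }
}

async function consume(limit: number): Promise<string[]> {
  const out: string[] = [];
  for await (const msg of socketMessages()) {
    out.push(msg);
    if (out.length >= limit) break; // stop pulling; no further reads happen
  }
  return out;
}
```

When the consumer stops iterating, unread messages simply stay in the ZMQ queue, which is exactly the behaviour libzmq offers by default.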
Other potential improvements
There were some other things we encountered while working on the library that we believed could be improved:
My work on a proof of concept
Given the nature of problems 1 & 2 I set out to rewrite the internals and design a better API that is suitable for honouring high water marks and timeouts for both send & receive operations.
Because I was now effectively rewriting the entire library I also decided to address the additional improvements mentioned above. Especially the move to N-API is non-trivial and influences most of the native code.
To summarize, the goals of my experiment were:
What is included and working
What is still missing
The following features have not yet been added. My plan is to implement them over the next ~weeks~ years:
So, how do I use it?
~To build the proof of concept version you need a local installation of libzmq with development headers. For example, on Debian/Ubuntu you can `apt-get install libzmq3-dev`, or on macOS you can `brew install zeromq`.~

~Then clone the repository and run the tests with `yarn test` or `npm test`.~

Install with `npm install zeromq-ng`.

Next, have a look at the examples in the README. To give you a taste of the API, this is how you can use it:
If you've made it this far reading this post, thank you for your attention!
What next?
I'd like to invite you to have a look at https://github.com/rolftimmermans/zeromq-ng and give your feedback.
Specifically, I'd like to discuss two scenarios:
I'd be completely happy with either.
In addition to the above I am looking for feedback from users that have experience with one or more of the problems mentioned above (or even other problems!). It would be nice if you could give your opinion on the API or even test the proof of concept.
I am also in the process of writing and collecting benchmarks and will post the results soon. Based on what I have tested so far I'm expecting similar performance in general.