nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.67k stars 29.63k forks source link

discuss: object serializer/deserializer + transferables #6300

Closed eljefedelrodeodeljefe closed 7 years ago

eljefedelrodeodeljefe commented 8 years ago

I wanted to propose this issue as a top-level home for discussions from multiple issues, amongst https://github.com/nodejs/node/pull/2133, https://github.com/nodejs/node/issues/3145.

Both issues are too big to have immediate action (judging from what I have read), but they have a serializer/deserializer feature in common, node, c++ and v8 are not providing out of the box. @bnoordhuis mentioned that it seems to be unrealistic to drag in Blink code because of the many dependencies within that repo. I had a look also into their WebWorker implementation, which is fairly huge.

(Ignoring the WebWorker and IPC discussion) the question would be whether we first wanted to provide an efficient serializer/ deserializer + transferables of c++ objects, which might aid a lot of problems at once and also be a nice standalone feature.

Reading up on this from various c++ resources this is of course a non-trivial task. Whereas I also don't believe there will ever be a single method for arbitrary object serialization (please prove me wrong), due to the way v8 works. We would need to judge this before hand because that means a lot of serialization strategies would be insufficient for us.

As a quick guess, that would probably lead us to implement ::serialize() methods per Class, no matter if binary or text, which would ensure serilization happens correctly at dev / build time.

Fishrock123 commented 8 years ago

Should this be pushed for in v8?

Otherwise, is this an eventual candidate for the VM/API shim?

Honestly, there isn't much point for this until workers land afaik.

bnoordhuis commented 8 years ago

Cluster would benefit from faster serialization/deserialization. People complain it's slow to send large objects, thinking it's because it goes over a UNIX socket, but in my experience, it's the JSON.stringify/JSON.parse calls that are most expensive.

ChALkeR commented 8 years ago

Another potentialy related issue: #5453.

ChALkeR commented 8 years ago

child_process is documented to use JSON.stringify since #5723, so any change there would be a semver-major.

eljefedelrodeodeljefe commented 8 years ago

@ChALkeR wouldn't (w/o searching the bottleneck) need to be a replacement but rather be an option. E.g. we could provide the option for shared memory IPC where this would be useful.

@Fishrock123 looking at specs I doubt that we want to be a vm-concurrency only program, no? Since there are multiple workers also, I see anyhow a multi-concurrency model coming for the lang.

So from the discussion so far I conclude, that it would be interesting to at least look into it. If time allows I do so in the next couple of weeks. @bnoordhuis have you done any work on this so far? I know you have deferred this more then once in the past.

bnoordhuis commented 8 years ago

child_process is documented to use JSON.stringify since #5723, so any change there would be a semver-major.

We could make it opt-in by adding a new configuration option or a command line flag. That would make it semver-minor.

@bnoordhuis have you done any work on this so far?

Nothing substantial.

targos commented 8 years ago

While I am for some kind of serializer that would allow to transfer more things than JSON.stringify (thinking about undefined or circular structures), I don't think we would achieve the main goal (performance) with Blink's code. It is a lot faster to use JSON.parse/JSON.stringify to transfer objects to Web Workers or iframes in Chrome.

camillobruni commented 8 years ago

V8-dev here: There work in progress to migrate the blink serializer to V8 see https://bugs.chromium.org/p/chromium/issues/detail?id=148757. Note that as a first step we will implement a legacy serialization format which is not supposed to be used outside of blink as it has certain shortcomings. In a second step a newer format is going to be implemented which should be more beneficial. If Node is interested in a better serialization format other than JSON, I suggest to have a look at the new format once it's ready.

eljefedelrodeodeljefe commented 8 years ago

Thanks for sharing. Might have missed it, but is there any other place a design discussion happened (on the internet)?

jasnell commented 8 years ago

@camillobruni ... out of curiosity, what is the newer format that is being looked at? Wouldn't be https://tools.ietf.org/html/rfc7049 by chance, would it? If so, +1.

camillobruni commented 8 years ago

@jeremyroman is the implementor, he will know more details. From what I recall, its definitely not based on rfc7049 but rather an extension of the existing format.

jeremyroman commented 8 years ago

Hi, Blink dev here.

My objective is to support the HTML structured clone algorithm for Blink, and for legacy reasons, we need to be able to read the format we already store on disk for IndexedDB. Even without the legacy constraint, the RFC7049 format doesn't seem to support some of the things HTML structured clone requires (e.g. reference equality is preserved, i.e., if a is an object, [a,a] deserializes such that a === a). Once that's done, I'll probably tweak the format to remove some legacy quirks and store some extra information to help speed up deserialization a little.

I have a slightly out of date doc about that work. It's written from the perspective of Blink's needs.

While I do expect it to substantially outperform Blink's current structured clone, there is some cost to tracking references (to handle circular structures etc.), so it may not outperform JSON.stringify.

rumkin commented 8 years ago

@jeremyroman Will it support regular expression, functions and streams?

jeremyroman commented 8 years ago

It will support RegExp, yes. Functions and streams are not structured clonable, so at present, no. I believe it's "no" to the others. (It's not even clear how a function would be cloned in general, given the existence of both "native" functions and functions which close over variables.)

addaleax commented 7 years ago

This has happened in https://github.com/nodejs/node/pull/11048, closing :tada: