whatwg / html

Synchronous clone = global.structuredClone(value, transfer = []) API #793

Closed annevk closed 3 years ago

annevk commented 8 years ago

As proposed in https://lists.w3.org/Archives/Public/public-webapps/2015AprJun/thread.html#msg251, there seemed at one point to be some interest in doing this. It would expose a primitive without having to go through postMessage()/onmessage.

Is this still a good idea in 2016?

banksJeremy commented 7 years ago

I think there's a lot of demand for a structuredClone() function like this. Though it may not be the best practice, realistically it is very common to have a lot of application state in ad-hoc graphs of standard data structures. I've had to implement similar cloning functions to help support that on a few different projects, and I've seen bugs from people using JSON.parse(JSON.stringify(...)) instead without considering cyclic data structures, leading to unexpected crashes later.
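To make the JSON.parse(JSON.stringify(...)) failure mode concrete, here is a small comparison. It assumes an environment that already ships a global structuredClone() (current browsers, Node.js 17+), which postdates this comment; the state object is made up for illustration.

```javascript
// Ad-hoc application state: a cycle, a Date and a Map.
const state = { createdAt: new Date(0), items: new Map([["a", 1]]) };
state.self = state; // cyclic reference

// JSON round-tripping throws a TypeError on the cycle.
let jsonError = null;
try {
  JSON.parse(JSON.stringify(state));
} catch (err) {
  jsonError = err;
}

// Structured cloning keeps the cycle, the Date and the Map.
const copy = structuredClone(state);
console.log(jsonError instanceof TypeError); // true
console.log(copy.self === copy);             // true: the cycle now points at the copy
console.log(copy.createdAt instanceof Date); // true
console.log(copy.items.get("a"));            // 1
```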

A built-in feature or interface for deep cloning standard data structures is common in other dynamic languages, and is something I've seen many novices look for when starting to use JavaScript. It would be nice to have a standard that includes making this fully extensible, but as you discussed on the mailing list that is quite complicated and doesn't seem to be happening soon. Extensibility is somewhat orthogonal to exposing this existing functionality to users.

As an outsider this appears to be relatively low-hanging fruit that many users would benefit from.

annevk commented 6 years ago

As pointed out in https://twitter.com/DasSurma/status/955484341358022657 by @surma you can already do this synchronously today by (ab)using the history API. Seems like another reason to expose this.

surma commented 6 years ago

I think there is a use-case for wanting to deep-copy objects, and the structured clone algorithm comes very close to that — it would solve the vast majority of use-cases.

The hack with the History API can be slow as there’s some cross-process communication going on.

I also wrote an asynchronous version using MessageChannel that turned out to be faster than the History API or JSON.parse(), even for big objects:

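// Asynchronous deep copy that reuses postMessage()'s structured clone.
function structuredClone(obj) {
  return new Promise(resolve => {
    const {port1, port2} = new MessageChannel();
    port2.onmessage = ev => {
      port2.close(); // the channel is single-use, so release it
      resolve(ev.data);
    };
    // An uncloneable value throws here, which rejects the returned promise.
    port1.postMessage(obj);
  });
}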

But sometimes a synchronous version is very desirable.

"As an outsider this appears to be relatively low-hanging fruit that many users would benefit from."

I agree with this.

annevk commented 6 years ago

The main thing blocking this is getting interest from implementers (judging by that 2015 thread there is interest from Mozilla) and then finding someone who wants to write the specification and someone who wants to write the tests (can be the same someone).

@ajklein @othermaciej @dstorey thoughts?

surma commented 6 years ago

Maybe I'm naïve, but shouldn't the specification be rather trivial, considering that the structured clone algorithm is already specified? If that is the case, I'm happy to take this as my opportunity to write my first spec bits and tests :D (Provided we get the interest bit sorted, of course)

jeremyroman commented 6 years ago

I wonder if structured clone is actually what developers want. For instance, structured clone does some replication of the object graph, but makes no attempt to replicate the original prototype chain, so if the author has Point objects and expects to get Point objects out, they will be disappointed. (There's not an obvious reasonable way to do this cross-realm, but perhaps that is what authors want within the realm.)

I guess it's at least as close a match as JSON.parse(JSON.stringify(o)), though.

RamIdeas commented 6 years ago

@jeremyroman Not entirely sure it would be a full/strict clone if you still referenced the old Point constructor in the prototype chain, so this would be kind of expected, right?

jeremyroman commented 6 years ago

It depends what the application is trying to do. Naively, I wouldn't blame an author for thinking this is reasonable:

let p = new Point(3, 4);
let p2 = clone(p);
console.assert(p2 instanceof Point);

Structured cloning necessarily clones only the things that you can kinda reasonably do across realms. I'm not sure it's a general-purpose deep clone (though I admit I'm not familiar with the ones apparently present in other dynamic languages), though it's possible that it is suitable for some author use cases.
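A minimal sketch of the behaviour described here, assuming an environment with a global structuredClone(); Point is a hypothetical author-defined class. Own data properties are copied, but the result is a plain Object, so instanceof fails and prototype methods are gone.

```javascript
// Hypothetical author-defined type.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  length() {
    return Math.hypot(this.x, this.y);
  }
}

const p = new Point(3, 4);
const p2 = structuredClone(p);

// Own data properties survive the round trip...
console.log(p2.x, p2.y); // 3 4
// ...but the prototype chain does not.
console.log(p2 instanceof Point);                            // false
console.log(Object.getPrototypeOf(p2) === Object.prototype); // true
console.log(typeof p2.length);                               // undefined
```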

surma commented 6 years ago

"though it's possible that it is suitable for some author use cases"

I’d argue it’s enough for the majority of cases, but we’d have to look into that. Author-defined types have never been cloneable (unless the author also wrote a custom cloning function), so I think we should expose structured clone first, before thinking about how to handle the prototype chain.

othermaciej commented 6 years ago

Tagging @cdumez and @rniwa to give WebKit thoughts on this.

samal-rasmussen commented 6 years ago

Maybe we want two different clone variants: one that just does structured cloning and one that also clones the prototype hierarchy properly. Call them structuredClone() and cloneWithPrototypeHierarchy() or whatever. In any case, the former is basically already done, as surma mentioned, so why not? Let's go already.
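For comparison, the second variant could look roughly like the naive user-land sketch below. cloneWithPrototypeHierarchy() is only the hypothetical name from this comment, not a proposed or existing API, and the sketch only handles plain objects, class instances and arrays (including cycles); Map, Set, Date, typed arrays, private fields and accessors are deliberately out of scope.

```javascript
// Hypothetical deep clone that keeps each object's prototype.
function cloneWithPrototypeHierarchy(value, seen = new WeakMap()) {
  // Primitives and functions are returned as-is (functions are shared, not copied).
  if (value === null || typeof value !== "object") return value;
  // Reuse an existing copy so cycles and shared references are preserved.
  if (seen.has(value)) return seen.get(value);
  const copy = Array.isArray(value)
    ? []
    : Object.create(Object.getPrototypeOf(value));
  seen.set(value, copy);
  for (const key of Object.keys(value)) {
    // defineProperty avoids triggering setters inherited from the prototype.
    Object.defineProperty(copy, key, {
      value: cloneWithPrototypeHierarchy(value[key], seen),
      writable: true,
      enumerable: true,
      configurable: true,
    });
  }
  return copy;
}

// Usage with a hypothetical author-defined type.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
}

const scene = { origin: new Point(0, 0) };
scene.self = scene;
const sceneCopy = cloneWithPrototypeHierarchy(scene);
console.log(sceneCopy.origin instanceof Point); // true
console.log(sceneCopy.self === sceneCopy);      // true
```

The trade-off is the one raised in the thread: keeping prototypes only makes sense within a single realm, which is why this would be a separate algorithm rather than a change to structured clone.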

jeremyroman commented 6 years ago

If what's wanted is a generic way to deeply clone ECMAScript objects, maybe that's something that belongs in the ECMAScript spec (or maybe just a third-party library) rather than the HTML spec.

On the other hand, if it's useful for authors to have semantics that match postMessage, IndexedDB, etc. (for which it's not really clear what dealing with the prototype chain would even mean), then perhaps HTML should expose the existing primitive as suggested.

surma commented 6 years ago

I think for now this issue is about exposing the already existing structured cloning algorithm. I totally see that there’s a need for a proper copy (including prototype), but that would have to be a new algorithm and, as you said, is probably a better fit for ECMA-262.

jeremyroman commented 6 years ago

Another thought here: assuming authors want these semantics, would it be more useful to expose a combined "structured clone" primitive (that serializes and deserializes immediately), or separate structured-serialize and structured-deserialize functions as some opaque SerializedValue object (which would allow deserialize to happen at a separate time, and if there is no transfer, even multiple times)?
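The split variant can be approximated today in Node.js with the v8 module's serialize()/deserialize() (mentioned further down this thread). The SerializedValue class below is a hypothetical wrapper modelling the opaque value described here, not a proposed API; since nothing is transferred, the same bytes can be deserialized more than once.

```javascript
const v8 = require("v8");

// Hypothetical opaque wrapper: serialize now, deserialize later.
class SerializedValue {
  #bytes;
  constructor(value) {
    // Serialization happens eagerly, so any getters run right here.
    this.#bytes = v8.serialize(value);
  }
  deserialize() {
    // Each call builds a fresh, independent object graph from the same bytes.
    return v8.deserialize(this.#bytes);
  }
}

const snapshot = new SerializedValue({ list: [1, 2, 3] });
const first = snapshot.deserialize();
const second = snapshot.deserialize();
console.log(first === second); // false
```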

Dan503 commented 6 years ago

I'm very much in favour of having an easy way to create a deep clone of an object :)

I'd prefer a syntax like this though: Object.clone({key: "value"})

annevk commented 6 years ago

FWIW, I think there'll be the most chance of success if we start very simple, even simpler than OP suggests, with just global.structuredClone(value), which runs StructuredSerialize and then StructuredDeserialize internally. That'll be fairly straightforward to implement as well.

Supporting transferables, exposing StructuredSerialize/StructuredDeserialize separately along with an intermediate value you can copy/message, making StructuredSerialize/StructuredDeserialize extensible for arbitrary JavaScript objects, etc. are definitely interesting, but seem less necessary for a v0, and fleshing them out and gathering support would take a lot of time. None of them are blocked by this simple v0 API either.

domenic commented 6 years ago

Although I agree with the tendency toward simplicity, I would argue that adding a transfer list is potentially valuable and shouldn't add much complexity given how it builds on spec primitives that are already there.

surma commented 6 years ago

I started a PR for the spec change with #3414. I haven’t exposed the transfer list yet, but I can add that once we get the technicalities right :)

annevk commented 6 years ago

Note that the primitives for transferables might be wrong:

// Log the MessagePort that arrives via the transfer list.
onmessage = e => console.log(e.ports[0]);
postMessage(null, "*", [new MessageChannel().port1]);

The above ends up logging a MessagePort object, which I don't think works at the moment as the specification describes things. That's also why I cautioned against exposing transferables, as you need a more complex API; it's not just adding a second argument, it's also figuring out a new return value (or accepting you're not 1:1 with postMessage(), which gives room for arguments).

surma commented 6 years ago

I think it’s okay to diverge from the behavior of postMessage() here. Strictly speaking, it wouldn’t even be diverging behavior because the structured clone in that scenario would be in e.data, which would still be null.

annevk commented 6 years ago

@surma it's diverging if you want to include transferables.

domenic commented 6 years ago

"which I don't think works at the moment as the specification describes things"

Why do you think that? We fixed all that a while back, from what I understand.

annevk commented 6 years ago

@domenic can you explain how the MessagePort object gets transferred (including allocation of a new object)?

domenic commented 6 years ago

Sure, https://html.spec.whatwg.org/#structureddeserialize step 5 (specifically 5.4.3) plus https://html.spec.whatwg.org/#message-ports:transfer-receiving-steps

annevk commented 6 years ago

@domenic how would serialized contain a [[TransferConsumed]] field?

domenic commented 6 years ago

It's set in https://html.spec.whatwg.org/#structuredserializewithtransfer step 5.4.3.

annevk commented 6 years ago

But that does not end up affecting serialized in my example, as far as I can tell.

domenic commented 6 years ago

I see, yeah.

annevk commented 6 years ago

I posted a fix for that issue. I'm not sure to what extent it should affect the API. I find it a little weird if we don't expose the full primitive for transferables since unless we add a second method it would be hard to do so going forward. And if we have two methods, I'd rather have the simpler variant not support transferables. Just take one value and return one.

samal-rasmussen commented 6 years ago

"And if we have two methods, I'd rather have the simpler variant not support transferables. Just take one value and return one."

A good example of "worse is better". This would make it even easier, and therefore maybe also quicker, to get implemented. And it would cover the 99% use case, right? Has anyone even thought of the use cases for transferables when doing a simple, direct structuredClone() invocation? Why do we care?

domenic commented 6 years ago

I don't think omitting transferables would have any impact on implementability. Structured clone with transfer already exists in all browsers.

Transferring is quite useful for many use cases, e.g. when you want to take ownership of memory.
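As a sketch of the "take ownership of memory" case: the structuredClone() that eventually shipped takes an options dictionary, structuredClone(value, { transfer }), rather than the positional transfer = [] argument in this issue's title. This assumes an environment that ships it; after the call, the source ArrayBuffer is detached.

```javascript
// Allocate 1 KiB and write a marker byte into it.
const original = new ArrayBuffer(1024);
new Uint8Array(original)[0] = 42;

// Transfer instead of copy: ownership of the memory moves to the clone.
const moved = structuredClone(original, { transfer: [original] });

console.log(moved.byteLength);         // 1024
console.log(new Uint8Array(moved)[0]); // 42
console.log(original.byteLength);      // 0: the source buffer is detached
```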

wanderview commented 6 years ago

It seems synchronous structured clone is already exposed via history state, but I wonder if we will regret not making it async. It would be really nice to store async consumable things like Request objects with stream bodies in IDB. These can only be consumed or copied async. Hopefully synchronous structured clone will not prevent this use case.

surma commented 6 years ago

"I wonder if we will regret not making it async."

We have a fairly performant async structured clone exposed via MessageChannel. I think for any async cases that API is actually good enough for now. It’s still somewhat of an API abuse, but much less than History API ^^ Having an explicit synchronous clone seems useful to me, especially to keep certain APIs intact but make them more performant. Or am I misunderstanding your point?

jeremyroman commented 6 years ago

It's not clear to me what it would mean to make it asynchronous. Nothing prevents you from synchronously cloning an object that you consume asynchronously. The basic problem is that at least the serialization half must be done ~synchronously if you want to preserve the existing semantics (otherwise the visible property access etc. can occur at some later time). Anything asynchronous will probably have less predictable behavior and greater overhead on reasonably small object graphs (bearing in mind it is impossible to tell in advance, in general, whether the object graph will be small, because we can trigger getters, proxies, etc, while traversing).

The serialization and deserialization steps could be fairly easily decoupled, which more or less splits the work in two.
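A small sketch of why the serialization half is observably synchronous, assuming an environment with a global structuredClone(): the getter on the source object runs during the call, and the clone holds a plain data property with the value seen at that moment. The counter is only for illustration.

```javascript
// Count how many times a getter is invoked during cloning.
let reads = 0;
const source = {
  get value() {
    reads += 1;
    return reads;
  },
};

const copy = structuredClone(source);

// The getter already ran, synchronously, inside structuredClone()...
console.log(reads); // 1
// ...and the copy stores a data property, not an accessor.
console.log(copy.value); // 1
console.log(Object.getOwnPropertyDescriptor(copy, "value").get); // undefined
```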

loilo commented 6 years ago

To even open up another flank on this (sorry in advance if this is inappropriate/out of scope):

It may be reasonable to think about a hook that lets an object modify its own structured clone from the inside, similar to what toJSON() enables:

JSON.parse(JSON.stringify({
  toJSON () {
    return 'foo'
  }
})) === 'foo'

I know this goes beyond just exposing existing functionality, but it may at least be a thought to consider (or reject) since it could not be added as a follow-up without a breaking change.
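For contrast, a sketch of what happens today, assuming an environment with a global structuredClone(): there is no toJSON()-style hook, and because the algorithm walks own enumerable properties, the toJSON function value itself makes the clone fail with a DataCloneError.

```javascript
// An object that customises its JSON form via toJSON().
const withHook = {
  secret: "internal",
  toJSON() {
    return "foo";
  },
};

const viaJson = JSON.parse(JSON.stringify(withHook));
console.log(viaJson); // foo

// Structured cloning ignores toJSON() and refuses the function-valued property.
let cloneError = null;
try {
  structuredClone(withHook);
} catch (err) {
  cloneError = err;
}
console.log(cloneError.name); // DataCloneError
```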

jeremyBanks commented 5 years ago

Potentially relevant: elsewhere in the JavaScript ecosystem, Node exposes its structured cloning/serialization implementation directly through its v8 built-in module, although it is still marked as "experimental".

const v8 = require('v8');
// ...
let clone = v8.deserialize(v8.serialize(original));
annevk commented 5 years ago

@jeremyBanks thanks for posting that!

"The format is backward-compatible (i.e. safe to store to disk)."

That is quite interesting. If browsers could agree on this format, we'd have a new kind of JSON...

jeremyroman commented 5 years ago

Node is exposing V8's structured serialization implementation, which uses an evolution of Blink's (and before that, WebKit's) wire format, which is what Chromium stores in IndexedDB on disk, etc. (I imagine Mozilla has some similar format?)

Ours is missing some traits that might be desirable if it were to be used in places where JSON is. For instance, it is not forward-compatible: we assume you only ever read data in equal-or-greater versions of Chromium, which is probably not acceptable for use over a network or otherwise passed between different implementations.

annevk commented 5 years ago

Ah yeah, that would indeed not work. If we want to go there it's probably best discussed in its own issue, sorry for distracting this one.

jakearchibald commented 5 years ago

@wanderview

"but I wonder if we will regret not making it async"

Which parts can be async? Since it's crawling a JS object, and creating new JS objects, I thought the bulk of the work would be main thread anyway.

jeremyroman commented 5 years ago

Conceivably deserialization of a very large object could be done in small pieces, yielding to the scheduler. It's unclear whether we would ever do this, but I think it would in principle be possible.

jakearchibald commented 5 years ago

Yeah, fair enough. Reading the JS object would need to be sync, but creating the new ones could be spread over tasks.

Maybe we should have an async API too (but we can already do it with message ports), but we should definitely have a sync version.

wanderview commented 5 years ago

I now think my original concern of storing consumables in IDB, etc, can be handled in a way separate from structured cloning. We would instead make these consumables use a "transfer" instead of a "copy". So you would transfer a Response and its body ReadableStream into IDB.

waves hands

GrosSacASac commented 5 years ago

I found a TC39 proposal in the stage 0 list. It looks like it is no longer active; could someone present it?

Should it be a TC39 proposal or a WHATWG one?

What are the next steps needed to have it implemented in browsers?

domenic commented 5 years ago

This works best as a WHATWG proposal as the WHATWG is the body that specifies structured cloning.

The next steps necessary to have it implemented in browsers are for browsers to determine that it's a high priority on their product roadmap (compared to other things they could spend engineering effort on). That is usually helped by evidence such as web developers advocating for it or showing what they're using instead. However, I think we've already reached a pretty good amount of evidence that this would be useful, so I'm not sure how to make progress on increasing the priority in browser teams' backlogs :(.

rniwa commented 5 years ago

It looks like two use cases being discussed are:

Both of these tweets are about cloning JS objects, not structured cloning, and some of the discussions explicitly mention a "proper" way of cloning JS objects.

I'd be curious to know more concrete use cases, and whether the v0 API being proposed here would satisfy any of them.

surma commented 5 years ago

Yes, I was looking at deep-cloning at the time. Standardizing/exposing structured clone seemed like a low-risk, low-friction first step. It behaves correctly for the majority of use cases (as far as I can tell) and is already specified and implemented in most engines.

The use-cases are mostly related to architectures relying on immutable data structures and chaining-style APIs.
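A minimal sketch of the immutable-update pattern this refers to, assuming an environment with a global structuredClone(). withChanges() is a hypothetical helper, not an existing library API: it deep-copies the state, applies a mutation to the copy, and leaves the input untouched, so calls can be chained.

```javascript
// Hypothetical helper: return an updated deep copy, never touching the input.
function withChanges(state, mutate) {
  const draft = structuredClone(state); // independent copy, nested objects included
  mutate(draft);
  return Object.freeze(draft); // shallow freeze signals "do not mutate" to callers
}

const v1 = { user: { name: "Ada", tags: ["admin"] } };
const v2 = withChanges(v1, draft => {
  draft.user.tags.push("editor");
});
// Chaining: each step starts from the previous snapshot.
const v3 = withChanges(v2, draft => {
  draft.user.name = "Grace";
});

console.log(v1.user.tags); // [ 'admin' ]
console.log(v2.user.tags); // [ 'admin', 'editor' ]
console.log(v3.user.name); // Grace
```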

agm1984 commented 5 years ago

I am currently encountering an issue in Vue.js whereby I pass an Object prop from a parent to a child component like this:

Also I apologize for showing framework code, but my intent is to help establish context for a need.

export default {
    props: {
        initialValue: {
            type: Object,
            required: false,
            default: () => ({}),
        },
    },

    data() {
        return {
            value: this.initialValue,
        };
    },
};

The initialValue reference is copied to value, which unfortunately copies the Vue getter/setter functions that exist on initialValue.

This means if a person mutates the local state of value, Vue fires the hidden setter functions on the upstream reference and therefore breaks the component encapsulation. In this way, the data flow is not unidirectional, but it is expected behaviour simply because the reference is copied in a non-immutable fashion. Part of this is Vue's problem in my opinion. I think the framework itself should deep clone props as they enter into a component.

This example means you can pass a reference through 100 dimensions of child components and mutate the root component's state in a way that is extremely difficult to trace by visual code analysis. I suspect countless application developers will experience some form of this issue moving forward.

Currently, shallow cloning is not adequate as a solution because nested Objects do not have their child references broken. So in my opinion, the only viable solution is to use Lodash's cloneDeep function or equivalent.
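A short sketch of the difference, assuming an environment with a global structuredClone(); initialValue here is a made-up plain object standing in for the prop, not a Vue reactive object.

```javascript
const initialValue = { address: { city: "Oslo" } };

// Shallow copy: the top-level object is new, but nested objects are shared.
const shallow = { ...initialValue };
shallow.address.city = "Bergen";
console.log(initialValue.address.city); // Bergen (the parent's state changed)

// Deep copy: nested references are broken as well.
const deep = structuredClone(initialValue);
deep.address.city = "Trondheim";
console.log(initialValue.address.city); // Bergen (unchanged by the deep copy)
```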

My described issue would be solved using structuredClone like this:

    data() {
        return {
            value: structuredClone(this.initialValue),
        };
    },

I think browsers should natively support deep cloning ASAP, so that a third-party cloning dependency can be avoided. This would save bandwidth for all parties by reducing the bundle size of every library that packages some form of deep cloning in order to operate immutably.

Personally, I like the idea of following the implementation details from node.js because it will make it easier for both node.js and browsers to benefit from continued innovations from either node.js or browsers with respect to deep cloning and immutable paradigms.

krisdages commented 3 years ago

Does anyone have any suggestions on how to advocate for something like this to the browser vendors?
What if one was to just implement it in Chromium or Firefox and try to submit it to the project?

Or maybe standing outside headquarters chanting with a protest sign? :)

jeremyroman commented 3 years ago

You'd have to follow their respective launch processes to submit it. For Chromium, that's this process.