Define remote object lifecycle

w3c / webdriver-bidi

Bidirectional WebDriver protocol for browser automation

https://w3c.github.io/webdriver-bidi/

375 stars 42 forks source link

Define remote object lifecycle #90

Closed sadym-chromium closed 2 years ago

sadym-chromium commented 3 years ago

To prevent memory leakage, there should be a way to release the received remote object. This can be implemented like in CDP Runtime.releaseObject

jgraham commented 3 years ago

What's the model here? Does serializing automatically create a reference to the serialized object? What's the lifetime of that reference? Does it apply to all object types? The WebDriver/HTTP model (for elements and windows) is that serialization doesn't affect the lifetime of the value at all, so dereferencing the remote object can fail. I can see that model might be more problematic with a pattern like return [{foo: document.body}] where there are a bunch of temporary objects that aren't otherwise kept alive. But at least with the current model when we don't hit serialization limits the contained values will be directly accessible to the client, so the problem is less bad than in CDP.

sadym-chromium commented 3 years ago

I'm implementing the ArrayValue serialisation. I cannot understand what is the lifecycle of the object in case of this being evaluated: return [{foo: 'bar'}]. According to the spec, the result should have an objectId along with the key-values. The question is when the GC can collect the object.

I agree that exposing garbage collector logic releaseObject doesn't seems ergonomic, but I don't see other options of avoiding memory leakage.

jgraham commented 3 years ago

So with return [{foo: "bar"}], in the absence of autoref, the objects would likely be gc'd, making any attempt to use the handle fail. But that wouldn't necessarily be a problem in this case because the object serialization would contain all the same data as the original object, so there would be much less need to access the handles.

That said, I think I'm convinced that we need something that automatically holds a reference to objects. But there are a number of questions that need to be decided:

Which objects so we hold (strong) references to? Anything that's sent over the wire? Just the root objects returned (e.g. the array in the example above, which will then keep the inner object alive)?
Do DOM objects get special handling? It's much less likely that a node will be GC'd than a radom js object since it's typically owned by the DOM tree.
What's the lifetime of the references? It seems likely we don't want to extend the lifetime of the relevant global. So we should probably arrange things so that the object map is destroyed immediately before the global would otherwise be destroyed (e.g. after non-bfcache navigation or when a worker is going to terminate).

foolip commented 3 years ago

I think I'm convinced that we need something that automatically holds a reference to objects. But there are a number of questions that need to be decided:

Sounds great. Good questions!

Which objects so we hold (strong) references to? Anything that's sent over the wire? Just the root objects returned (e.g. the array in the example above, which will then keep the inner object alive)?

I would say that sending an object ID over the wire should create a reference, even in nested scenarios when there are many object IDs. But @sadym-chromium pointed out that CDP has a Runtime.releaseObjectGroup, and we could perhaps do something like that to release a bunch of objects at the same time.

Do DOM objects get special handling? It's much less likely that a node will be GC'd than a radom js object since it's typically owned by the DOM tree.

I'm not sure that we need special handling for DOM objects. It seems simpler all around to treat them the same.

What's the lifetime of the references? It seems likely we don't want to extend the lifetime of the relevant global. So we should probably arrange things so that the object map is destroyed immediately before the global would otherwise be destroyed (e.g. after non-bfcache navigation or when a worker is going to terminate).

@sadym-chromium looked into this and can elaborate, but it sounds like with CDP, navigating away from the page destroys the "agents" and with them the ID map. Having the references survive navigation would be very weird. So object IDs can become unusable without being explicitly released.

css-meeting-bot commented 3 years ago

The Browser Testing and Tools Working Group just discussed Ownership of serialised objects.

The full IRC log of that discussion

<AutomatedTester> topic: Ownership of serialised objects
<AutomatedTester> github: https://github.com/w3c/webdriver-bidi/issues/90
<AutomatedTester> sadym: in the serialization there is an open question about what do we do when we return an object. in CDP we release objects after a define time
<AutomatedTester> ... there is one solution as suggested by jgraham that we keep track of everything until a page navigation is complete
<AutomatedTester> ... or have a way for the client to say that the scope is now complete and release all the objects no longer needed
<AutomatedTester> q?
<jgraham> q+
<AutomatedTester> ack jgraham
<brwalder> q+
<AutomatedTester> jgraham: I think that is clear thatr we need something like this
<AutomatedTester> ... the use case is "return an array containing data that cant be fully serialsed as its too deeply nested" and then it get's GCed and you can't get it
<AutomatedTester> ... and I think that we will need to al.low the users to explicitly clean this up especially for single page apps
<AutomatedTester> ... we don't want to have a situation where webdriver just creates leaks as it's constantly holding onto life time references
<AutomatedTester> ... and it might be good to know what references do we want to hold onto
<AutomatedTester> ... do we just need 1 root object or smaller references of what we have accvessed
<AutomatedTester> sadym: my opinion if we expose something to a user we should give them something to allow gc
<foolip> q?
<AutomatedTester> ... if we try save memory by picking something that does by value and some by reference it wont be an economic solution
<AutomatedTester> ack brwalder
<AutomatedTester> brwalder: on the subject of holding strong refernce on the root object and not on the nested object
<AutomatedTester> ... the main difference between cdp and bidi is that we return a large tree
<AutomatedTester> ... so we can just hold the root object
<AutomatedTester> ... while in cdp you have to go through the nested object and then assigning a key
<AutomatedTester> ... so we can just keep what we have and it should be fine
<jgraham> q+
<AutomatedTester> sadym: that is a good point but there may be a situation where we can't fully serilize and allow users to get the rest
<AutomatedTester> brwalder: I am not sure how that works in cdp
<AutomatedTester> ... but in bidi if we keep the strong reference to the object then it woudlnt be gc'ed
<AutomatedTester> ... and if we can't fully serialise it in the first tree that the client would be able to get it in a 2nd call
<AutomatedTester> ... and we would need to document that in a specific way
<AutomatedTester> q?
<AutomatedTester> ack jgraham
<AutomatedTester> jgraham: the disadvantage with a nested object and hold a reference to the root object
<AutomatedTester> ... and then it something else comes along and updates it then there will be data missing
<foolip> q+
<brwalder> q+
<AutomatedTester> ... if we create it at a nested model it will be hard for clients to work
<AutomatedTester> ... and holding onto the root will be a good first step and easier to implement
<AutomatedTester> sadym: what about a command to explicitly release ?
<bj> q+
<AutomatedTester> jgraham: that makes sense and we need to make sure that they are tied to the lifetime of the window object
<AutomatedTester> ... the session would have weakref to a realm to a map to a strong reference
<AutomatedTester> q?
<AutomatedTester> ack foolip
<AutomatedTester> foolip: it would be nice if we return an id if we keep a reference and if we dont then ther eis no id
<AutomatedTester> ... and cdp does have the concept of object groups
<AutomatedTester> q?
<AutomatedTester> ack brwalder
<foolip> CDP releaseObjectGroup is https://chromedevtools.github.io/devtools-protocol/tot/Runtime/#method-releaseObjectGroup
<AutomatedTester> brwalder: I just wanted to addess jgraham 's comment on just holding a strong reference and something modifies it
<AutomatedTester> ... I am not convinced it a flaw in the spec and it's a general flaw in JS since that is by ref and not by val
<AutomatedTester> ... so it's not really on us to solve this
<AutomatedTester> q?
<AutomatedTester> ack bj
<jgraham> So a case where this could be a problem is where you e.g. log a js value and later expand it and see the value after it was modified not the value at the time it was logged
<AutomatedTester> bj: object groups in cdp and webinspector are done by injected scripts to a page so they disappear when the page navigates
<brwalder> q+
<AutomatedTester> ... so I don't think there much value in adding a system to explicitly clear
<AutomatedTester> ... and we've had it working as normal
<AutomatedTester> q?
<AutomatedTester> brwalder: I think bj 's concern is valid
<bj> btw, objectGroup is a parameter to the main evaluateJavaScript command in WebKit's protocol. Which is how console code is able to do anything sensible with it. Other objects are not in a specific object group.
<AutomatedTester> ... if we introduce something like this to bidi we are going to hope users know what to do to do gc
<AutomatedTester> ... and since we can return a tree we don't need the idea of object groups because the tree is the object group
<jgraham> RRSAgent: make minutes v2
<RRSAgent> I have made the request to generate https://www.w3.org/2021/03/10-webdriver-minutes.html jgraham

whimboo commented 3 years ago

@sadym-chromium looked into this and can elaborate, but it sounds like with CDP, navigating away from the page destroys the "agents" and with them the ID map. Having the references survive navigation would be very weird. So object IDs can become unusable without being explicitly released.

Note that this would be problematic for DOM nodes. After a navigation their IDs should still exist so that it will be possible to check for a stale element. Otherwise we would always throw a no such element error. But maybe your comment was not related to DOM nodes...

jgraham commented 2 years ago

Objects returned over the protocol get an ID, and we have an existing mechanism in the spec to keep a weak map between id and object (and vice-versa) so the object can be looked up by id, and so we return the same ID for the same object. However we don't have any existing mechanism to allow keeping objects sent over the protocol alive so they can be guaranteed to be available later (assuming the underlying global is not discarded; for all that follows take it as read that any cache will not extend the lifetime of the global beyond that which it would have in the absence of the cache i.e. we never want to end up leaking Window objects).

I see several options:

Never create strong references. This is defacto the current behaviour and enough to support WebDriver Classic, which only allows returning DOM Elements and fails if a value has been discarded. It's simple, and with enough effort a user can always implement strong references in library code, at least for the script.execute case (but not for e.g. log.entryAdded). But it feels like it's going to be surprising to users if they do something like return Array.from(document.querySelectorAll("div")), we provide an objectId in the response, and immediately discard the object so the id is never useful.
Always create strong references to anything that's sent over the protocol. This means the least surprising behaviour for users (objectIds they get are always usable as long as the global hasn't been discarded), but runs the strong risk of memory leaks.
Create strong referencs to some objects sent over the protocol e.g. the root object in any serialized object graph, or any object where the children aren't fully serialized, or root JS objects, but never platform objects. This basically amounts to defining some heuristics.
Provide a per-command mechanism to specify which returned values are strongly referenced and which aren't. For example we could have keepAlive: "none" | "root" | "all" on script.execute and similar commands, corresponding to the suggestions above.

Assuming we keep strong references to any elements, there's also the question of how to release them. I can see several options here:

No mechanism. Referenced objects are always kept alive until the global is discarded.
A simple "clear" command that clears all references for a specific global.
A command that releases specified objects.
A CDP-like object group mechanism that allows assigning a per-call group to specific objects, together with a command to release a whole group of objects at a time. But note also that this is likely more important for CDP where objects can only be got one-at-a-time than in BiDi where we support serializing object graphs.

For the mechaisms involving explicit [de]referencing, it's unclear how that should interact with events that serialize objects e.g. log.entryAdded can return remote values in the arguments. If we have e.g. named groups, perhaps those should go into a special named group? Otherwise some kind of additional configuration of the event is needed, and we don't have a mechanism for that right now.

css-meeting-bot commented 2 years ago

The Browser Testing and Tools Working Group just discussed Remote Object Lifecycle.

The full IRC log of that discussion

<AutomatedTester> topic: Remote Object Lifecycle
<AutomatedTester> git
<AutomatedTester> github: https://github.com/w3c/webdriver-bidi/issues/90
<AutomatedTester> jgraham: We now have the ability to execute script and return references to objects that are owned but the ecmascript engine
<AutomatedTester> ... at the moment, the way the spec if writen and you return an array we return a reference to it
<AutomatedTester> ... and if that is GC'ed then when we try access them then this could problems as things could be thrown away
<simonstewart> q+
<AutomatedTester> ... we have can have strong referenced that is associated to the global so that as long as the global exists then that's fine
<AutomatedTester> ... or we have strong reference to root objects and weakmaps for the rest
<AutomatedTester> ... or the CDP way of object groups
<AutomatedTester> ... I have written up a comment on this
<AutomatedTester> ... Please can people read through this and write it up in the issue
<AutomatedTester> q?
<AutomatedTester> ack simonstewart
<AutomatedTester> simonstewart: we need to make sure we think about clients that are written without GC in mind when doing the work here
<jgraham> RRSAgent: make minutes
<RRSAgent> I have made the request to generate https://www.w3.org/2021/12/08-webdriver-minutes.html jgraham
<jgraham> RRSAgent: stop
<jgraham> RRSAgent: bye
<RRSAgent> I see no action items
<jgraham> Zakim, bye
<Zakim> leaving. As of this point the attendees have been jgraham, foolip, brwalder_, sadym, jdescottes, zcorpan, jugglinmike, whimboo, s3ththompson, mzgoddard, jimevans, simonstewart

sadym-chromium commented 2 years ago

IMO, as we want to provide an ergonomic protocol, objectId should always be a strong reference. Otherwise there will be lots of space for mistakes and misunderstandings from the users' side.

Regarding GC, there are few scenarios: Scenario 1: the same object was returned twice. User returns an object as a result of script.evaluate and the very same object was logged by console.log. We have to GC the object only after both of handlers (evaluate and entryAdded) are done. How to avoid releasing objects too early? Scenario 2: I guess there can be a scenario, when user wants to keep only some of the references, and release others. Scenario 3: serialization of a cyclic reference. const a = {}; a.a = a;

In CDP each objectId is unique per serialization, which allows to avoid problem in scenario 1. The object is GC'ed only after all it's references (objectIds) are released.

This suggestion seems to solve the problem:

Make objectId always a strong reference to avoid confusions and provide ergonomics.
Make objectId a unique reference to avoid race condition in Scenario 1. (To provide the proper cyclic references in Scenario 3, probably the same objectId should be used in one serialization, but no strong opinion here).
Add method releaseObject(objectId).
Add method releaseAll() to ease the life of those who are not in Scenario 2.

And additionally:

Consider providing objectId only for the root object, to avoid handling too many objectId's.

jgraham commented 2 years ago

I don't really understand why unique object ids per serialization helps with scenario 1 (I think I don't understand scenario 1 at all; if you've got the same object in a console.log call and an evaluate call, what's the race that you're worried about? Are the handlers here client code?).

I've found the behaviour of CDP providing a unique reference per serialization confusing. IIRC it doesn't prevent comparing two objects for equality (because the format is something like <actual id>.<counter>), but it requires you to know something undocumented about the id format in order to do so. Being able to assume that the id is opaque, and usable for equaliy comparisons, seems much nicer.

I also think that we don't have nearly such a problem as CDP with objects being released early because we are providing full serialization of common containers. In CDP something like evaluate("document.querySelectorAll('.foo')") is a problem because it just returns the id of the container, which will then be GCd before it can be used. In WebDriver-BiDi we can make that serialize as a list with an appropriate type, and so we don't care nearly so much about the lifetime of the JS object representing the collection, because we don't actually need to access it again.

The disadvantage of strong references is that you're forcing people to do manual memory management; if you don't always call release* then the memory is leaked at least until the page navigates (which for a SPA might be "when the session ends"). If you're also proposing keeping a strong reference to objects that are logged in console.log that seems like it has even more potential to unexpectedly leak. Meanwhile it's not that hard to manually implement the strong reference part in JS if you really need it (at least for script execution); basically you allocate a Set in the relevant realm, and then have your scripts add the objects you want to keep alive to that set. You can also have a release function that you call to remove the object from the set, and so let it have a page-managed lifetime. That's a bit ugly for code running directly in content since you need to stash the global somewhere, which makes it theoretically visible to page scripts, but it doesn't seem too bad at all for code running in a sandbox.

Note that the WebDriver classic behaviour for DOM nodes has always been to hold weak references, and return an error when the element has been discarded. Extending that model seems like the most natural thing, and we could add some kind of opt-in to keeping strong references if we see a lot of confusion / user need.

I also remember @burg had some thoughts on this topic.

sadym-chromium commented 2 years ago

Scenario 4: user reuses returned object:

const evaluateResult = bidi.evaluate(()=>{ return {property: "property value"}; });
bidi.callFunctionOn((obj)=>{ return obj.property; }, evaluateResult);

I don't really understand why unique object ids per serialization helps with scenario 1 (I think I don't understand scenario 1 at all; if you've got the same object in a console.log call and an evaluate call, what's the race that you're worried about? Are the handlers here client code?).

In scenario 1 it's not exactly the race condition, but need in keeping the object reference count on the user side. Example:

// 1. Set log listener and release result.
bidi.setOnLogMessageListener((message)=>{
  // make some logic here.
  bidi.release(message);
})

// 2. Evaluate and log the same object.
const evaluateResult = bidi.evaluate(()=>{
  const a = {
    property=123
  }; 
  console.log(a)
  return a;
});

// 3. Use evaluated object in `callFunctionOn`.
bidi.callFunctionOn((obj)=>{
  return obj.property;
}, evaluateResult);

In such a use-case in step 3 user will likely received staleObject exception. De facto we delegate the garbage collecting logic to user. They have to know when an object is safe to release.

I've found the behaviour of CDP providing a unique reference per serialization confusing. IIRC it doesn't prevent comparing two objects for equality (because the format is something like <actual id>.<counter>), but it requires you to know something undocumented about the id format in order to do so. Being able to assume that the id is opaque seems much nicer.

The objectId in CDP is "<realm>.<counter>", where counter is shared in the given realm. So currently there is no way to compare 2 objects without roundtrips.

I also think that we don't have nearly such a problem as CDP with objects being released early because we are providing full serialization of common containers. In CDP something like evaluate("document.querySelectorAll('.foo')") is a problem because it just returns the id of the container, which will then be GCd before it can be used. In WebDriver-BiDi we can make that serialize as a list with an appropriate type, and so we don't care nearly so much about the lifetime of the JS object representing the collection, because we don't actually need to access it again.

The disadvantage of strong references is that you're forcing people to do manual memory management; if you don't always call release* then the memory is leaked at least until the page navigates (which for a SPA might be "when the session ends"). If you're also proposing keeping a strong reference to objects that are logged in console.log that seems like it has even more potential to unexpectedly leak. Meanwhile it's not that hard to manually implement the strong reference part in JS if you really need it (at least for script execution); basically you allocate a Set in the relevant realm, and then have your scripts add the objects you want to keep alive to that set. You can also have a release function that you call to remove the object from the set, and so let it have a page-managed lifetime. That's a bit ugly for code running directly in content since you need to stash the global somewhere, which makes it theoretically visible to page scripts, but it doesn't seem too bad at all for code running in a sandbox.

Note that the WebDriver classic behaviour for DOM nodes has always been to hold weak references, and return an error when the element has been discarded. Extending that model seems like the most natural thing, and we could add some kind of opt-in to keeping strong references if we see a lot of confusion / user need.

As normally DOM nodes are not GC'ed as long as they are on the page, from that perspective, analog in BiDi would be properties of globalThis. For them we can provide stable weak references, as they don't have risk to be GC'ed unless they are removed from the globalThis. Having weak reference delegates the "keeping alive" logic to user.

We have to deal with GC anyway, either implicitly or explicitly. The question is which approach is more ergonomic:

Weak reference with persistent objectId.
- Scenario 4 is not possible without storing returned object on the browser side. It would work only if the result is stored somewhere:
```
const evaluateResult = bidi.evaluate(()=>{ window.something = {property: "property value"}; return window.something; });
bidi.callFunctionOn((obj)=>{ return obj.property; }, evaluateResult);
```
- Force user to worry about keeping objects alive.
- User have to worry about keeping objects alive by keeping them in a persistent set/map.
- De facto GC is delegated to user, as they have to remove not required objects from the persistent map.
- objectId doesn't make any sense, as user can access objects via the map.
Strong reference with reference counter in objectId.
- (For my personal taste) more ergonomic, especially if we provide both "releaseAll" and "releaseObject" methods.
- Scenario 4 is possible.
- Force user to worry about releasing memory.
Weak reference with reference counter in objectId. Doesn't make too much sense.
Strong reference with persistent objectId.
- Force user to worry about releasing memory.
- Scenario 1 requires additional effort.

jgraham commented 2 years ago

Scenario 4: user reuses returned object:

const evaluateResult = bidi.evaluate(()=>{ return {property: "property value"}; });
bidi.callFunctionOn((obj)=>{ return obj.property; }, evaluateResult);

But we serialize the property here, so you don't need to make a roundtrip to get it, unless you opt-in to not serializing the property; you can just do evaluateResult.value.property directly.

This feels like the key difference compared with CDP. In CDP you always need to keep a strong reference to Object return values because otherwise you're returning a handle which is likely to immediately become invalid. In WebDriver-BiDi the fact that the handle may become invalid is less problematic because we're also returning a serialized copy of the data in the common cases. If it was possible to tell that an object's only on the stack we could avoid returning an objectId at all in that case and things would still work out. But we don't have a way to do that. We could just not return an objectId at all for objects that we fully serialize (and with no internal cycles), which is arguably better if we want to optimise for not returning handles that are likely to be invalid. But it's going to produce wrong behaviour in some cases where you really do want a reference to a global object that you can use to perform updates.

In scenario 1 it's not exactly the race condition, but need in keeping the object reference count on the user side. Example:

// 1. Set log listener and release result.
bidi.setOnLogMessageListener((message)=>{
  // make some logic here.
  bidi.release(message);
})

// 2. Evaluate and log the same object.
const evaluateResult = bidi.evaluate(()=>{
  const a = {
    property=123
  }; 
  console.log(a)
  return a;
});

// 3. Use evaluated object in `callFunctionOn`.
bidi.callFunctionOn((obj)=>{
  return obj.property;
}, evaluateResult);

OK, but this seems like a pretty contrived example. It's assuming that there's a release function which the user is for some reason calling from an event handler. Do we have a realistic example from something like the Puppeteer code? The examples I see are pretty much of the form "create an object, use the object for a bit, dispose of the object". Maybe Puppeteer-using scripts are doing something more complex?

As normally DOM nodes are not GC'ed as long as they are on the page, from that perspective, analog in BiDi would be properties of globalThis. For them we can provide stable weak references, as they don't have risk to be GC'ed unless they are removed from the globalThis. Having weak reference delegates the "keeping alive" logic to user.

Although we could have special behaviour for direct properties of globalThis, it feels like it's too much of a special case since if you have e.g. globalThis.foo = {bar: [1,2,3]} and return globalThis.foo.bar then it will create a strong reference to the object event though it's jus as much a global as globalThis.foo.

We have to deal with GC anyway, either implicitly or explicitly. The question is which approach is more ergonomic:
1. Weak reference with persistent `objectId`.
2. Strong reference with reference counter in `objectId`.
3. Weak reference with reference counter in `objectId`. Doesn't make too much sense.
4. Strong reference with persistent `objectId`.

I'm still not sure I understand the reference counter thing. Is it the case with CDP that if you do bidi.evaluate(() => window.prop={}); let objs = new Array(2); objs.map(() => bidi.evaluate(() => window.prop)); bidi.evaluate(() => delete window.prop) and then bidi.release(objs[0]), the browser is still keeping a reference to the object such that bidi.callFunctionOn((obj) => obj, objs[1]) will still work? And the justification for this is basically that object ids are actually many-to-one with underlying js objects, so you have the property that calling release only releases the js object if there are no other handles pointing at it? That sounds like it makes leaks even more likely, but I guess the reason is something like the client could put every handle it's ever received into a giant Set so it's clear which objects have been released on the BiDi side and which haven't, as long as whenever you call release you delete the corresponding handle from the set, whereas with a setup where the same object always ends up with the same handle and the browser end is keeping a Map like {objectId: [obj, refCount]} you need to keep the refcounts in sync to know which objects you haven't freed, and that's more complex? Presumably adding the refcount to the id is designed to avoid that piece of synchronisation, but if we did that why not just put it as a field in the object rather than packing it into the id?

Anyway, my sense is that Option 1 is the simplest thing to start with because, unlike CDP, we serialize common container types and so the obvious use cases like accessing an Array of Elements would work without needing to keep the remote object alive. It also fits better with our design goal of minimising network traffic since users don't need to manually call release for each object that's returned in order to avoid leaking memory, and is more minimal in the sense that we don't need to specify 'release*` methods at all.

In the case that the serialisation doesn't meet users' requirements, and they do want strong references, there's a script-based workaround possible, and if that turns out to be a common pain point there's a clear way to extend to the protocol at least for scripting use cases, by adding some additional parameter indicating the returned object(s) should be kept alive (maybe as part of an object group). It also avoids a conversation about whether we only want to keep the root object alive when returning a nested object, or whether each individual node should be kept alive.

Of course we could do the opposite and make creating only a weak reference opt-in. But I think it would help to justify that by pointing at existing code which has the requirement to create a strong reference to returned objects, even with serialization. The most obvious examples like Puppeteer's queryAll seem like they'd work just fine with just a serialized array-like return value for DOM collections.

sadym-chromium commented 2 years ago

The scenario 4 is what is promised by the Puppeteer API and implemented by JSHandle.

sadym-chromium commented 2 years ago

Please extend the lists or correct me if I missed or misunderstood something:

Strong reference. Main arguments are:

Puppeteer promises: "JSHandle prevents the referenced JavaScript object from garbage collection unless the handle is disposed."
BiDi command evaluate makes an RPC to the page's ECMAScript. ECMAScript function call returns a strong reference. Having RPC logic close to the ECMAScript's seems more natural (Scenario 4).
Not requiring strong reference can lead to flaky and hard-to-debug behaviour with stale objects, especially when test gets into race condition with GC.

Weak reference. Main arguments are:

Avoiding potential memory leaks.
Easier object's comparison (simply by objectId).

The following is just a clarification, what I meant in my previous comments by "reference with counter" and how CDP works:

My bad, In the scenario 1 I assumed we are agreed on having the "strong" decision.

By "reference with counter" I meant the mechanism used in CDP (and puppeteer). As we never know for how long a client need a reference for a given object, client needs release each objectId ever sent to them manually on all together. Every objectId sent to client over the wire is unique, and actual JS instance is released as soon as all it's associated objectIds are released. In the example you just wrote: Scenario 5

bidi.evaluate(() => window.prop={my:'obj'}); // `my_obj` has 1 reverence `window.prop`.
let objs = new Array(2); 
// Create 2 references to `my_obj`:
objs.map(() => bidi.evaluate(() => window.prop)); 
// objs=['my_obj_reference_1', 'my_obj_reference_2']. 
// `my_obj` has 3 references: `window.prop`, `my_obj_reference_1`, `my_obj_reference_2`.
bidi.evaluate(() => delete window.prop) 
bidi.release(objs[0])
// `my_obj` has 1 reference: `my_obj_reference_2`.
bidi.release(objs[1])
// `my_obj` has no references and can be GCd.

I agree there is some space for memory leaks, but having a method releaseAll() should help user avoid releasing objects one-by-one.

jgraham commented 2 years ago

Taking the items somewhat out of order:

Strong reference. Main arguments are:

2. BiDi command `evaluate` makes an RPC to the page's ECMAScript. ECMAScript function call returns a strong reference. Having RPC logic close to the ECMAScript's seems more natural (**Scenario 4**).

I think arguing in terms of what feels more "natural" is hard because it's clear we don't quite agree on that. To me, the fact that the CDP/Puppeteer model introduces C-style memory management (in the sense that you need a manual call to release/free in order to destroy an object) is quite unnatural; it's not what JS programmers have to do in any other situation. Of course it doesn't have the safety hazards of C (really the semantic is disown or something, since it just decreases the refcount and otherwise allows the GC to manage the lifetime), but it's even something that system-level languages are moving away from where possible (e.g. in Rust or Go or even modern C++ you have a system that doesn't require the user to manually free objects).

We can't use those techniques of course, because we have to integrate with the existing GC model of javascript, but across a FFI, which the GC doesn't know about. But I think it means that no option is especially "natural"; in any case the user has to be aware that the abstraction of being able to return an object from a remote process is leaky and deal with the consequences.

3. Not requiring strong reference can lead to flaky and hard-to-debug behaviour with stale objects, especially when test gets into race condition with GC.

This is true, although I wonder how likely you are in practice to ever "win" that race given the IPC overhead and the fact that a generational GC is likely to collect shortlived objects relatively quickly. Nevertheless I agree that it's a real risk.

1. Puppeteer promises: _"JSHandle prevents the referenced JavaScript object from garbage collection unless the handle is [disposed](https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#jshandledispose)."_

I think that the requirement to make the transition to BiDi for existing clients like Puppeteer as easy as possible requires having a mode of operation that preserves CDP semantics here is probably strong. As previously mentioned one could just depend on the clients implementing this themselves in JS, which is certainly possible, but it is at least some extra burden.

Weak reference. Main arguments are:
1. Avoiding potential memory leaks.

I think this is perhaps more of a problem in the BiDi model than in the CDP model.

CDP makes working with multiple objects quite painful. You can get references to objects one at a time, so you pretty much have to explicitly create each JSHandle and although it's not ideal to make you disown them one-by-one it's at least clear that there should be a 1:1 correspondence between calls that create a JSHandle and those that dispose it.

Since BiDi allows returning multiple objects you don't (necessarily) even get this invariant. For example bidi.evaluate(() => {foo:[]}) will return two different objects. If each one gets an object id, and each one gets a strong reference then you need two calls to dispose (i.e. release) to clean up all the objects you allocated.

releaseAll is a solution, but it's a very blunt hammer because it has a non-local effect. If you have a test harness with some module-level fixture that gets a reference to a JS object at the start and expects to be able to use that object later on, it can be broken by any other code that calls releaseAll in the same realm (or whatever the domain of releaseAll is supposed to be; that is I think another problem with it).

I imagine this is why CDP has releaseObjectGroup rather than releaseAll. As long as object group names are unguessable it makes it possible for clients to mass-disown objects in a way that doesn't have non-local effects.

2. Easier object's comparison (simply by `objectId`).

I still think the unique object id per reference is a specific design decision that isn't really strongly coupled to having strong references. It means that if you have an object id and call release(id) you can guarantee that handle isn't valid anymore. But you could equally call the operation deref and have deref(id) decrese the number of BiDi-owned references to the object with id objectId. You could even return the number of remaining references. I suppose that again has a locality problem; if some other piece of code is trying to hold onto an object you shouldn't be able to call release (or deref) in order to cause it to be released. Maybe you could again solve that with object groups, where a reference is shared within a group, but the same object gets a different name in different groups.

So I think I'm convinced that:

We need to have a way to replicate the semantics that Puppeteer and other clients use i.e. some way to ensure that the browser keeps an explicit reference to the root object returned in an evaluate call.
releaseAll is a bad idea. If we end up with a system which generates a lot of strong references we probably need some way to deallocate groups of objects in a way that is protocol efficient and allows client code to reason locally.

What I'm still not sure about:

What the right default is. I think a client written against BiDi from the ground up would be much less likely to require the browser to keep alive remote objects because the API would use the serialized form whereever possible for simplicit and to avoid roundtrips.
How to handle non-root objects e.g. should bidi.evaluate(() => [{}, document.querySelectorAll("div")]) explicitly keep references to the nested objects, or only to the root? Probably the root is enough, but it means that trying to use an objectId for any of the nested objects is prone to the same issues you brought up before. We could just not provide an objectId at all in those cases, but that would prevent you doing something like e.g. returning an array of Arraybuffers, and accessing the Arraybuffers directly rather than via array indicies. But maybe that's OK. We'd also need to think again about how to deal with cycles e.g. whether those get a special cycleId property that is only valid within a given serialization or something else.
Whether we can have per-object objectId (like the current proposal), per-serialization objectId (like CDP) or there's a better way to handle this.
The best approach for releasing remote references, particuarly for large numbers of objects.

sadym-chromium commented 2 years ago

@jgraham WDYT of the following solution?

Provide objectID only for the root object, and make it strong reference.
Provide release(objectId) method.

Non-blockers:

Find a way to release large numbers of remote references.

jgraham commented 2 years ago

Something like that is probably reasonable as a first cut.

How do you propose to handle object cycles? Are you still thinking that the objectId should be unique per serialization or unique per object?

I think I'd also be happier if there was an explicit selection mechanism e.g. script evaluation calls had a field like referenceObject: "root" | "none" (probably a bad name), which would give the option to extend the behaviour in the future e.g. if we wanted to have an option to keep a reference to all objects in the graph, or only keep a weak reference, or similar. My personal preference would be to make the default "none" but if that's controversial I think we could just make it non-optional.

sadym-chromium commented 2 years ago

Cycles: I don't have a good answer yet. Perhaps option 1 can be a starting point:
1. Just ignore the fact there is such a thing, and rely on maxDepth.
2. Follow current spec, but without objectId. Meaning the object met the second time wouldn't have an objectId.
3. Create a strong reference for each cycle.
I find keeping objectId unique per serialization would ease both using and implementing BiDi. From the downside I see only not being able to compare objects on the client side by IDs. WDYT?
referenceObject: "root" | "none" seems to be a great solution for future extension if needed. Default none seems fine.

jgraham commented 2 years ago

For cycles, if we aren't handing out an object id, consider having an internalId or something which would only be valid within that serialization.

If we really want unique object ids per serialization, I think we should rename it objectHandle or something to reflect the fact that it's not actually identifying the object. I don't love it, but I think the argument that if we're keeping a strong reference then you are required to dereference the handles that you created, but shouldn't be able to affect the handles that were created by other parts of the code, is probably important enough to justify that approach.

Another issue I just thought of with the unique handles rather than ids; if you have an object like foo = {bar: {}} and you initially serialize the foo.bar object, and later the foo object, there's no way for the client to tell that the object it gets back when serializing foo contains the object it already knows about. That's a variation of the equality problem, but it shows that not having a real id actually stops us from implementing a client side reflection of an object that's built up over multiple calls without specific knowledge of the object structure.

I suppose the unified solution to all of this is that you have two properties: objectId which is unique per object, held as a weak ref, and can be used for identity, including for cycles, and handle which is strongly held, unique per serialization, and is accepted as the argument to release. Objects can be referenced (e.g. in callFunction) using either handle or objectId. But maybe that's overkill (and perhaps creates a problem if both properties are supplied).

Also bikeshedding slightly, release seems like it's a bit of a misnomer in that you don't actually know if it will be released. Unless we're very attached to reusing the CDP terminology for compat reasons, I'd prefer disown or something that implies that you're giving up ownership of an object, but doesn't imply the object will necessarily be dropped.

Another question: what do you propose for DOM objects? Presumably they always get a sharedId and the lifetime of that is managed by the page? And otherwise they act like a normal object.

jgraham commented 2 years ago

So to make a slightly more concrete proposal with some bikeshedding of the terminology. I don't think this is anything new, just trying to summarise:

Commands that can return JS objects get a own property (name isn't ideal, but is symmetric with disown) that initially takes values "root" or "none". When the value is "root" and an Object is returned, that Object's serialization gets a handle property which is a unique handle to the object. In the brower, this handle is associated with a strong reference to the object, so it remains alive even if all other (js) references are released.

There's also a script.disown({handle}) command that can be used to give up ownership of a given handle. In any case the object handle never extends the lifetime of the realm in which the object was created (i.e. navigation or worker termination always release all the objects that the protocol kept a reference to).

DOM objects work the same except they also have a sharedId property that can be used as a reference across sandboxes. This never extends the lifetime of the DOM node.

Whether we also provide a weakly referenced objectId property that's has a 1:1 mapping with a given js object, but does not affect the lifetime of the object, is still an open question. If we don't do this we add a different method for handling cyclic objects e.g. an internalId property.

jgraham commented 2 years ago

Small update after some discussion in the editorial meeting: this should be script.disown({handles: []} to allow disowning multiple objects in a single call.

bwalderman commented 2 years ago

Whether we also provide a weakly referenced objectId property that's has a 1:1 mapping with a given js object, but does not affect the lifetime of the object, is still an open question. If we don't do this we add a different method for handling cyclic objects e.g. an internalId property.

I think we should continue to assign a weak-referencing objectId to every object in a serialized response (including nested ones), even if the client uses own: "root". It allows us to handle cycles without needing to add a new kind of ID to the model. It would also avoid the need for script.evaluate/script.callFunctionOn to accept a handle in place of an objectId. Every object, including the root object would always have an objectId. The strong-referencing handle property, if requested, would ensure that the objectId remains valid as long as the clients needs it to.

This way, all clients, whether they opt into manual memory management or not, will be able to interact with objects the same way, using objectId. The handle property would be used only for memory management, and not for identity.

foolip commented 2 years ago

@bwalderman I agree that using objectId for both weak references and resolving cycles has a certain simplicity to it, but the implication of having weak references for every object serialized is that we need to put those IDs into a weak map for all objects ever serialized, in case those IDs are later used by the client. That would potentially waste a lot of memory, right?

Can you elaborate on the model you're proposing? Is part of the justification being able to use a (weak) objectId for some nested object to use that object directly in a later call to script.evaluate or script.callFunctionOn? That is indeed not possible without some kind of handle/ID, so that's a tradeoff here. Not sure if it's an important use case?

According to @sadym-chromium it might be tricky to implement weak reference objectIds in CDP, currently we have a weak map in the mapper but serialization/deserialization is moving into CDP now.

jgraham commented 2 years ago

As long as the root object always has some kind of id it's at least in theory possible to get child properties if the underlying objects aren't mutated e.g. if you have {foo: {bar: [1, 2, {}]}} and you want the array object you can do bidi.callFunction(root => root.foo.bar[2], serializedRoot). This has the disadvantage that if the objects were mutated in the meantime you might silently get the wrong object back rather than gettting an error, and there's no real way to tell.

jgraham commented 2 years ago

One bikesheddy thing to consider whilst we look at this: we currently don't use Id in field names much (just here and in the return value for session.new). So arguably objectId should be written object for consistency.

sadym-chromium commented 2 years ago

I like suffix Id, but don't insist on using it.

bwalderman commented 2 years ago

@foolip as far as I could tell, most of the memory leak concerns expressed in this thread seemed to be about the possibility of making all IDs sent over the wire strong references (even nested ones). I'm not suggesting that and I agree this would not be ideal in terms of memory usage. Maintaining a weak map of nested IDs would use memory, but not as much as holding strong references on all of those nested objects.

Expanding a little bit on my proposal, it might look something like this:

The client can choose "root" or "none" reference handling as proposed above.
The backend maintains a weak map of objectId to actual Objects.
When serializing a response, the backend creates weak objectIds as needed, and inserts them into the weak map. All objects in the response have an objectId. If the client requested "root" reference handling, the backend also creates and holds a strong reference on the root object and returns a unique handle property.
objectIds are valid across serialization so they can be used in later calls. If the client chose "root" reference handling, then all objectIds in the response will remain valid and usable in later calls until the client disowns them (assuming the page doesn't invalidate the references by, for example, mutating the object).

This approach would handle cycles, allow the client to use nested objects in later calls, and also allow object comparison on the client side. There is also ergonomics. My concern was about script.evaluate potentially needing to accept 3 different kinds of IDs: sharedId, objectId, and now handle. With this approach, script.evaluate would only need to accept the first 2 kinds.

foolip commented 2 years ago

@bwalderman two different kinds of memory leaks or memory usage issues have been raised. With strong references, the client needs to take care of releasing handles to avoid leaking memory, and arguably that's not very ergonomic/expected in some languages. But my concern is rather with weak references that are valid across serialization, because the size of that weak map will keep growing for all of the objects which haven't been garbage collected. And given that each object is small, it could be a significant overhead.

Can you elaborate on the use cases for doing this, is it for testing scenarios or something more like debugging/inspecting?

jgraham commented 2 years ago

In terms of overhead, I think it helps if objectIds are numbers rather than strings, since just storing the id strings is probably rather wasteful.

I suppose there's a zero-overhead scheme where one uses an existing object handle (e.g. a pointer to the object if you had a non-moving GC or some other kind of internal id in the general case) directly as the serialized id. But that seems potentially difficult to do in a way that's safe, and probably isn't actually possible given real implementation strategies.

To do this in a JS implementation you could get O(N) lookup with a single WeakMap<Object, ObjectId>. I don't know about the implementation details of a WeakMap, but I suppose that requires at least 16 bytes of storage per object, plus the overhead of the hash table, assuming the id is a Number. The obvious implementation of O(1) lookup is Map<ObjectId, WeakRef<Object>>, but you also want to create a FinalizationRegistry so that you can remove the ObjectId from the map when the Object is GC'd.

So I think I agree that in a straightforward JS based implementation you're going to have at least 64 bytes of overhead per living serialized object and maybe more depending on the implementation.

I don't really know how common it is to serialise long-lived objects to decide if this likely to be a problem in practice. Most of the examples we've seen are serialixing short-lived objects, for which all the object ids are unimportant. In the case of DOM objects, which clearly are long-lived, we can forego the objectId and just provide a sharedId which at least avoids duplicating the problem.

On the flip side there are several clear advantages of having an objectId on everything (which I think have all already been mentioned):

It makes it possible to do client side object identity comparisons.
I makes it possible to unambiguously use a non-root serialized object, even if the remote object has been mutated, as long as the desired object hasn't been GC'd (this is a stronger guarantee than anything based on deriving a path from the root to the target object, since that approach depends not only on the target on no mutations that invalidate the path).
It makes it very clear how to handle object cycles in serialization.

(I didn't include the possibility of making objectId the only way to refer to JS objects, since that seems somewhat orthogonal).

So in any case we can't offer very strong guarantees about being able to use inner objects later, because it's always possible they were GC'd. If you want to avoid that you need to structure your script in such a way that you seralize the objects you want to be able to access in a container that you own. If that's already required then maybe it's not a huge deal to use the object-path approach to accessing the nested objects, although it's certainly less ergonomic. But the inability to tell if two serialized objects are actually the same remote object without a round trip does seem rather worrying.

So I don't have a very strong position here; I'm just trying to approximately qunatify the arguments and set them out as I see them. I think the issue of cycles does mean that we want to make a decision here rather than just defering the problem because if we chose to omit objectId for now, and added a serialization-specific property for handling cycles, it might be harder to add objectId if we change our minds and need it in the future (or maybe we can get away with that; we add an objectId property for objects that are involved in cycles, and guarantee that it's valid within a specific serialization, but later if we move to a realm-global objectId then we just make that the id in the case of cycles, and we've extended the domain over which it's valid, but shouldn't break any client code).

css-meeting-bot commented 2 years ago

The Browser Testing and Tools Working Group just discussed Remote object lifecycle.

The full IRC log of that discussion

<jgraham> Topic: Remote object lifecycle
<jgraham> github: https://github.com/w3c/webdriver-bidi/issues/90
<sadym> й+
<foolip> q+
<sadym> q+
<jgraham> ack sadym
<brwalder> sadym: Splitting implementation into several PRs to keep it easy to review. Currently working on ownership logic and resolving cycles.
<jgraham> q+
<brwalder> foolip: We seem to agree on strong reference to root object. Unclear on whether to allow weak reference to nested objects. What are the use cases for these? Is it more useful for debugging than testing?
<brwalder> q+
<jgraham> ack foolip
<jgraham> ack brwalder
<sadym> q
<sadym> q+
<simonstewart> q+
<jgraham> brwalder: My last post on the issue was trying to enumerate all the scenarios we might want to support, including cycles and reaching into deeply held objects. Implementation is trending away from persistent object ids. If we decide that reaching into nested objects in subsequent commands isn't important then there are other things we can do that don't have the same memory usage concerns.
<jgraham> ack
<jgraham> ack jgraham
<brwalder> simonstewart: If we're trending away from persistent remote object IDs, then how will we compare two objects on the client side? We didn't support this in the original webdriver spec. People came up with expensive workarounds.
<brwalder> sadym: In the current proposal, you would need to provide two IDs and do the equality check on the browser side.
<jgraham> ack simonstewart
<jgraham> q+
<foolip> q+
<brwalder> simonstewart: Round trips for object comparision becomes very expensive over the internet and with multiple objects in a large collection.
<brwalder> sadym: What is the scenario for comparing 50+ objects for equality on the client side?
<brwalder> simonstewart: Seeing if a particular element is in a local set. Asserting that two commands return the same object. There is precedent for persistent IDs in the WebDriver spec, i.e. ElementHandles.
<brwalder> sadym: Persistent object IDs is solved for Nodes with sharedId at the moment, but not for general objects.
<brwalder> simonstewart: Goal is to eliminate RPCs whenever possible.
<brwalder> sadym: Motivation for non-persistent IDs: Puppeteer API promises strong references. If we have strong references, there needs to be an explicit API to release the reference. Having persistent strong references seem harder to handle on the client side.
<brwalder> simonstewart: This is the same problem ElementHandles originally had. The solution was to tie the ID to the backend. The reference is invalidated when the page / JS realm goes away.
<jgraham> ack sadym
<jgraham> ack jgraham
<brwalder> jgraham: Agreed that for nodes, we should have the same guarantees as classic webdriver. The ID should be persistent across contexts. If you have a need for multiple object comparisons, you can always pass an array of objects to the remote end to do the comparison. Being able to do these comparisons on the client side would be useful but it is difficult to get consensus on this without motivation scenarios.
<patrickangle_> q+
<brwalder> jgraham: Use the objectId property for serializing cycles in the short term. Consider using this as a persistent ID in the future.
<jgraham> ack foolip
<brwalder> foolip: Nodes and window handles will still have weak references that are useable across calls. Note that there is no current protocol where you can do generic object comparison on the client side. Since no one is currently relying on this, it may not be a priority.
<jgraham> ack patrickangle_
<jgraham> q+
<brwalder> patrickangle_: Seems unnatural for the test client to guarantee lifetime of objects beyond their natural lifetime. Strong references are easier to implement on top of weak references than trying to go the other direction. If every reference is a strong reference, it becomes difficult to reason about subtle lifetime issues.
<jgraham> ack jgraham
<brwalder> jgraham: The current spec design requires client to opt in to strong referencing. For compat with existing clients (i.e. Puppeteer), we need at least some way to create strong references.
<sadym> q+

jgraham commented 2 years ago

So I think the conclusion from the meeting was that not taking weak references to everything, but using objectId only for cycles might be fine as a starting point, and we can upgrade to using objectId for everything later if there's compelling use cases.

There is one remaining concern here which is that under that scheme the default behaviour is to only get a copy of the object. There's no way to both get a reusable handle and also have the lifetime of the objects fully managed by the page (i.e. not take a strong reference which needs to be released). So the question is whether we want some way to get a weak reference just for the root. That unfortunately makes the forward-compatible plan for cycles a bit weirder, because having the objectId property mean "serializable handle to a weak reference" only for the root and "internal id only useful during deserialization" for cycles would be pretty weird. So we'd either need two properties, or to make the reference added to cyclic objects also useable as part of the protocol.

sadym-chromium commented 2 years ago

Fixed in https://github.com/w3c/webdriver-bidi/pull/206