w3c / webdriver-bidi

Bidirectional WebDriver protocol for browser automation
https://w3c.github.io/webdriver-bidi/
353 stars 39 forks source link

Clarify extensibility rules #274

Open jgraham opened 2 years ago

jgraham commented 2 years ago

For compatibility reasons we've had an understanding that we want to allow commands and responses to be extensible. In the CDDL this is handled with boilerplate in the definitions like:

*text => any

Which amounts to ignoring unknown fields.

However we have not been consistent about adding this in all the definitions.

This recently came up in the context of the Target type, which is defined as

RealmTarget = {realm: Realm};

ContextTarget = {
  context: BrowsingContext,
  ?sandbox: text
}

Target = (
  RealmTarget //
  ContextTarget
);

The question was: should it be allowed to have something that matches both the RealmTarget production and the ContextTarget production? In that case the behaviour is well-defined (because the prose says to first check if it matches ContextTarget), but it seems likely that the user made an error in that case and we should report that rather than silently resolving the ambiguity.

So, how should we handle this in general? Extensibility is useful and required in some cases. But it's not clear if we need it for every type.

Technically, the problem here is that we're hoping to rely on structural typing i.e. we distinguish the target types by the fields they have rather than by name. In contrast for commands and responses where we clearly do want extensibility each type is explicitly named, so there's no possibility of confusion.

So that suggests a general rule: if we want to allow a type with multiple variants to be extended by clients we should always provide an explicit name for each varaint. If there's no explicit name we should not allow additional fields in that type.

Does that sound like a reasonable approach that meets our use cases? And if it does, are we happy with Target being non-extensible, or should we add an explicit type field?

sadym-chromium commented 1 year ago

I'm inclined not to throw exception in such a case, or in case of extended values like:

NullValue = {
  type: "null",
  value: "SOME_VALUE"
}

My motivation is the following: do we want to restrict any additional fields to ScriptTarget, or just having realm and context or sandbox together? And the first seems an overkill, and the I'm not sure how the second can be specified.

In general, we can try hard to predict what exactly user wants, and if they made a mistake, but the only way to be almost sure is to restrict all the extensions. I expect our users to know what they do, as long as they chose to send commands directly, instead of using some libraries.

foolip commented 1 year ago

Related: https://github.com/w3c/webdriver-bidi/issues/79

foolip commented 1 year ago

This problem also reminds me strongly of distinguishable dictionaries in Web IDL. There, unions of dictionaries aren't allowed, because type testing would require checking the object properties.

In BiDi, one fix would be to always use different parameters for these target types. But that's not very ergonomic I suspect.

If we use structural typing, the behavior I think would make sense is that exactly one of context or realm needs to be present, and beyond that we don't care about overlap in field names. I think this would have to be written has a helper algorithm.

Would using a type field be an option?

sadym-chromium commented 1 year ago

... but it seems likely that the user made an error in that case and we should report that rather than silently resolving the ambiguity.

Actually, it could be not an error, when user reuses the script.getRealms command result.

jgraham commented 1 year ago

I strongly want to avoid the situation where we have lots of optional parameters where it's required specify at least one (and not all combinations of > 1 make sense).

That actually seems the same as saying "you can specify context or realm, but not both, and then any other fields are fair game, but they might be ignored"; afaik we can't specify a negative condition in CDDL, so there's basically no difference between writing

Target = {
  ?realm: Realm,
  ?context: BrowsingContext,
  ?sandbox: text
  *text => any
}

and writing

RealmTarget = {
  realm: Realm,
  *text => any
}

ContextTarget = {
  context: BrowsingContext,
  ?sandbox: text,
  *text => any
}

Target = RealmTarget // ContextTarget

the only difference is in exactly what text you need to write to post-validate the data after schema validation passes.

Would using a type field be an option?

If people strongly feel that these need to be extensible this would be my preferred option (just always using named types and never structural types). But there is a tradeoff in terms of verboseness in the common case.

sadym-chromium commented 1 year ago

We do have more or less same example with deserialization. If handle is there, all other fields are ignored, even if they don't match the handle object.

css-meeting-bot commented 1 year ago

The Browser Testing and Tools Working Group just discussed Clarify extensibility rules.

The full IRC log of that discussion <jgraham> Topic: Clarify extensibility rules
<jgraham> github: https://github.com/w3c/webdriver-bidi/issues/274
<jgraham> sadym: When we call script.evaluate we can set a target that can be Context or Realm. What do we want to do in the case that a client sends a message that can match both. We have several options. One is to add an explicit type field to the message. Another option is to restrict extensions so that other fields aren't allowed and cause a protocol data. Or we specify a set of rules for which fields win.
<sadym> q+
<jgraham> jgraham: Fundamental questions are do we want extensibility at the level of inner protocol data, and if so are we OK with structural typing plus rules for disambigutaion, or nominal typing which is heavier weight.
<jgraham> simon: For typed lanaguages it helps if the types are unambigious
<jgraham> ack sadym
<jgraham> sadym: As mentioned on the issue, we have the same kind of ambiguity on the handles field. That makes it more ergonomic to just send back whatever you recieved without altering it. In this example, if we just say that "realm" is the winner it could be more ergonomic from that perspective; the user could just use the result they got from the server directly.
<jgraham> q+
<jgraham> jgraham: I agree that for strongly typed languages (e.g. Rust, Java) this ambiguity is going to be a problem. For e.g. JS the opposite is probably true.
<jgraham> sadym: The client is controlling what to send here, so it doesn't run into this ambiguity.
<jgraham> simon: deserializers are written at a lower level, and don't often have the ability to map to one or more types. Relying on fields being present to determine the type feels risky. If you have both FooHandle and BarHandle that both want some 'id' field. Might be better to be explicit.
<sadym> q+
<jgraham> ack jgraham
<jgraham> jgraham: I think that in this case the client will be sending the message, but we'll apply the precedent to browser-originating messages in the future
<jgraham> simon: ALso intermediate nodes might care
<jgraham> sadym: Another option for this case is to require always sending the realm, which you can always get.
<jgraham> jgraham: I don't think we should add extra roundtrips for this case
<jgraham> q?
<jgraham> ack sadym
<jgraham> jgraham: I encourage enveryone to read the issue, since any changes here will be breaking
sadym-chromium commented 1 year ago

Do we want to being able re-using values received from the server, e.g. script.Source or WindowRealmInfo? @jgraham @shs96c @foolip ?

foolip commented 1 year ago

I think it's important that it's possible to use values from one command or event as the argument for another, without requiring them to be massaged into some other shape, in particular removing extra fields that are going to be ignored.

Concretely, the RealmInfo from script.realmCreated and the BrowsingContextInfo from browsingContext.contextCreated should both be possible to use for script.evaluate. Expressing this in a typed language might require some work beyond just transcribing the CDDL, perhaps base types that have just the required fields and nothing else.

I don't think we should return things like this to the client however.

jgraham commented 1 year ago

Another concern is what clients should do. Currently where there's an ambiguous case clients, unlike browsers, have to make up their own rules on how to handle it. For example, if a client gets back something from script.evaluate that has both a result and an exceptionDetails field, what should they do? There's currently nothing in the spec that allows a browser to send both, but if the return type is an enum and each varaint is considered extensible then a client has to deal with the possibility somehow.

On the subject of allowing a client to take a value from an event or command result that's a super-set of the payload for some other command and provide that directly, I worry that kind of reliance on duck typing / type punning is going to make it more complex to extend the protocol in the long run. It means that if we have some set of results and events that return data that's compatible with being used in the parameters to any other set of commands, any future changes to the spec have to maintain that compatibility i.e. we can't add a new field to any of the result/event types that would alter the behaviour of any of the commands that could previously accept that type. So as a concrete example, if people are directly using a script.RealmInfo result of script.getRealmInfo as the value for the target field of script.evaluate, we can never add a type field to the script.Target type, unless it means exactly the same as in the script.RealmInfo type.

css-meeting-bot commented 1 year ago

The Browser Testing and Tools Working Group just discussed Clarify extensibility rules.

The full IRC log of that discussion <jgraham> Topic: Clarify extensibility rules
<jgraham> github: https://github.com/w3c/webdriver-bidi/issues/274
<jgraham> sadym: We didn't reah consensus on this, we should clarify the open issues.
<jgraham> How extensible should BiDi be? Should it be possible to add random fields and still say it meatches the spec? For example when you execute some script you get an object with realm/context/sandbox. This makes the result compatible with two different ways of specifying a target for script execution. If this is accepted it could be confusing for users, but it's useful to be able to return the values we
<jgraham> got from script execution directly.
<jgraham> q?
<sadym> q+
<jgraham> jgraham: I see three options. 1 is to do what we do currently, which could be an interop hazard, particuarly if people are using deserialization libraries that make different choices around ambiguities. 2 is to say that for these type we aren't extensible and need to match the fields of one variant exactly. 3 is to require each variant to have a type field so that we always match on a specific type
<jgraham> name. We are already using approach 1 in tests, but it could be problematic in the future.
<jgraham> ack sadym
<jgraham> sadym: I have two concers about restrictions. I think it's very useful for people to be able to send back what they receive. I already mentioned "channels" in our implementation that allow splitting; we want to do that.
<whimboo> q+
<jgraham> jgraham: Yes, top level commands/events need to be extensible. One problem with allowing extensions of data without a type tag is that names hav to be globally unique: if people are relying on the fact that the result of some command can be sent as the payload to some other command, we can't add conflicting fields at any point in the future. That might not be a big problem, but it is a worry. Type
<jgraham> tagging is much less convenient when writing in an untyped language, but it's at least very safe.
<jgraham> ack whimboo
<jgraham> whimboo: If we have a type here where the client can specify context or realm, how should the browser handle this? We might want to return one or the other?
<JimEvans> q+
<jgraham> jgraham: That doesn't work, it would be on the client to reform the data into the correct type. We could theoretically change the responses so that they always return types packaged in a way that we could firectly extract a field and return it to the server, but that would be a breaking change.
<jgraham> [general looks of concern]
<jgraham> ack JimEvans
<jgraham> JimEvans: As someone working on client bindings, and in strongly typed languages, having the spec be as unambigous as possible in terms of what's returned across the wire is a big plus. As is called out on the issue, having to make a guess between two different types of things if the response happens to contain valid properties for both of multiple return types, the client has to make a judgement call,
<jgraham> and removing that ambiguity would make life easier for us.
<JimEvans> q+
<jgraham> sadym: Is the issue mostly about browsers?
<jgraham> jgraham: Yes, but a browser could in theory use extensibility to return something to a client that's ambiguous and then the client has to guess because there's no spec to tell it how to resolve the ambiguity.
<jgraham> ack JimEvans
<jgraham> JimEvans: The realm-vs-context case is interesting, but more interesting is a script.evaluate response that contains both a result and an exceptionDetails property. How do you actually know what's the intended result. Browser vendors might say "we're never going to do that", but for clients it would be better if the spec made it impossible to do.
<sadym> Agree on the response (server-client message) to be not extendable in non-root level
<sadym> Agree on the response (server-client message) to be not extendable, including the root level
<jgraham> jgraham: exceptionDetails vs result is a top level response. We had a case where we were sending an evaluate response with no result field and so we assumed it was an exception but actually it was a missing field. That obviously shouldn't happen, but having an explicit field to tell us what the type was would have been helpful.
<jgraham> jgraham: Maybe we could look for all the cases where a client has to intorspect fields to determine types and change those, even if we don't do the same for the case where browsers have to introspect
<jgraham> jgraham: I think that's just ScriptEvaluateResult at the moment. Fixing that doesn't solve the problem for browsers, but it would make things unambiguous for clients.
<JimEvans> also script.Target type?
<JimEvans> Ah. Right. Sorry. I withdraw my comment.