w3c / web-share-target

Web API proposal for receiving shared data
https://w3c.github.io/web-share-target/
Other
191 stars 20 forks source link

Why invent a new URL template syntax? #31

Closed annevk closed 6 years ago

annevk commented 6 years ago

Can't you reuse the processing model we have for registerProtocolHandler()? Maybe even abstract that and make it reusable?

mgiuca commented 6 years ago

Good question. Really it's two questions: a) what should the syntax be, and b) what should the processing model (inside the spec language) be?

Syntax

The reason we aren't using the "%s" placeholder syntax from registerProtocolHandler is simply because we need multiple named placeholders.

We can't just use positional placeholders because the fields {title, text, url} have no strong ordering, are all optional, and we may want to expand them in the future with further optional values. So we need a named placeholder syntax.

Options we considered are:

Given that you provide a name, Python's "s" seems redundant. So we originally settled on the "%{}" syntax, but then we realised that because '{' is an illegal character in URLs, the '%' is redundant, so we just dropped the '%'.

Regardless of the specifics, we can't just use the simple "%s" from rPH.

Processing model

Given that we have a different syntax to rPH, it's a bit hard to share a processing model. Unfortunately, there is extra complexity with WST compared to rPH, due to:

So, I don't think we can precisely share syntax or model with rPH, but we do have leeway to change the syntax.

annevk commented 6 years ago

I think you should at least use the ugly syntax as otherwise you cannot represent actual braces in these URLs. (Or you'd have to modify the URL parser to support templated URLs. Not sure we'd want to go that far though.)

mgiuca commented 6 years ago

I think you should at least use the ugly syntax as otherwise you cannot represent actual braces in these URLs.

There is no such thing as a brace in a URL.

Or do you mean, we won't be able to support URLs with %7b and %7d in them, because running the parser on the template will convert the template braces into %7b and %7d, and thus we won't be able to distinguish them at substitution time? I share that concern, but...

(Or you'd have to modify the URL parser to support templated URLs. Not sure we'd want to go that far though.)

I don't think it's a great policy to choose a syntax which we both consider "the ugly one" over a cleaner and equally unambiguous syntax, just because it simplifies the implementation. I could be wrong though.

I also agree we don't want to do a major refactor of the URL parser to support this case.

annevk commented 6 years ago

Yes, you won't be able to distinguish at substitution time. At least for me that qualifies as much more substantive than a concern.

mgiuca commented 6 years ago

Sorry, I wasn't clear.

I'm not suggesting that the behaviour be that %7b and %7d are treated as template substitution fields. That would just be broken. In other words, the following test case needs to work correctly:

"share_target": {"url_template": "?foo=%7btitle%7d&text={text}"}

If invoked with text = "abc", this should navigate to "?foo=%7btitle%7d&text=abc" --- "%7btitle%7d" should not be substituted.

So I'm not suggesting that we compromise the semantics. Rather, it's possible to implement the above behaviour correctly if we are careful not to percent-escape the braces before substitution. The current spec text doesn't have this problem, but it does have a problem of not detecting syntax errors in the template at parse time. I have a fix for this but it's a bit of a hack.

It seems that fixing #25 without breaking percent-escaped braces is complicated, but certainly do-able. Or, we could change the syntax. I was saying above that it seems wrong to compromise on the preferred syntax just because it makes the implementation messier. But perhaps that is a trade-off we are willing to make.

annevk commented 6 years ago

As I said before, the only way to properly make that distinction is by branching the URL parser. I'm not sure that's a good idea.

mgiuca commented 6 years ago

@plinss raised on w3ctag/design-reviews#221 the existence of RFC 6570, an existing specification for URL template substitution.

I raised a number of issues with directly referencing this RFC (in particular, it seems largely incompatible with URL Standard), but the existence of this standard, which uses the exact same template notation as I currently do ("{text}") makes me think we should try to keep this syntax rather than switching to "%(text)"; the only reason to do that being to make parsing easier.

After more carefully reading URL Standard and experimenting, I think we can actually go ahead with the existing syntax, if we are also going to remove expansion in the path (which I think we want to do, for now, due to #30). Because '{' and '}' are actually allowed characters in the query and fragment (they are not in the query or fragment percent-encode set):

> new URL('http://{host}/{path}?{key}={value}#{fragment}').href
"http://%7Bhost%7D/%7Bpath%7D?{key}={value}#{fragment}"

So we could specify this as just run the URL Parser, against the manifest base, then in addition, check that braces are balanced in the query and fragment parts. At launch time, run "replace placeholders" on the query and fragment parts. '{}' are thus treated as variable delimiters, while '%7b' and '%7d' are literal brace characters. The only thing is that it would not be possible for an unquoted '{' or '}' to appear in the final URL (which is something that is allowed by the URL Standard).

I'm not really sure why '{' and '}' are allowed to be left unquoted in URL Standard, since in RFC 3986, they were prohibited in all parts of the URL.

Now, this approach would prevent us from allowing placeholders to appear in the path at a later time. If we did want to do that, we could monkey-patch the URL Standard, by saying (from the Web Share Target spec), "Run the URL Parser, but with U+007B ({) and U+007D (}) removed from the path percent-escape set." And then apply the above technique.

annevk commented 6 years ago

we could monkey-patch the URL Standard

Please don't. Detail the complexity there.

I'm still not a big fan of using a different syntax from the established precedent.

domenic commented 6 years ago

FWIW I don't think the %s in registerProtocolHandler is a very strong precedent, being a two-browser API with no ability for multiple named placeholders.

But yes, better to update the URL Standard with some special "URL template parsing mode" or similar, than to monkey-patch it.

annevk commented 6 years ago

I think the other problem with partially adopting some syntax from elsewhere is developer confusion as it seems unlikely we'll be able to adopt RFC 6570 wholesale. So what looks familiar cannot actually be processed with the same kind of tooling.

mgiuca commented 6 years ago

I doubt that developers will be confused upon encountering a curly brace syntax ("{text}") that is slightly different to that of RFC 6570. For one thing, I doubt many developers are even aware of this RFC (I wasn't). I originally based the syntax on a subset of Python's string format syntax. I don't think anyone is going to be confused due to RFC 6570 if we say, "put your placeholder in curly braces like this: /foo?bar={text}" (without mentioning that RFC).

But yes, better to update the URL Standard with some special "URL template parsing mode" or similar, than to monkey-patch it.

Yes absolutely. I think while WST is a draft spec, it makes sense to express this as a monkey patch, but as we prepare to properly standardize it, we would migrate those changes into the URL standard.

Having said that, no such change to URL standard is required if we don't allow placeholders in the path, because '{' and '}' are valid characters in the query and fragment. And I don't intend to allow placeholders in the path, due to #30.

mgiuca commented 6 years ago

Spec has been updated to remove the URL template (for unrelated reasons); we now construct a URL from individual key/value pairs, as per form submission.