webspecs / url

The URL specification
https://specs.webplatform.org/url/webspecs/develop/

What is "result"? #5

Open domenic opened 9 years ago

domenic commented 9 years ago

In sentences like

Initialize result to the value of file-url, non-relative-url, or relative-url depending on which one first matches the input, and then modify result as follows:

What is result? You use JavaScript object-like notation in saying

If query is present in the input, set result.query to this value.

Does that mean result is a JavaScript object? Well, then what if someone created a setter on Object.prototype for "query"? Do you want to be triggering that? What realm does the object exist in? Above it says it's returning a "JSON object," so I'm guessing JavaScript object? (The term "JSON object" isn't really well defined...)

You may wish to be using something like an ES record instead.
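To make the Object.prototype concern concrete, here is a minimal sketch. It is only an illustration: the variable names are hypothetical, and the null-prototype object is just one way a JavaScript implementation could sidestep the problem, not something the spec prescribes.

```javascript
// Hypothetical demonstration; none of these names come from the spec.
const seen = [];

// Someone installs a setter for "query" on Object.prototype.
Object.defineProperty(Object.prototype, "query", {
  configurable: true,
  set(value) {
    seen.push(value);
  },
});

// Ordinary object: the assignment runs the inherited setter, and no own
// property is created on the object.
const plain = {};
plain.query = "a=1";
const plainHasOwn = Object.prototype.hasOwnProperty.call(plain, "query");

// Null-prototype object: nothing is inherited, so the assignment creates
// an own data property and the setter is not involved.
const bare = Object.create(null);
bare.query = "a=1";
const bareHasOwn = Object.prototype.hasOwnProperty.call(bare, "query");

// Remove the setter again so later code is unaffected.
delete Object.prototype.query;
```

After this runs, the setter has been called once (for `plain`), `plain` has no own `query` property, and `bare` does. An abstract record type avoids the question entirely, because it has no prototype chain to consult.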

More generally, it might be good to explain what these steps are meant to "return". E.g. it seems like the "url" rule is meant to return some sort of { query, fragment, port } tuple? And take as an argument a string? In general the processing model implied by the railroad diagrams is not very clear.

rubys commented 9 years ago

It is clear to me. :-) But then again, I'm clearly not the target audience.

More seriously: let me describe what is going on, and then together we can figure out what should be added to the spec.

In the reference implementation, each railroad diagram is associated with a JavaScript function. The prose that follows the railroad diagram defines the code that is contained in that function.

result corresponds to a local variable in that function, more precisely var result. In this case, the result variable is to be initialized to the result of one of three function calls, depending on which one matches the input. If you look at those functions, they each return an ES object.

More generally, each function takes a string as input, and attempts to match that string against the grammar rules. If this succeeds, the function is executed and a single value (generally a String or an Object) is returned. Otherwise, Failure is returned.
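As a rough sketch of that shape (one function per rule, string in, value or Failure out), assuming a toy grammar that is not the one in the spec: `Failure` is modelled here as a Symbol, and `toyPort` and `toyQuery` are hypothetical rule names invented for the example.

```javascript
// Sentinel for "the input did not match". The name follows the comment
// above; representing it as a Symbol is an assumption.
const Failure = Symbol("Failure");

// Hypothetical rule returning a String: matches a run of ASCII digits.
function toyPort(input) {
  return /^[0-9]+$/.test(input) ? input : Failure;
}

// Hypothetical rule returning an Object: matches "?" followed by any
// characters other than "#".
function toyQuery(input) {
  const match = /^\?([^#]*)$/.exec(input);
  return match ? { query: match[1] } : Failure;
}
```

A caller can then test `=== Failure` to decide whether to try the next alternative, which is one way the "first matches" wording could be realised.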

domenic commented 9 years ago

Yeah, OK, I think we can work together to make this much clearer :).

I'd state the functional nature of each parsing rule much more explicitly, with the inputs and outputs clearly stated in the header. For example, for https://specs.webplatform.org/url/webspecs/develop/#url, here's one strawman idea:


url(input)

Returns: { scheme, scheme-data, username, password, host, port, path, query, fragment }

  1. Let parsed be the result of parsing input according to this railroad diagram: [railroad diagram here].
  2. Let result be the value of parsed.file-url, parsed.non-relative-url, or parsed.relative-url depending on which one first matches the input.
  3. If parsed.query is present, set result.query to parsed.query.
  4. If parsed.fragment is present, set result.fragment to parsed.fragment.
  5. If result.scheme has a default port, and if result.port is equal to that default, then delete the port property from result.
  6. Return result.
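
Steps 2 through 6 of the strawman could be sketched in JavaScript as below. This is only an illustration under several assumptions: step 1 (matching the railroad diagram) is not modelled, so `parsed` is passed in ready-made; `urlFromParsed` and `DEFAULT_PORTS` are hypothetical names; "is present" is read as "is not undefined"; ports are treated as strings; and the default-port table is a small illustrative subset.

```javascript
// Illustrative subset of default ports, keyed by scheme (assumed to be
// strings; the spec's own table is not reproduced here).
const DEFAULT_PORTS = new Map([
  ["ftp", "21"],
  ["http", "80"],
  ["https", "443"],
]);

// `parsed` stands in for the output of step 1.
function urlFromParsed(parsed) {
  // Step 2: take whichever alternative is present, in the stated order.
  const matched =
    parsed["file-url"] ?? parsed["non-relative-url"] ?? parsed["relative-url"];
  // Copy so that the sub-rule's return value is not mutated.
  const result = { ...matched };

  // Steps 3 and 4: copy over query and fragment when present.
  if (parsed.query !== undefined) result.query = parsed.query;
  if (parsed.fragment !== undefined) result.fragment = parsed.fragment;

  // Step 5: drop the port when it equals the scheme's default.
  if (result.port !== undefined && DEFAULT_PORTS.get(result.scheme) === result.port) {
    delete result.port;
  }

  // Step 6.
  return result;
}
```

For example, a `parsed` value whose `relative-url` is `{ scheme: "http", host: "example.com", port: "80", path: "/a" }` and whose `query` is `"x=1"` comes back without a `port` property and with `query` set.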

As for the data types being manipulated here, I think the best bet is to state that they are ES records. You may or may not want to adopt the relevant typographic conventions, e.g. result.[[Scheme]] instead of result.scheme.

You also need a very clear definition of "Let parsed be the result of parsing input according to the above railroad diagram." I am handwaving that this somehow creates a record with properties { file-url|non-relative-url|relative-url, query?, fragment? } and using terms like "is present" in steps 3/4 and something even more fuzzy in step 2. Would be good to explain that in some detail, and have "parsing" link to the definition.

rubys commented 9 years ago

I've largely adopted the proposal, with the following notable differences:

Once we reach agreement on what the first parsing rule should look like, I'll proceed with the remainder of the steps. Note: the Setter rules won't contain a returns: clause.

domenic commented 9 years ago

Looks good, the local variables technique is less confusing than I thought it would be, and is nice for readability.

I put the railroad diagram above the first step instead of being included as a part of the first step. My feeling is that the railroad diagram may be referenced by multiple steps.

I don't feel too strongly about this, but I do think putting it explicitly as part of step 1 makes it clearer that you can't use the diagram in isolation, and that it's only step 1 of a multi-step process for the url(input) function.

"the value returned by the function x(input) when the function x is passed some part of the original input during the parsing of that original input according to the given railroad diagram"

This sounds ambiguous because of the "some part of the original input," although upon re-reading a few times I think I understand what you mean. Is there a better phrasing perhaps?

I haven't adopted the suggestion to use ES6 records. The standard is meant to be language agnostic. I wouldn't have a problem with an informative note, however.

Well, you need something to define what these object-like things you are returning and manipulating are. I suggested ES records because they're a purely abstract (and thus language-agnostic) concept. But you could copy the verbiage they use to define your own "record" type; that would also allow you to avoid the unfortunate typographical conventions ES records bring along.

Right now the main characteristics of these things (provisionally called records) are:

rubys commented 9 years ago

Looks good, the local variables technique is less confusing than I thought it would be, and is nice for readability.

Thanks!

I put the railroad diagram above the first step instead of being included as a part of the first step. My feeling is that the railroad diagram may be referenced by multiple steps.

I don't feel too strongly about this, but I do think putting it explicitly as part of step 1 makes it clearer that you can't use the diagram in isolation, and that it's only step 1 of a multi-step process for the url(input) function.

I've now worked through the second section (file-url). Note that step 3 references the railroad diagram. As do 3.1.1 and all of the other steps that reference local variables. Are you OK with this, or do you have any further suggestions?

"the value returned by the function x(input) when the function x is passed some part of the original input during the parsing of that original input according to the given railroad diagram"

This sounds ambiguous because of the "some part of the original input," although upon re-reading a few times I think I understand what you mean. Is there a better phrasing perhaps?

That is indeed poorly phrased, but it is the best I have come up with so far. Perhaps I will come up with something better eventually. Meanwhile, suggestions are encouraged.

I haven't adopted the suggestion to use ES6 records. The standard is meant to be language agnostic. I wouldn't have a problem with an informative note, however.

Well, you need something to define what these object-like things you are returning and manipulating are. I suggested ES records because they're a purely abstract

Good point. Will look into.

domenic commented 9 years ago

I've now worked through the second section (file-url). Note that step 3 references the railroad diagram. As do 3.1.1 and all of the other steps that reference local variables. Are you OK with this, or do you have any further suggestions?

Seems reasonable. To make sure I understand, the idea here is that if the top production is taken, you do steps 3.1; if the middle, 3.2, etc.?

One other thing:

file-url(input) returns { scheme, host, path }. That means that url(input), if it takes the file-url path, will end up returning { scheme, host, path, query, fragment }. That seems in conflict with the description of url(input) as returning { scheme, scheme-data, username, password, host, port, path, query, fragment }.

rubys commented 9 years ago

Seems reasonable. To make sure I understand, the idea here is that if the top production is taken, you do steps 3.1; if the middle, 3.2, etc.?

Yes. Let me know if you have any suggestions on how to make this more clear.

file-url(input) returns { scheme, host, path }. That means that url(input), if it takes the file-url path, will end up returning { scheme, host, path, query, fragment }. That seems in conflict with the description of url(input) as returning { scheme, scheme-data, username, password, host, port, path, query, fragment }.

Different paths will indeed return different results. For some paths, some of these values will remain undefined.

With the possibility of backtracking, I don't want to have sub-rules making direct updates; but I could make the top rule do so. Meanwhile, my reference implementation uses standard ES5 objects, and the getters deal with mapping undefined to strings.

domenic commented 9 years ago

Different paths will indeed return different results. For some paths, some of these values will remain undefined.

It seems good to explicitly specify this somehow, so that the return type is consistent and matches the one in the header.

So either explicitly set/return undefineds (or is it empty string? unclear) where necessary, or have some kind of rule in the prelude saying "if a field appears in the return statement but the steps don't specify its value, it is undefined/empty string."
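
The second option, a prelude rule that fills in unspecified fields, might look like the following sketch. The field list is taken from the Returns: line proposed for url(input) earlier in this thread; `URL_FIELDS` and `withAllFields` are hypothetical names, and using undefined as the filler is an assumption until the undefined-versus-empty-string question is settled.

```javascript
// Field names from the proposed Returns: header for url(input).
const URL_FIELDS = [
  "scheme", "scheme-data", "username", "password",
  "host", "port", "path", "query", "fragment",
];

// Return a record that always has every field, using undefined for any
// field the steps did not set.
function withAllFields(partial) {
  const full = {};
  for (const name of URL_FIELDS) {
    full[name] = Object.prototype.hasOwnProperty.call(partial, name)
      ? partial[name]
      : undefined;
  }
  return full;
}
```

Applied to a file-url style result such as `{ scheme: "file", host: "", path: "/tmp" }`, this yields a record with all nine fields, six of them undefined, so every path through url(input) would have the same shape.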

rubys commented 9 years ago

So either explicitly set/return undefineds (or is it empty string? unclear) where necessary

It's undefined, not empty string. At the moment, the constructor part of the spec hasn't been updated. What the reference implementation does:

In case this isn't clear, look at the constructor at the top of url.js from the reference implementation. The underscore prefixes on property names are an implementation detail (i.e., not intended to be a part of the spec), and are there to differentiate between the properties themselves and the getters and setters.

In some cases, there is a difference between null and empty string. This actually is from Anne's original spec, and retained by my merged spec.
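
The split between underscore-prefixed stored properties and public getters could be sketched roughly as follows. This is not the code from url.js: `ToyURL` is a hypothetical class, the set of fields is cut down for brevity, and every getter here maps undefined to the empty string, which deliberately ignores the null versus empty string distinction mentioned above.

```javascript
class ToyURL {
  constructor(record) {
    // Stored values keep whatever the parsing rule returned, including
    // undefined for fields that the matched path never set.
    this._scheme = record.scheme;
    this._host = record.host;
    this._query = record.query;
  }

  // Public getters always hand back a string.
  get scheme() {
    return this._scheme === undefined ? "" : this._scheme;
  }

  get host() {
    return this._host === undefined ? "" : this._host;
  }

  get query() {
    return this._query === undefined ? "" : this._query;
  }
}
```

With this arrangement the record returned by the parsing rules can stay sparse, while callers of the public API still see a consistent string-valued interface.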

domenic commented 9 years ago

That all makes sense. To be clear, my main concern is that I think it would be ideal for clarity and understandability if each function had a consistent return type.