whatwg / url

URL Standard
https://url.spec.whatwg.org/
Is an URL’s path a list of strings or a single string? #33

Closed SimonSapin closed 9 years ago

SimonSapin commented 9 years ago

A URL’s path is a list of zero or more ASCII string holding data, usually identifying a location in hierarchical form. It is initially the empty list.

Sounds good.

(Just to name the things in the list, this could be "[…] a list of zero or more <a>path components</a> holding […] A <dfn>path component</dfn> is an ASCII string.")

An absolute URL must be a scheme, followed by ":", followed by either a scheme-relative URL, or if URL is not special, a path, optionally followed by "?" and a query.

Here, it looks like a path is a single string that is concatenated with other strings. "a path" here probably should be something like "a path as components separated with /." Also, should there be an initial / before the first component?

A scheme-relative URL must be "//", followed by a host, optionally followed by ":" and a port, optionally followed by a path that starts with "/".

Same here. Are components separated by /? What does it mean for a list of strings to start with "/"? Is that the value of the first component?

A path must be zero or more URL units, excluding "?".

URL units being code points, this sounds like a path is a single string.

Set url’s object to a structured clone of the entry in the blob URL store corresponding to the first string in url’s path. [HTML]

… and a list of strings again. (Same in various places in the parser.)
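To make the two readings concrete, here is a minimal sketch of one way a path held as a list of strings could map onto the single string that the writing rules describe. The helper name `serialize_path` and the choice to prefix every component with "/" are assumptions for illustration, not something the spec text quoted above states.

```python
from typing import List


def serialize_path(components: List[str]) -> str:
    """Hypothetical helper: turn a list-of-strings path into one string.

    Assumption: every component is written with a "/" in front of it,
    so a non-empty path always starts with "/".
    """
    return "".join("/" + component for component in components)


# The model view: a list of path components.
path_as_list = ["a", "b", "c.html"]

# The syntax view: a single string made of code points.
path_as_string = serialize_path(path_as_list)

print(path_as_string)   # /a/b/c.html
print(path_as_list[0])  # a  (what "the first string in url's path" would select)
```

Under this assumption, "a path that starts with /" is a statement about the serialized string, while "the first string in url's path" is a statement about the list.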

annevk commented 9 years ago

The "URL writing" section describes how you write a component. It wouldn't make sense to refer to a data structure there, since there isn't any yet. It's about the eventual input to the URL parser.

SimonSapin commented 9 years ago

If for the purpose of that section "a URL’s path" is a different concept than in the rest of the spec, it should not link to #concept-url-path.

annevk commented 9 years ago

I guess that would require a whole set of fresh identifiers then... Since none of them are model components... They're all syntax components. Meh.

SimonSapin commented 9 years ago

Another option is to have the path be a single string everywhere. Path components are not actually used outside the parser as far as I know, and could still be obtained by splitting on /.
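A rough sketch of that alternative, assuming the path is stored as one string and components are derived on demand. The function names `split_path` and `join_path` are hypothetical, and the handling of the leading "/" is an assumption rather than anything the spec defines.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Hypothetical helper: derive components from a single-string path."""
    if path == "":
        return []
    # Assumption: a leading "/" introduces the first component rather than
    # following an empty one, so drop it before splitting.
    if path.startswith("/"):
        path = path[1:]
    return path.split("/")


def join_path(components: List[str]) -> str:
    """Hypothetical inverse of split_path for paths that start with "/"."""
    return "".join("/" + component for component in components)


components = split_path("/a/b/c.html")
print(components)             # ['a', 'b', 'c.html']
print(join_path(components))  # /a/b/c.html
```

Empty components survive the round trip, so "/a//b" splits into three components and joins back to the same string.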

annevk commented 9 years ago

That would not solve this problem. E.g. IPv4 address is a 32-bit integer, but that's not how you write it. If we make port a 16-bit integer, it likewise doesn't represent syntax. And even port being a string you could argue that the syntax thing is different since it can have leading 0s and such.
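The gap between model and syntax described here can be illustrated with a small sketch. The function name `serialize_ipv4` is hypothetical; the dotted-decimal form is the familiar way of writing an IPv4 address, and the port example only shows that distinct written forms can collapse to one model value.

```python
def serialize_ipv4(address: int) -> str:
    """Hypothetical helper: write a 32-bit integer as dotted-decimal text."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    # Take the four bytes from most significant to least significant.
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# Model value: one integer. Written form: four numbers and three dots.
print(serialize_ipv4(0xC0A80001))  # 192.168.0.1

# A port written with leading zeros and one written without them
# map to the same integer, so the integer cannot recover the syntax.
print(int("0080") == int("80"))  # True
```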

SimonSapin commented 9 years ago

There is a "Host writing" section that describes how to represent an IPv4 address as a string with some .s. Should there be a similar section (or just a sentence) for a URL’s path with /s?

SimonSapin commented 9 years ago

(Looking a bit more at the spec…) Namely, I think this sentence:

A path must be zero or more URL units, excluding "?".

should mention path components and slashes.

annevk commented 9 years ago

Since fixing this is not happening today this is what I want to do when I get back to this, hopefully soon:

domenic commented 9 years ago

Consider whether or not Windows drive letters need to be a parse error or part of the URL syntax section. Likely the former? Although that kind of obsoletes Windows from the perspective of the specification...

Yes, let's not do the former, please. Remember that UAs are used ~95% of the time on Windows, even if developers prefer other OSs.

sideshowbarker commented 9 years ago

The planned changes outlined in https://github.com/whatwg/url/issues/33#issuecomment-131400647 look great to me. The one other somewhat-related thing I’m still hoping for is normative requirements for what code points are allowed in a domain, as raised at https://www.w3.org/Bugs/Public/show_bug.cgi?id=25334

annevk commented 9 years ago

@domenic I kind of wish we could fade out file URLs entirely. But I guess they still have legitimate use in node.js (or Node.js?) development?

It's a bit tricky too to define the syntax constructs for them since it heavily depends on the base URL, but I'll try to figure something out.

masinter commented 9 years ago

https://www.ietf.org/mail-archive/web/apps-discuss/current/msg14575.html

A proposed updated IETF spec for the 'file:' URI scheme; check it out.

annevk commented 9 years ago

@masinter we did, see https://github.com/w3ctag/spec-reviews/issues/59.

domenic commented 9 years ago

They have legit uses in pretty much any system which deals with both files and URLs, yeah. Getting them documented and nailed down would be very helpful, especially if the URL Standard wants to be more than just the standard for browsers, but instead the standard for anything that interoperates with browsers.

annevk commented 9 years ago

I've decided to address railroad diagrams separately. See #67.