Closed. DemiMarie closed this issue 6 years ago.
As noted at https://url.spec.whatwg.org/#concept-ipv6 this is intentionally omitted per https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2.
Does that mean that users are expected to use a proxy for such situations? Most users do not have such a proxy installed.
They have to find an alternative, yes.
What about requiring browsers to prompt the user for an interface whenever an address in the fe80::/10 block is entered into the URL?
What would the prompt say? (FWIW, I doubt any browser would find that acceptable, and it adds a lot of complexity as we'd have to handle the syntax everywhere, which would lead to tons of issues.)
“Please select a network that your computer is connected to”, with a pop-up list of all interfaces.
That said, considering the super-niche nature of this, I feel like one approach that might work would be to allow users to pass the interface as a command line argument.
With reference to https://bugzilla.mozilla.org/show_bug.cgi?id=700999, I request that this bug be reopened. For completeness, I will reiterate the problem caused by the lack of link-local address support:
I subscribed to this bug some time ago because Firefox is essentially preventing the configuration of newer network devices. I'll elaborate briefly, and I ask the Mozilla team for clarification on how to handle this:
Given this background, we often have devices coming back from customers with arbitrary IP configurations. The only plausible way to find such a device is the link-local discovery described above; the IPv4 addresses are often unknown or undiscoverable.
So we are basically in the situation that the link local address is the only reliable address that can be used to configure a whole class of devices.
I often hear the argument that link-local addresses would make JavaScript-based LAN scanning possible. I am not denying that; however, the same is true for IPv4 - you can easily scan 192.168.x.y/24.
Even with this inconsistent security claim in mind, I ask the Mozilla developers to at least include support for link-local addresses behind something along the lines of about:config -> ipv6-allow-link-local: [false, true], so as not to stop all network engineers from working.
Can we reopen this bug or create a new one for realising this?
Update: as this bug is cross-referencing other resources in each and every bug report, I tried to summarise it on https://ungleich.ch/u/blog/ipv6-link-local-support-in-browsers/
This needs to be reopened, because Firefox is citing this bug as blocking its own fix of the issue.
This issue is holding up the entire IPv6 project, in case that isn't clear.
There's a lot of game playing around these issues because a lot of money is at stake when it comes to who controls domain names, and this is a way to break domain name systems that compete with ICANN & CAs.
So the developers of browsers are pretending that this is complicated when it isn't.
It's actually very simple and there's exactly one way to implement it which is very obvious. But implementing it that way is going to step on some toes of people who don't want IPv6 global connectivity and a new p2p internet foundation to happen. It goes against a lot of business models.
By the way, it's a bug to ever strip the zone identifier before sending it from the client.
The way that HTTP works is, the client sends its own name for the server to the server itself. The server never uses this name to establish IP connectivity. The server can then send the name back to the client in links, who can use it to re-establish IP connectivity by clicking a link, or bookmarking it and opening the bookmark, for example.
Since the server NEVER uses the name to establish IP connectivity, but only sends it to the client; and since the client MAY use the name to establish IP connectivity, therefore: the name MUST be the name that establishes IP connectivity on the client's system.
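This round-trip can be sketched with Python's standard library - an illustration of RFC 3986-style handling, not any browser's parser. The URL and the interface name eth0 are purely illustrative.

```python
# Sketch: the host string a client extracts from a URL is the same
# string it would later use to re-establish connectivity.
from urllib.parse import unquote, urlsplit

url = "http://[fe80::1%25eth0]/config"  # hypothetical device URL
host = urlsplit(url).hostname           # "fe80::1%25eth0" (brackets stripped)
connect_to = unquote(host)              # "fe80::1%eth0" - the name that
                                        # establishes connectivity locally
```

Note that the percent-decoding happens on the client; the server only ever echoes the name back.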
To try and unblock this issue, we've posted a draft update to RFC6874 and discussion is open. Details at https://mailarchive.ietf.org/arch/msg/ipv6/i5LUQN9vU9MryNWtvS_M_O7Wgjc/. The draft itself is here.
Much appreciated, @becarpenter !
That doesn't fix the percent encoding.
@afcady, we are stuck with % meaning two things. The discussion on ipv6@ietf.org is tending towards requiring only the %25eth0 escape encoding and dumping the suggestion to allow %eth0 heuristically.
The draft has been updated again, following discussion at the recent IETF meeting.
As always, a diff from the previous version is available.
Input on two open issues is needed from implementers!
Here are a few thoughts I have on this, without indicating support or opposition:
inet_pton seems to be ok with "fe80::abcd%25eth1" and "fe80::abcd%eth1" but not "fe80::abcd-eth1". NSURL seems to be ok with "http://[fe80::abcd%25eth1]/" and "http://[fe80::abcd-eth1]/" but not "http://[fe80::abcd%eth1]/". "fe80::abcd%25eth1" seems to be the most parsable of those examples in my sample of 2 IPv6 host parsers. I'm concerned that if we decide to use "%25" as the delimiter to indicate the beginning of a zone id, some software will interpret "25eth1" to be the zone id and some will interpret "eth1" to be the zone id. All browsers currently fail to parse all of those examples. It is clear that software will need to change if we decide to support this. If compatibility weren't a concern, I think it would be nicest to introduce a new delimiter such as '-'.
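The delimiter ambiguity is easy to reproduce with one more host parser: Python's ipaddress module, which accepts RFC 4007-style "%zone" suffixes (Python 3.9+). The interface name eth1 here is illustrative.

```python
import ipaddress

# Bare "%" delimiter: the zone ID is what you'd expect.
zone_plain = ipaddress.IPv6Address("fe80::abcd%eth1").scope_id    # "eth1"

# "%25" delimiter: the literal "%" wins, so the un-decoded "25"
# leaks into the zone ID - exactly the ambiguity described above.
zone_pct = ipaddress.IPv6Address("fe80::abcd%25eth1").scope_id    # "25eth1"

# "-" as a delimiter is simply rejected by this parser.
try:
    ipaddress.IPv6Address("fe80::abcd-eth1")
    dash_accepted = True
except ValueError:
    dash_accepted = False
```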
I'm curious how someone would get a zone id to use. Some systems might use "eth1" as a meaningful zone id, while other systems might use "en1" or "1". If this is the case, it makes me question the uniformity of these URIs.
Your document says "However, the IPv6 Scoped Address Architecture specification gives no precise definition of the character set allowed in
Windows UNC paths apparently use ‘s’ to delimit a zone ID.
No idea why they chose ‘s’ but it doesn’t have the same problems that ‘%’ does in a URL context, so maybe that’s also worth considering.
@karwa Isn't the "s" only used in the context of a domain name? The referenced example on Wikipedia is fe80--1ff-fe23-4567-890as3.ipv6-literal.net, which uses Microsoft's ipv6-literal.net domain.
ping6 interprets "fe80::abcd%25en0" to have a zone id of 25en0, so the current proposal isn't compatible with that
That becomes fe80::abcd%en0 after URL decoding.
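The decoding step can be checked with Python's urllib:

```python
from urllib.parse import unquote

# "%25" is the percent-encoded form of "%".
decoded = unquote("fe80::abcd%25en0")  # "fe80::abcd%en0"
```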
It's well understood that percent-encoding makes pure cut and paste impossible. Operations people can deal with that if they must.
Regards, Brian Carpenter (via tiny screen & keyboard)
There's a new draft at https://www.ietf.org/archive/id/draft-carpenter-6man-rfc6874bis-03.html.
The IETF 6MAN WG has just formally adopted our document draft-ietf-6man-rfc6874bis-00. All we need are developers who understand all the places where URLs are parsed (there are probably several) and where the actual socket calls are made. I'm glad to help if developers contact me.
BTW, this bug was closed in June 2018 based on arguments at https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2. Those arguments were against various features of RFC6874. The new draft is quite different and (if published as an RFC) will remove all those annoying features.
New version of the draft published: https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-01.html. Among other things it adds an interesting Microsoft Windows 10 use case.
Just noting that https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-02.html came out a while ago and is now in review by the appropriate IETF Area Director.
w.r.t. some of the comments above, getting rid of the percent-encoding seems to make the parsing issues quite a bit less thorny, but we really need implementers to look at that question.
The relevant URI syntax update is now in IETF Last Call, i.e. the last opportunity for public comments: https://mailarchive.ietf.org/arch/msg/ietf-announce/BqBF9qvZ8qZR4ZPlawPvQSe0WbU/
Worth mentioning that the draft has been updated following Last Call comments: https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-03.html
Another very minor update: https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-04.html
RFC-5952 mentions some of the problems that arise due to the flexibility of textual IPv6 addresses and the benefits of having a single, canonical textual representation. Given that zone IDs are opaque ASCII strings, I guess that no normalization can be applied to them, correct?
In other words, [::1234%EN0] and [::1234%en0] must be considered distinct addresses, and URLs containing those addresses must also be considered distinct. This also means that the hostname in general would become case-sensitive, contrary to RFC-3986:
The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted-decimal form, or a registered name. The host subcomponent is case-insensitive.
https://www.rfc-editor.org/rfc/rfc3986.html#section-3.2.2
When a URI uses components of the generic syntax, the component syntax equivalence rules always apply; namely, that the scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI HTTP://www.EXAMPLE.com/ is equivalent to http://www.example.com/.
Good catch. RFC4007 says nothing about case (in fact, it says nothing useful about the Zone ID string at all). Running code (a.k.a. ping on Linux) tells me that implementations are case-sensitive, which is of course the implication of saying nothing. That sentence "The host subcomponent is case-insensitive." is tricky. It's appropriate when applied to a plain IPv6 address, since the hexadecimal characters are indeed case-insensitive anyway. It's inappropriate when applied to a Zone ID string. I think we'll have to live with it, though, and restrict the format to lower case Zone IDs. I'll take this question to the IETF WG list.
I recommend having the zone ID be case-sensitive, to reflect what current implementations do.
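For what it's worth, Python's ipaddress module (3.9+) already behaves this way: the hexadecimal address part compares case-insensitively, while the zone ID compares case-sensitively.

```python
import ipaddress

# Hex digits are case-insensitive: these are the same address.
same_hex = (ipaddress.IPv6Address("FE80::1%en0")
            == ipaddress.IPv6Address("fe80::1%en0"))   # True

# Zone IDs are compared case-sensitively: these are distinct.
same_zone = (ipaddress.IPv6Address("fe80::1%EN0")
             == ipaddress.IPv6Address("fe80::1%en0"))  # False
```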
I don't know how to do that without causing a major problem for the URI parsers in every browser.
Why would this cause such a problem?
I don't think the problem is with browsers specifically.
The issue is that the new RFC is defined in terms of RFC-3986 and updates it, but 3986 makes quite a broad promise of hosts being case-insensitive. It does not even restrict this to certain kinds of hosts - it just says "the host subcomponent". So it's extremely broad, and there may be applications which rely on that.
For example, imagine I have some sort of application-level cache - I can treat requests to SOMEHOST as being equivalent to requests to somehost. Given the language in 3986, I don't even need to figure out what kind of host is being referred to (whether it's an IP address or registered name) - I can just lowercase everything, and that's fine. Maybe I won't catch all requests to the same IP address, but I won't produce false positives, where I say 2 different hosts are equivalent.
This new RFC would make an incompatible change to 3986, by taking away that promise and saying that some hosts may actually be case-sensitive, and that if you just lowercase them as was previously allowed, you might be meaningfully altering which host is being referred to.
The WHATWG URL standard would actually be more accommodating of case-sensitive elements within IP literals than RFC-3986, because we don't make the same broad guarantee. In the WHATWG model, the parser takes a string and creates a URL record from it, and that URL record can contain a host (which is also a record, containing the parsed IP address value). The URL serialiser produces the canonical textual form of that URL record, so nobody needs to do things like manually lowercasing hostnames, and nowhere in the standard does it recommend that anybody does so themselves; the output is already normalised to the extent the standard defines things to be equivalent:
Parsing any of:
http://[::ABCD]/
http://[::abcd]/
http://[::0:0:0:ABCD]/
http://[::0.0.171.205]/
All produce the same result:
http://[::abcd]/
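Python's ipaddress module is not a WHATWG URL parser, but it performs an analogous normalisation for the address part, collapsing the same spellings to one canonical form:

```python
import ipaddress

# The four equivalent spellings of the host above (address part only).
forms = ["::ABCD", "::abcd", "::0:0:0:ABCD", "::0.0.171.205"]

# All normalise to the single canonical textual representation.
canonical = {str(ipaddress.IPv6Address(f)) for f in forms}  # {"::abcd"}
```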
Thanks @karwa. I agree that any URI parser or decoder would have this problem, not just those in browsers. As soon as they have separated out the host part of the URI, any programmer will normalise the whole thing to lower case before analysing whether it's example.com, 1.2.3.4, [::abcd] or [::abcd%upper]. @DemiMarie is correct, I guess, that theoretically every parser could be hacked around to defer the normalisation but that is a big ask, whereas (from my experience with patching wget) the change as defined in the draft is quite straightforward.
Just to confirm, for the case of wget (patched to support RFC6874bis), if I do
wget http://[FE80::3e2a:fdff:fea4:dde7%WLP2S0]
it responds
Connecting to fe80::3e2a:fdff:fea4:dde7%wlp2s0|fe80::3e2a:fdff:fea4:dde7|:80... connected.
In other words, wget normalises the host component to lower case, as expected.
(The patch to wget is at https://github.com/becarpenter/wget6)
New version of the draft today: https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-05.html There's one change, adding a note that zone IDs with upper case letters won't work. (That's an issue we can't fix, due to a shortfall in RFC4007.)
Would it be possible to make zone IDs with upper case letters an error? That way if in the future it is possible to support them, it can be added backwards-compatibly.
Please don't. On Linux it's simply case-sensitive; upper-case letters are not invalid per se.
It's perfectly possible (although I wouldn't recommend it) to create an interface wan0 and another interface Wan0 next to each other, and you can ping fe80::1%wan0 and fe80::1%Wan0 to go through either interface appropriately. (You can also test with dummy interfaces, for example:
# ip link add Test type dummy
# ip link add test type dummy
$ getent ahostsv6 fe80::1%test
fe80::1%11 STREAM fe80::1%test
fe80::1%11 DGRAM
fe80::1%11 RAW
$ getent ahostsv6 fe80::1%Test
fe80::1%10 STREAM fe80::1%Test
fe80::1%10 DGRAM
fe80::1%10 RAW
)
I don't think anyone is daring enough to do that in practice, so I don't think having parsers assume one is equal to the other would be a problem - but case should be preserved for the actual host resolution/connect/sendto/whatever call.
I still think parsers should be required to be case-sensitive here. Case-preserving is the bare minimum.
I'd be delighted if I thought that was reasonably possible, but having looked at some of the Firefox code, I really, really doubt it.
What would be required for it to work? Major refactoring?
You'd need to hop over to https://bugzilla.mozilla.org/show_bug.cgi?id=700999 and ask there.
Hi all,
In order to make some progress on this topic I would like to propose a compromise change to the URL standard that punts on all the hard questions about zone ID.
As indicated in Martin's feedback, most browsers still have a problem with the zone ID and wouldn't implement it. However, URLs with a zone ID still exist, and the fact that URL parsers consider them invalid isn't great. The use case that I encountered was that my printer's settings page pointed me towards a URL containing a zone ID - obviously that failed to parse, so I had to manually remove the zone ID from the URL to access it. The middle ground I'm thinking of is that the parser would remove (and ignore) the zone ID while parsing the URL, so at least it works on machines that have a default zone ID.
The changes to the URL parsing algorithm would be minimal:
In https://url.spec.whatwg.org/#concept-ipv6-parser Step 6 would become: While [c](https://url.spec.whatwg.org/#c) is not the [EOF code point](https://url.spec.whatwg.org/#eof-code-point) and [c](https://url.spec.whatwg.org/#c) is not U+0025 (%):
and 6.7 would become Otherwise, if [c](https://url.spec.whatwg.org/#c) is not the [EOF code point](https://url.spec.whatwg.org/#eof-code-point) and [c](https://url.spec.whatwg.org/#c) is not U+0025 (%), [validation error](https://url.spec.whatwg.org/#validation-error), return failure.
This would have the effect of the zoneID being ignored, so at least we are able to parse such URLs.
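A rough sketch of the proposed behaviour, as a hypothetical helper (not the spec's algorithm): everything from the first U+0025 (%) onward is discarded before the address is parsed.

```python
import ipaddress

def host_ignoring_zone(host: str) -> str:
    """Parse an IPv6 host, dropping any '%zoneid' suffix (sketch only)."""
    addr, _sep, _zone = host.partition("%")  # discard "%zoneid" if present
    return str(ipaddress.IPv6Address(addr))  # canonical address, no zone

# host_ignoring_zone("fe80::1%eth0") and host_ignoring_zone("fe80::1")
# both yield "fe80::1".
```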
@annevk if this is acceptable I will send a PR. Hopefully this is non-controversial enough to be acceptable to Blink and WebKit too.
I think that warrants a new issue. It's not clear to me that is a good idea because the authority question remained unresolved. If it should impact authority and we end up treating multiple distinct authorities as one, that would not be good. And while there are plenty of ways to make a URL appear like another one, I'm not sure we want to add to that problem.
Also in other domains ignoring all input after a certain character has led to injection attacks. How would we avoid those here?
It's worth discussing, but I wouldn't classify it as non-controversial.
IMO, we should support Zone IDs.
Fundamentally, no host has a universally-guaranteed meaning. The URL standard does not define what hosts actually mean, and generally the assumption is that they will be passed to a system resolver.
How that resolver works is undefined, and in general, different systems will do different things and allow the user to customise different parts of the process. For instance, the hosts file can be used to provide a custom mapping, and after that the system may search the local network or other sources before falling back to DNS (the Windows GetAddrInfoEx function, for example, claims to support not only DNS, but also NetBIOS, WINS, Bluetooth, and various peer-to-peer protocols). But generally, after consulting local sources, the resolver will query DNS.
DNS itself can be heavily customised - both by the user and by the backend. Users can provide custom DNS servers (e.g. Google Public DNS), and ISPs can direct queries to particular servers using dedicated physical infrastructure, on-site caches, or to alternate websites (let's imagine the state has a problem with website X and wants to send users to a more ideologically-appropriate site). Ultimately, we have no way to detect any of that. We have no idea what the hostname example.com actually means, and whether the result obtained by a specific client resolution process accurately reflects what the author of the URL intended. And in modern networks, where devices are mobile, generally suspend rather than shut down, and may be negotiating between various WiFi and cellular networks, network configurations can easily fluctuate within the lifetime of a single process, meaning the identity of a resolved name is constantly in flux.
IP addresses are similarly fuzzy. Two machines with different network configurations may have different understandings of what a given address should mean. We give an IP address to the system, and it connects to some machine, and that's about as much as we can say about it. It doesn't come with nearly as much ambiguity as domains have, but it's all still client-specific.
So when I see arguments such as:
Inclusion of purely local information in the universal identity of a resource runs directly counter to the point of having a URI.
And
the Web security model depends on having a clear definition for the origin of resources. The definition of Origin depends on the representation of the hostname and it relies heavily both on uniqueness (something a zone ID potentially contributes toward) and consistency across contexts (which a zone ID works directly against)
I think it overstates how much we can actually rely on existing hostnames to be unique, and it fails to explain how 10.0.0.1 and [::abcd] constitute a "universal identity" which is "consistent across contexts" but [::abcd%eth0] somehow is neither.
But more to the point, I think it misrepresents what URLs are. URLs are universal identifiers, but that does NOT mean that they contain the universal identity of a resource. It just means that they subsume all other kinds of identifiers. It is perfectly fine to use URLs to identify data in a local application - e.g. something like my-recipe-app:/chicken-curry/ingredients#4 is not a misuse of URLs, even if it fails to resolve, or resolves to something else, on another machine.
URLs are, IMO, simply a flexible syntax for expressing the different kinds of identifiers that exist, so that any application can see the URL http://[::abcd%eth0]/config/foo, understand what the different parts are, and infer how to connect to that resource, using the system interfaces available to do so (accepting that they may be configurable).
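As a hedged sketch of that last step, here is how a client might hand a scoped literal to the OS resolver. A numeric zone ("%1") is used so the example doesn't depend on any particular interface name existing; on POSIX systems a name like eth0 would first be translated with if_nametoindex().

```python
import socket

# Resolve a scoped IPv6 literal into a connectable socket address.
# AI_NUMERICHOST: parse the literal, no DNS lookup.
infos = socket.getaddrinfo("fe80::1%1", 80, socket.AF_INET6,
                           socket.SOCK_STREAM,
                           flags=socket.AI_NUMERICHOST)
sockaddr = infos[0][4]   # (address, port, flowinfo, scope_id)
scope_id = sockaddr[3]   # 1 - the interface index sin6_scope_id will carry
```

A real client would then connect() to that sockaddr; the scope_id is what tells the kernel which interface to use.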
And I think it should be possible to express these kinds of locations under the http scheme. They are popular enough that many shipped products use them, and operating systems have included the required interfaces to resolve these names for over a decade. They seem to be an intrinsic part of IPv6 addresses, so IMO the only reasonable course is to accept them as part of our support for IPv6 addresses.
Of course, no client is obligated to support a particular kind of host. I don't see any technical reason for refusing, but browsers should be allowed to decline requests to such URLs if they wish. I hope they would at least make it a configurable option rather than an outright ban.
The URL Standard and standards that build on it do end up using and exposing the host in quite a few ways. So perhaps meaning is not strictly-speaking defined, but there is a lot of behavior built on top that is outside the realm of DNS.
We cannot just change the syntax without addressing that. I told the RFC authors repeatedly that syntax isn't really the problem here. It's the end-to-end integration.
And even if someone solved that, there's also the problem of getting implementer interest, which is a requirement per our Working Mode. And thus far I've largely seen opposition on that front.
@valenting: "The use case that I encountered was that my printer settings was pointing me towards a URL containing a zoneID - obviously that failed to parse, so I had to manually remove the zoneID from the URL to access it." That only works if you're lucky enough to have your printer on the default link (a.k.a. zone). As home networks get more complex that isn't guaranteed, although I agree that it's a useful fix for the common case.
@annevk: "It's the end-to-end integration." We (the authors) understood that point and would like to know in which way the latest draft doesn't answer the concern. Of course we are not going to specify the algorithms, but we can of course add more about the expected behaviour. In IETF terminology: send text.
Currently, there is no way to point a browser at fe80::1%lo.
Proposed syntax: https://[fe80::1%25lo]:80