whatwg / url

URL Standard
https://url.spec.whatwg.org/

Rephrase the stated goal of obsoleting RFC 3986 and RFC 3987 #703

Closed alwinb closed 1 year ago

alwinb commented 2 years ago

The topmost goal of the WHATWG standard states:

  • Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process.

I believe that it will be good for this standard to first discuss, and then rephrase this goal.

The goal as stated is confusing. It is easy to read it to mean that the RFCs are no longer relevant and that there is a consensus across committees that the WHATWG standard is the one common URL standard. However, the WHATWG standard does not cover the same things as the RFCs do and vice versa, and the IETF has not endorsed the WHATWG standard. As a consequence, the stated goal is likely to confuse or upset people.

I would love to see this goal rephrased in a way that is more honest, more accurate, and less likely to be perceived as offensive. Adding some discussion around it could go a long way toward dissolving misunderstandings, annoyance, and even hostile responses.

State the facts and inform the readers about the situation so that they can make their own decisions. Be honest about what the WHATWG Standard does and does not provide. Be clear that the goal has not been met yet and work towards it, or amend it and refer people to the RFCs for those things that the WHATWG cannot provide.

alwinb commented 2 years ago

Motivated in part by the discussion in #479

annevk commented 2 years ago

You'll have to elaborate on what is confusing here. The idea is indeed that those particular RFCs are no longer relevant.

alwinb commented 2 years ago

> You'll have to elaborate on what is confusing here.

It is confusing and likely to upset people.

From what I have read, the WHATWG standard was created because the IETF RFCs did not specify error recovery and did not match browser behaviour. Changing browsers to match the RFCs wasn't feasible, there was a lack of progress, and nobody at the IETF took on this task.

Anne, since you wrote the standard, you can amend, clarify or confirm.

It is clear that this split didn't leave the WHATWG on good terms with the IETF community. Given that situation, it is extra important to be careful about the way in which you state your goals.

To be clear, the WHATWG Standard does provide many things that the RFCs did not. It provides an algorithm and an API specification, and there is a reference implementation and a test suite. The WHATWG does make progress on describing the 'real world' situation and on getting web browsers aligned. Why don't you state that?

There are also a lot of things that the RFCs provide that the WHATWG standard does not provide:

That's elaborate enough, I hope!

The point is, you don't obsolete the RFCs. You just don't, and you don't seem to understand that. People depend on the RFCs and you claim you're obsoleting them, but you don't provide for their needs. That is what is causing upset and hostility. This is no different from the situation a decade ago, when the IETF standards did not provide for the needs of browsers, causing upset and hostility towards them.

To make it worse, applications that depend on those specific RFC features are now stuck in a situation where they cannot be web-compatible. This hurts the adoption rate of this standard and causes more frustration. It's just not a healthy situation.

This is why I opened this issue. As long as these issues are not solved, at least be very open about them and discuss them. I believe this would be good for the larger community.

cc @masinter, @mnot

Note that I am not just complaining. I have taken action when you did not respond to these concerns. I wrote this URL Specification and a reference implementation that passes the tests and adds some of the missing features. It matches the behaviour specified in the WHATWG standard. I'm showing you a path forward.

masinter commented 2 years ago

I think the characterization of the root problem as one of "respect" and "giving credit" is dead wrong. It's a disagreement about an engineering question about the applicability of the "Postel Principle" to one particular protocol element which has many roles outside of browsers and HTML.

URLs may seem to be just another part of the browser experience and for that application, perhaps it is useful to specify additional rules and behavior. But being generous to writers who type in URLs by hand into HTML and the Address bar isn't worth the pain of making every other non-browser application suffer.

People are used to URLs breaking: not just new 404 Not Founds (and the occasional 418), but we've also managed to move large swaths of the net from http: to https:.

IETF is not a membership organization -- "the only ones there are the people who come". Why isn't there a WHATWG HTTP Living Standard? Perhaps that might give a clue.

karwa commented 2 years ago

> But being generous to writers who type in URLs by hand into HTML and the Address bar isn't worth the pain of making every other non-browser application suffer.

Then again, it appears that many non-browser applications deviate from the RFCs specifically to adopt lenient parsing behaviour from the web, as it can emerge in unexpected places. HTTP redirects, for instance, seem to come up often as a place where people see dodgy URLs/relative references.
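
A minimal sketch of the kind of lenient handling described above, assuming a runtime that exposes the WHATWG `URL` class as a global (for example Node.js or a browser). The helper name `resolveRedirect` and the example.com addresses are made up for illustration; this is not taken from any particular HTTP client.

```typescript
// Hypothetical helper: resolve a possibly sloppy Location header value
// against the URL of the request that produced it.
function resolveRedirect(location: string, requestUrl: string): string | null {
  try {
    // The WHATWG parser trims surrounding whitespace, percent-encodes
    // spaces in the path, and treats "\" like "/" for special schemes
    // such as http.
    return new URL(location, requestUrl).href;
  } catch {
    // Input the parser cannot recover from (for example, an invalid base).
    return null;
  }
}

console.log(resolveRedirect("/new path", "http://example.com/old"));
// → http://example.com/new%20path
console.log(resolveRedirect("  //EXAMPLE.org/x  ", "http://example.com/old"));
// → http://example.org/x
console.log(resolveRedirect("\\\\example.org\\x", "http://example.com/old"));
// → http://example.org/x
```

A parser that follows the RFC 3986 grammar strictly would be expected to reject the first and third inputs, since a raw space and a backslash are not part of that grammar.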

So I'm not really sure that it is fair to characterise web-compatibility as "pain" and "suffering"; non-browser applications seem to be willingly adopting these behaviours, at the request of their users, even before there was a formal standard telling them what web-compatibility even meant.

For most modern browsers, input into the address bar goes through a totally different "fixup" process which considers things like the user's history and bookmarks, automatically adds schemes, adds TLDs to domains (e.g. .com), and lots of other stuff. It has nothing to do with anything in this standard -- and the standard in fact calls it out as being out-of-scope:

> How user input in the web browser’s address bar is converted to a URL record is out-of-scope of this standard.

alwinb commented 2 years ago

> I think the characterization of the root problem as one of "respect" and "giving credit" is dead wrong.

This is not at all what I tried to communicate.

I know you disagree about more tolerant parsing. But the way you've responded to that challenge is not functional. The current situation, with two distinct standards that don't match each other and don't specify how, why, and where they differ, is far worse than the thing you've been trying to avoid.

If you can't solve that disagreement, then at least describe the situation to your readers: describe the conflict, describe the differences, identify the overlap, learn from each other, and do align with each other on the issues where you don't disagree.

annevk commented 2 years ago

I think those who contributed to the RFCs are mentioned in https://url.spec.whatwg.org/#acknowledgments. Not sure what makes you say they are not.

As for things the RFCs provide that this document does not, most of those are intentional design decisions, and we typically don't document those in the standards. If some concrete need for any of those can be shown, that's open for reconsideration of course, but thus far that bar hasn't quite been met.

And to be clear, I somewhat regularly interact with the IETF community and many there can appreciate the situation for what it is. I would not say the WHATWG is on bad terms.

alwinb commented 2 years ago

> Not sure what makes you say they are not.

The standard text does not communicate that you acknowledge, as in, are aware of, the structural design of the RFCs and the motivations behind that design.

> As for things the RFCs provide that this document does not, most of those are intentional design decisions.

It is strange to claim that you want to obsolete the RFCs and to then make a deliberate decision to not include important parts of what they cover.

The standard text does not mention this and does not provide a justification for that decision.

I don’t have anything else to say about this that I’ve not already said.

masinter commented 2 years ago

This seemed too coincidental, arriving in my mailbox:

https://www.washingtonpost.com/politics/2022/09/22/mozilla-report-takes-aim-tech-giants-grip-web-browsers/

but without malice or bias:

https://masinter.blogspot.com/2009/05/structural-bias-standards-and-elsewhere.html

annevk commented 1 year ago

> The standard text does not communicate that you acknowledge, as in, are aware of, the structural design of the RFCs and the motivations behind that design.

I don't think that belongs in a standard. I haven't seen this done in other standards efforts either.

> It is strange to claim that you want to obsolete the RFCs and to then make a deliberate decision to not include important parts of what they cover.

I suspect we disagree on "important parts".


Perhaps there is another way to solve this. Do you have a suggested rephrasing of the current goal that would make this a bit more clear in your view?

tmccombs commented 1 year ago

It seems like this specification is primarily concerned with how URLs are used by browsers, which makes sense for a specification from the WHATWG. As a specific example, one reason stated for not supporting relative URLs in #421 and #531 is that they aren't needed in the browser. However, browsers are not the only places where URLs are used. Perhaps the goals should be scoped to "in the context of a web browser"? That does, however, raise the concern of divergence between how URLs are handled in the browser and on servers.
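
To make the relative-URL limitation concrete, here is a small sketch, assuming the WHATWG `URL` class is available as a global (Node.js or a browser). The helper `tryParse` and the example.com paths are hypothetical and only serve the illustration.

```typescript
// Hypothetical helper: return the serialized URL, or null when parsing fails.
function tryParse(input: string, base?: string): string | null {
  try {
    return new URL(input, base).href;
  } catch {
    return null;
  }
}

// Without a base, a relative reference cannot be represented at all.
console.log(tryParse("../images/logo.png"));
// → null

// With a base, it is resolved immediately into an absolute URL.
console.log(tryParse("../images/logo.png", "https://example.com/docs/guide/"));
// → https://example.com/docs/images/logo.png
```

A tool that needs to store or rewrite a relative reference without knowing its base would have to keep it as a plain string or use a different parser.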

bagder commented 1 year ago

IMHO, the API should be dealt with in a separate repo as it is unrelated to the URL spec.

annevk commented 1 year ago

I was about to close this issue due to the lack of specific suggestions, but noticed that @xfq added the i18n-tracker label. Is this something the i18n WG wants to weigh in on @xfq?

aphillips commented 1 year ago

@annevk I think we are tracking it because it's I18N related and so that we'd see any activity pop up in our digest (which is how I noticed the comment). We'll ping you back today, since we're all here at TPAC 😉

aphillips commented 1 year ago

@annevk I18N is okay with you closing this: we have no specific issue here.

annevk commented 1 year ago

Thanks @aphillips!

tmccombs commented 1 year ago

Here's a specific recommendation for the stated goals. Change the first bullet point to:

  • Align RFC 3986 and RFC 3987 with contemporary *browser* implementations and *replace the RFCs as the standard for URLs in the browser* ~obsolete the RFCs in the process~. (E.g., spaces, other "illegal" code points, query encoding, equality, canonicalization, are all concepts not entirely shared, or defined.) URL parsing needs to become as solid as HTML parsing. [RFC3986] [RFC3987]

(Additions in italics, removals with strikethrough).

and maybe add a section that compares the differences with the RFCs and discusses when it is appropriate to use the WHATWG standard vs the RFCs.

Or alternatively, increase the scope of the project to include all non-browser use-cases, and address some of the deficiencies that @alwinb mentioned above.
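
The parenthetical in the proposed wording mentions equality and canonicalization. As a rough sketch, assuming a global WHATWG `URL` class, one way to compare two URLs is to parse both and compare their serializations. The helper name `sameUrl` is made up for this example.

```typescript
// Hypothetical helper: compare two absolute URLs by their serialized form.
// Throws a TypeError if either input cannot be parsed.
function sameUrl(a: string, b: string): boolean {
  // Parsing normalizes scheme and host case, drops the default port,
  // and resolves "." and ".." path segments before serializing.
  return new URL(a).href === new URL(b).href;
}

console.log(sameUrl("HTTP://Example.COM:80/a/../b", "http://example.com/b"));
// → true
console.log(sameUrl("http://example.com/b", "http://example.com/B"));
// → false (the path is case-sensitive)
```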

alwinb commented 1 year ago

I agree with the comment of @tmccombs above.

And definitely add a section describing the differences between the RFCs and both valid and parsable WHATWG URLs. This is a basic requirement for any good standards document.

> I was about to close this issue due to the lack of specific suggestions

I consider closing this issue without acting on it proof that the WHATWG is more interested in advancing its own political position than in creating clear, high quality open standards that benefit the internet community at large.

It makes no sense for me to put more effort into this if there is a political motivation to frustrate my efforts.

My father used to say that politics was a necessary evil. But I believe that politics is the consequence of a lack of personal strength :)

I wish you good luck going forward.