mdn / browser-compat-data

This repository contains compatibility data for Web technologies as displayed on MDN
https://developer.mozilla.org
Creative Commons Zero v1.0 Universal

Normalize spec URLs from HTML spec #10090

Closed sideshowbarker closed 3 years ago

sideshowbarker commented 3 years ago

While putting together patches for spec URLs, it became clear that the URLs scraped from MDN which cited the HTML spec had inconsistencies. So I spent some time “normalizing” the HTML spec URLs, and will submit a series of patches for them.

I’m opening this issue to give a summary of the details, to avoid repeating those in the commit messages for each patch.

  1. Some URLs lacked a filename part; e.g., https://html.spec.whatwg.org/multipage/#2dcontext. So in all cases, those URLs now have a filename part; e.g., https://html.spec.whatwg.org/multipage/canvas.html#2dcontext. That eliminates a wasteful redirect. Also, because the redirect from the filename-less URLs to the ones with filenames is done on the client side, in the browser, including the filenames in the BCD spec URLs ensures the URLs work as expected with processing tools that don’t load the document and execute the JavaScript otherwise required to do the redirect.

  2. Some URLs had an incorrect filename; e.g., https://html.spec.whatwg.org/multipage/interaction.html#the-dragevent-interface. So in all cases, those now have the right filenames; e.g., https://html.spec.whatwg.org/multipage/dnd.html#the-dragevent-interface.

    As with the filename-less case, a client-side redirect executed in JavaScript in the browser redirects the user to the correct URL. But that won’t work for processing tools that don’t load the document and execute the JavaScript, so to cover all use cases, we need to have the correct filenames in BCD.

  3. Some URLs cited a spec section not written for developers but instead written for browser implementors; for example, https://html.spec.whatwg.org/multipage/media.html#dom-audiotrack-enabled — which has a corresponding section at https://html.spec.whatwg.org/multipage/media.html#dom-audiotrack-enabled-dev which is written specifically for developers. So the pattern for those changes is that -dev got appended to the URL.

    The rationale for referencing those -dev sections rather than the non-dev sections is that the -dev sections describe the observable behavior that developers see from their JavaScript code. In contrast, the non-dev sections often describe internal behavior that browser implementors must implement behind the scenes. In other words, they cover implementation details that only browser-engine developers need to care about and that web developers don’t need. So developers should be directed to the details written specifically for them in the -dev sections.

  4. Some URLs cited a target within a WebIDL block in cases where citing a section heading would provide a better reader/user experience. For example, https://html.spec.whatwg.org/multipage/#abstractworker was replaced with https://html.spec.whatwg.org/multipage/workers.html#the-abstractworker-mixin.

  5. Some URLs were changed in ways that don’t fall into any of the above categories and are just “miscellaneous cleanup”.
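
As a rough illustration of how categories 1 and 3 above could be flagged automatically, here is a minimal Python sketch. The function name `check_html_spec_url` and the issue labels are hypothetical and are not part of BCD or its tooling. The `-dev` check is only a heuristic: it assumes that a `dom-*` anchor has a corresponding `-dev` section, which would still need to be confirmed against the spec. Category 2 (a wrong filename) is not covered, since detecting it would need a mapping from anchors to spec files.

```python
from urllib.parse import urlsplit

HTML_SPEC_HOST = "html.spec.whatwg.org"


def check_html_spec_url(url: str) -> list[str]:
    """Return hypothetical issue labels for an HTML spec URL (empty if none)."""
    parts = urlsplit(url)
    if parts.netloc != HTML_SPEC_HOST:
        return []  # not an HTML spec URL; nothing to check

    issues = []

    # Category 1: the path stops at the directory, with no filename
    # such as canvas.html, so a client-side redirect would be needed.
    filename = parts.path.rsplit("/", 1)[-1]
    if not filename.endswith(".html"):
        issues.append("missing-filename")

    # Category 3 (heuristic, an assumption): a dom-* anchor that does
    # not use the developer-oriented -dev variant.
    fragment = parts.fragment
    if fragment.startswith("dom-") and not fragment.endswith("-dev"):
        issues.append("non-dev-anchor")

    return issues
```

With the URLs cited above, `check_html_spec_url("https://html.spec.whatwg.org/multipage/#2dcontext")` reports a missing filename, while the normalized `canvas.html#2dcontext` form reports nothing. Because the check works on the URL string alone, it does not depend on loading the page or running the redirect script.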

Elchi3 commented 3 years ago

This all sounds very reasonable to me, @sideshowbarker! Thanks for putting so much effort into this!

It would be good to start documenting this and other guidelines for spec_url that we've come up with in the course of adding them to BCD.

sideshowbarker commented 3 years ago

> It would be good to start documenting this and other guidelines for spec_url that we've come up with in the course of adding them to BCD.

Good point — I’ve opened https://github.com/mdn/browser-compat-data/issues/10106 for that.

sideshowbarker commented 3 years ago

All the related PRs for this have been merged.