w3c / reffy

Reffy is a Web spec crawler and analyzer tool. It is notably used to update Webref
MIT License

Links defined in `anchors` section in bikeshed classified as "autolinks" #1584

Open dontcallmedom opened 4 months ago

dontcallmedom commented 4 months ago

In #1518, we started distinguishing between rawlinks and autolinks to simplify broken link detection in strudy and reduce false positives.

I realized today that the heuristic to distinguish between the two does not work well for Bikeshed: links that are defined in `<pre class='anchors'>` sections get classified as autolinks, whereas they are in fact manually maintained URLs.

I don't think there is an easy way to fix this short of changing Bikeshed to mark up these links differently.

(Conversely, if and when we fix it, we would be able to systematically detect links that are defined in an `anchors` section but do not need to be.)

dontcallmedom commented 4 months ago

Conversely, Bikeshed generates links for the automatically generated index sections without `data-link-type`; this could either be fixed upstream (probably preferable), or reffy could automatically classify the links it finds in `ul.index` elements as autolinks.
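
To make the second option concrete, here is a minimal sketch of the classification rule. It assumes that the current heuristic treats the presence of a `data-link-type` attribute as the marker of an autolink, and it uses Python's standard `html.parser` purely for illustration; reffy itself works on a DOM in JavaScript, and `LinkClassifier` is a hypothetical name, not existing reffy code.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Sort hrefs into autolinks and rawlinks (hypothetical helper, not reffy code)."""

    def __init__(self):
        super().__init__()
        self.autolinks = []
        self.rawlinks = []
        # One boolean per currently open <ul>: True if it has the "index" class.
        self._open_lists = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            self._open_lists.append("index" in classes)
        elif tag == "a" and attrs.get("href"):
            # Assumption: a data-link-type attribute marks an autolink.
            # Proposed addition: any link nested in a ul.index is one too.
            in_index = any(self._open_lists)
            if "data-link-type" in attrs or in_index:
                self.autolinks.append(attrs["href"])
            else:
                self.rawlinks.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul" and self._open_lists:
            self._open_lists.pop()


sample = """
<p><a data-link-type="dfn" href="https://example.org/#auto">auto</a>
<a href="https://example.org/manual">manual</a></p>
<ul class="index">
 <li><a href="#term">term</a>
  <ul><li><a href="#nested">nested</a></li></ul>
 </li>
</ul>
"""
classifier = LinkClassifier()
classifier.feed(sample)
print(classifier.autolinks)
print(classifier.rawlinks)
```

Only `<ul>` nesting is tracked, so optional end tags such as `</li>` or `</p>` do not confuse the sketch. Links in a list nested inside a `ul.index` are also treated as autolinks.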

tidoust commented 2 months ago

> Conversely, Bikeshed generates links for the automatically generated index sections without `data-link-type`; this could either be fixed upstream (probably preferable), or reffy could automatically classify the links it finds in `ul.index` elements as autolinks.

I note that, in some specs, the asides appear after the `ul.index` elements, probably because the build was made with an older version of Bikeshed. For example, see the index in WebXR Lighting Estimation.

We could probably exclude aside elements that have a dfn-panel class from links extraction altogether: they do not contain any link that does not already appear somewhere else in the spec.
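
As a rough illustration of that exclusion, the sketch below collects link targets while skipping anything nested in an `aside` element that carries the `dfn-panel` class. It is written with Python's standard `html.parser` only to show the idea; reffy's actual extraction runs on a DOM in JavaScript, and `PanelSkippingExtractor` is a hypothetical name.

```python
from html.parser import HTMLParser


class PanelSkippingExtractor(HTMLParser):
    """Collect hrefs, ignoring links inside aside.dfn-panel (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        # One boolean per currently open <aside>: True if it is a dfn-panel.
        self._open_asides = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "aside":
            classes = (attrs.get("class") or "").split()
            self._open_asides.append("dfn-panel" in classes)
        elif tag == "a" and attrs.get("href"):
            # Skip the link if any enclosing aside is a definition panel.
            if not any(self._open_asides):
                self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "aside" and self._open_asides:
            self._open_asides.pop()


sample = """
<p><a href="#dfn-foo">foo</a></p>
<aside class="dfn-panel" data-for="foo">
 <a href="#foo">#foo</a>
 <ul><li><a href="#ref-for-foo">Introduction</a></li></ul>
</aside>
<aside class="note"><a href="#kept">kept</a></aside>
"""
extractor = PanelSkippingExtractor()
extractor.feed(sample)
print(extractor.links)
```

Because the panels are skipped wherever they appear, the result does not depend on whether the asides come before or after the `ul.index` elements.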