Closed marcoscaceres closed 7 years ago
Ok, it's sorta working... the image failure is expected... but it's a bit of a catch-22: the image will need to point to some absolute URL, and that URL will need to be either on GitHub Pages or on the W3C server.
I slapped this together quite quickly, so I'm sure I overlooked a bunch of things... at least it shows that it could work.
This is definitely an improvement, but I don't think it solves our main problem, which is verifying anchor references on the target page -- that's where we keep getting bitten.
Wondering if we can get headless Chrome to help out here: grab the DOM, check anchors, etc.
Actually, I think it does solve it. When you use `data-cite=` or `[[!spec]]` in the document, those references point to /TR/ (or the WHATWG equivalents), so there should be no problem.
I don't think we need a headless browser to solve the anchor ref problem. Seems like we can:

1. Land the current PR as a good initial step.
2. Improve the solution to also curl all linked refs, JSDOM them, and find (or fail to find) the anchor refs.
WDYT?
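Step 2 above could be sketched roughly like this. A real implementation would fetch each referenced spec and parse it with JSDOM; this minimal illustration uses a naive regex to collect `id`/`name` attributes instead, and the function names are hypothetical, not from the PR:

```javascript
// Collect every fragment identifier (id or name attribute) declared in a page.
// NOTE: a regex is a stand-in for real DOM parsing (e.g. JSDOM) — it's only
// meant to illustrate the anchor-checking step.
function collectAnchors(html) {
  const anchors = new Set();
  const re = /\b(?:id|name)=["']([^"']+)["']/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    anchors.add(m[1]);
  }
  return anchors;
}

// Given a referenced page's HTML and the fragments we link to in it,
// return the fragments that don't resolve to any anchor.
function findMissingAnchors(html, fragments) {
  const anchors = collectAnchors(html);
  return fragments.filter(frag => !anchors.has(frag));
}
```

For example, `findMissingAnchors('<h2 id="intro">Intro</h2>', ['intro', 'nope'])` would flag `'nope'` as a broken anchor ref.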
> Improve the solution to also curl all linked refs, JSDOM them and find (or fail to find) the anchor refs
Note that the link checker already does 2, but not for ReSpec documents. However, as I already pointed out, there should be no reason to ever encounter a ReSpec document through following references in a specification. At least for this spec, we should be good.
@igrigorik, what do you say? Should we just start with this and work towards improving things over time?
> However, as I already pointed out, there should be no reason to ever encounter a ReSpec document through following references in a specification. At least, for this spec, we should be good.
Ah, interesting. Yeah, I agree this is a good step in the right direction.
@plehegar any thoughts or objections?
Ok, so, this is now an npm package.
It's still super rushed (anyone want to help me maintain it?), but could be something we could use for lots of specs.
Actually, this isn't catching all the broken links yet.
We are kinda screwed with the HTML spec, because the WHATWG changed to using JS for routing to the multipage spec (and blocks robots for some things, which they might have overlooked) 😡.
Also, @dontcallmedom, is it possible to get the link checker to generate the report as JSON? The HTML report is really difficult to scrape, because it doesn't provide any good anchor points that work with CSS selectors. It's only by accident that one can match on 404s, because an id gets "_404" appended at the end of the element's id... but the rest of the errors don't have anything 😩
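To illustrate why the HTML report is so awkward to scrape: the only usable hook is the accidental `_404` id suffix, so any scraper ends up with something fragile along these lines (a hypothetical helper, not code from the checker):

```javascript
// Fragile scrape of the link checker's HTML report: the only reliable
// hook is that elements for 404 results happen to get ids ending in "_404".
// Any other error category has no selector-friendly marker at all.
function find404Ids(reportHtml) {
  const re = /id=["']([^"']*_404)["']/g;
  const ids = [];
  let m;
  while ((m = re.exec(reportHtml)) !== null) {
    ids.push(m[1]);
  }
  return ids;
}
```

In a browser context the equivalent would be `document.querySelectorAll('[id$="_404"]')`, which is exactly the kind of accidental contract a JSON report would avoid.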
Closing, due to limitations... will try to find time later in the year to make a new link checker.
As a community, we absolutely need this replacement to the current link-checker. Given things like Electron, it should now be fairly trivial to create this... it's just finding time that's hard.
@marcoscaceres why Electron? I think we should be able to leverage the new headless Chrome and script that pretty easily?
I haven't looked at it to see what the API is. I like Electron because it has a simple IPC API and you get Node.js out of the box. If headless Chrome offers the same, then I'm all for it.
@igrigorik, ok. There is a Node.js API built on the Chrome debugging protocol, so it's definitely doable. Now I just need to find time to work on it.
Anyone else interested in helping build this?
Prevents deployment unless the link checker passes.