Open · bensternthal opened this issue 8 years ago
I was able to get this to run by commenting out the following line:
link.html.location = node.__location.attrs[attrName];
https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/scrapeHtml.js#L34
I did not find references to this attribute elsewhere in the code, so I am not sure what it is used for, and I am also unsure why it would cause an issue. If I leave the line in, return links; is never reached.
Still diagnosing what could be causing this.
So, the above led me to a mismatched <a href> tag in my HTML. Fixing that fixed the problem.
This might be a case where node.__location.attrs is undefined, in which case reading node.__location.attrs[attrName] throws a TypeError.
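To illustrate the hypothesis: reading any property of undefined throws a TypeError, so if __location.attrs is ever missing, an unguarded lookup aborts before return links; runs. The sketch below assumes that is what happens; getAttrLocation is a hypothetical helper, not part of broken-link-checker, showing one way a guarded lookup could look.

```javascript
// Hypothetical helper mirroring the lookup node.__location.attrs[attrName]
// from scrapeHtml.js. Not part of broken-link-checker's actual code.
function getAttrLocation(node, attrName) {
  const location = node.__location;
  // No location info at all: nothing to report.
  if (location === undefined || location === null) return null;
  const attrs = location.attrs;
  // Location info without an attrs map: the hypothesized failure case.
  if (attrs === undefined || attrs === null) return null;
  return attrs[attrName] !== undefined ? attrs[attrName] : null;
}

// An assumed node shape whose __location lacks an attrs map.
const nodeWithoutAttrs = { nodeName: "a", __location: { line: 1, col: 1 } };

// The unguarded access throws a TypeError...
let threw = false;
try {
  nodeWithoutAttrs.__location.attrs["href"];
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw);

// ...while the guarded helper returns null instead.
console.log(getAttrLocation(nodeWithoutAttrs, "href"));
```

Returning null keeps link collection going; whether a missing location should be silently skipped or reported is a design choice left to the maintainer.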
Let me know what you think.
Can you provide the HTML that caused the issue?
Here is an example snippet that will reproduce the error:
https://gist.github.com/bensternthal/e186520f239909b0ba52e861d01bfaca
The </a> on line 11 is causing the issue.
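A stray closing tag like this can be hard to spot in a large template. As a rough aid, the hypothetical findMismatchedTags function below (not part of broken-link-checker or parse5) scans markup with a regular expression and reports closing tags that do not match the innermost open tag. It is a simplistic sketch, not an HTML parser: it ignores comments, script contents, and optional end tags.

```javascript
// Elements that never take a closing tag, so they are never pushed on the stack.
const VOID_ELEMENTS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr"]);

// Hypothetical helper: report closing tags that don't match the innermost open tag.
function findMismatchedTags(html) {
  const problems = [];
  const stack = [];
  // Group 1: optional "/" for closing tags; group 2: the tag name.
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    const selfClosing = match[0].endsWith("/>");
    if (!isClosing) {
      if (!VOID_ELEMENTS.has(name) && !selfClosing) stack.push(name);
      continue;
    }
    const top = stack[stack.length - 1];
    if (top === name) {
      stack.pop();
    } else {
      // Record what was expected to close here, plus the character offset.
      problems.push({ tag: name, expected: top === undefined ? null : top, offset: match.index });
    }
  }
  return problems;
}

// One entry: the </a> at offset 22, while <div> is still the innermost open tag.
console.log(findMismatchedTags("<a href=test><div>text</a></div>"));
```

For this input the reported offset, 22, is the same position parse5 assigns to the stray </a> end tag in its location info.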
var result = require("parse5").parse("<a href=test><div>text</a></div>", {locationInfo: true});
// document > html (childNodes[0]) > body (childNodes[1]) > a (childNodes[0])
console.log(result.childNodes[0].childNodes[1].childNodes[0]);
produces:
{ nodeName: 'a',
tagName: 'a',
attrs: [ { name: 'href', value: 'test' } ],
namespaceURI: 'http://www.w3.org/1999/xhtml',
childNodes: [],
parentNode: {…},
__location:
{ line: 1,
col: 1,
startOffset: 0,
endOffset: 26,
attrs: { href: [Object] },
startTag: { line: 1, col: 1, startOffset: 0, endOffset: 13, attrs: [Object] },
endTag: { line: 1, col: 23, startOffset: 22, endOffset: 26 } } }
So the problem does not seem to be in the HTML parser. I'll look into this more deeply when I find some time. Thanks for the snippet.
No problem, glad I could help. The module is very handy; many thanks for creating and maintaining it.
I think this is an edge case, but since it happened to me, I'd like to note it here. I'll try to dive in and, who knows, maybe submit a PR.
Steps to reproduce
Run:
blc http://devpatch.com:3000 --filter-level 3 -ro
More info
I am running a dockerized version of a WordPress site. Testing both locally and against the dev instance hosted on devpatch, broken-link-checker never fetches or checks a page. In the server logs I see the request from BLC, but nothing after that. Below is a screenshot: the log is on the left, the console output on the right.
I verified that BLC runs without issue against a static, non-Docker site hosted locally on the same port.