stevenvachon / broken-link-checker

Find broken links, missing images, etc within your HTML.
MIT License
1.95k stars 302 forks source link

Question: CLI Output broken links only? #200

Closed JayHoltslander closed 4 years ago

JayHoltslander commented 4 years ago

Currently running on my local machine with:

blc http://mymac.local:5757/blog -ro --exclude-external > output.txt

Is there a way I can modify the above command to only output broken links and ignore all "0 broken" links? I checked --help to look for an option but didn't see a way to do this.

xyzesteban commented 4 years ago

One way to do it is install the Programmatic API and write a script with your own instance of broken-link-checker; the blc command by itself might not be enough to do that since it shows them all by default.

In my script I used the HtmlUrlChecker class, and I modified the 'link' function. You could do something like this and you can further tweak it to check or display how you want it:

const htmlUrlChecker = new HtmlUrlChecker(options)
  .on('error', (error) => {})
  .on('html', (tree, robots, response, pageURL, customData) => {})
  .on('queue', () => {})
  .on('junk', (result, customData) => {})
  .on('link', (result, customData) => {
    if (result.broken) {
      console.log(result.brokenReason, result.url.resolved)
    }
  })
  .on('page', (error, pageURL, customData) => {})
  .on('end', () => {});

htmlUrlChecker.enqueue(pageURL, customData);
xyzesteban commented 4 years ago

There is an issue already opened for this and it includes a few more solutions & workarounds: #133

frederickjh commented 3 years ago

And there is a push request #137 that implements this that came out of issue #133, that seems to have been ignored for over two years, as it does not have one comment.