stevenvachon / broken-link-checker

Find broken links, missing images, etc. within your HTML.
MIT License

Feature Request: Less Verbose Options (Broken-Only, 404-Only, etc.) #133

Status: Open. Ravlen opened this issue 5 years ago.

Ravlen commented 5 years ago

I find BLC extremely useful, but the output contains too much information (I'm betting most users are looking for the BROKEN links, not the OK ones). It would be great to have a CLI option that outputs only broken links, or only certain types of errors (404, 403, etc.), so that pages with no broken links produce no output at all.

For example, I currently get output like this:

Getting links from: https://www.example.com/archives/
├───OK─── https://docs.example.com/install/
├─BROKEN─ https://docs.example.com/archives.html (HTTP_404)
├───OK─── https://example.com/doc
├───OK─── https://example.com/docs
├───OK─── https://example.com/docs/archives
├───OK─── https://example.com/content/archives.html
├───OK─── https://example.com/example-docs/
└───OK─── https://example.com/master/doc
Finished! 88 links found. 80 excluded. 1 broken.

Getting links from: https://docs.example.com/ssh/
├───OK─── https://the.earth.li/%7Esgtatham/putty/0.67/htmldoc/Chapter8.html#pubkey-puttygen
├───OK─── https://wiki.eclipse.org/EGit/User_Guide#Eclipse_SSH_Configuration
├───OK─── https://www.digitalocean.com/community/tutorials/understanding-the-ssh-encryption-and-connection-process
└───OK─── http://www.chiark.greenend.org.uk/%7Esgtatham/putty/download.html
Finished! 120 links found. 115 excluded. 0 broken.

A --less-verbose flag would output only the following (the second page scanned would produce no output, since it has no broken links):

Broken link(s) from: https://www.example.com/archives/
└─BROKEN─ https://docs.example.com/archives.html (HTTP_404)
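Until something like this exists in blc itself, the proposed behavior can be approximated by post-processing the output. The sketch below is a hypothetical Python filter (the script name blc_broken_only.py and the function broken_only are illustrative, not part of blc or any fork). It assumes the line formats shown above: a "Getting links from: " header per page, and a "─BROKEN─" marker followed by the URL and a reason code such as (HTTP_404). It keeps only broken links, redraws the tree branches, and prints nothing for pages without broken links; optional command-line arguments restrict the report to specific reason codes.

```python
import re
import sys
from typing import Iterable, List, Optional, Set

# Line formats assumed from the sample output in this issue.
PAGE_PREFIX = "Getting links from: "
BROKEN_MARK = "─BROKEN─"
# Trailing reason code such as "(HTTP_404)" on a BROKEN line.
REASON = re.compile(r"\(([A-Z][A-Z0-9_]*)\)\s*$")
# ANSI color escapes, in case the output was captured with colors enabled.
ANSI = re.compile(r"\x1b\[[0-9;]*m")


def _report(page: Optional[str], broken: List[str]) -> List[str]:
    """Render one page's broken links as a small tree; empty if nothing is broken."""
    if page is None or not broken:
        return []
    out = [f"Broken link(s) from: {page}"]
    for i, link in enumerate(broken):
        branch = "└" if i == len(broken) - 1 else "├"
        out.append(f"{branch}{BROKEN_MARK} {link}")
    return out


def broken_only(lines: Iterable[str], codes: Optional[Set[str]] = None) -> List[str]:
    """Keep only BROKEN links, grouped by page; pages with none produce no output.

    `codes` optionally restricts the report to reasons like {"HTTP_404"}.
    """
    out: List[str] = []
    page: Optional[str] = None
    broken: List[str] = []
    for raw in lines:
        line = ANSI.sub("", raw).rstrip("\n")
        if line.startswith(PAGE_PREFIX):
            out += _report(page, broken)  # flush the previous page
            page, broken = line[len(PAGE_PREFIX):], []
        elif BROKEN_MARK in line:
            link = line.split(BROKEN_MARK, 1)[1].strip()
            match = REASON.search(link)
            reason = match.group(1) if match else None
            if codes is None or reason in codes:
                broken.append(link)
    out += _report(page, broken)  # flush the last page
    return out


if __name__ == "__main__" and not sys.stdin.isatty():
    # Only read when output is piped in; extra arguments filter by reason code.
    wanted = set(sys.argv[1:]) or None
    for out_line in broken_only(sys.stdin, wanted):
        print(out_line)
```

For example, blc -r https://www.example.com/archives/ | python3 blc_broken_only.py HTTP_404 would report only 404s. Because it parses human-readable output, it may need adjusting if blc's output format changes.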
tasmo commented 5 years ago

An interim solution is to pipe the output through grep:

blc -r https://www.example.com/archives/ | grep --color=never -e 'Getting links' -e '404' -e 'Finished!'
alexlouden commented 5 years ago

I threw this together; it adds a -q/--quiet flag that shows only pages with broken links, and only those links: https://github.com/alexlouden/broken-link-checker

greggman commented 5 years ago

> An interim solution is to use a pipe to grep.
>
> blc -r https://www.example.com/archives/ | grep --color=never -e 'Getting links' -e '404' -e 'Finished!'

Thanks for this, but it's not that helpful, because it still shows every page even when nothing on that page is broken. If one page out of thousands has a broken link, you have to read through thousands of lines to find it.

Would you accept a patch adding a quiet option that only outputs page names when something is broken?

alexlouden commented 5 years ago

> Would you accept a quiet option patch that only outputs names if something is broken?

Hey @greggman, I've implemented this in my fork if you'd like to take a look: https://github.com/alexlouden/broken-link-checker

We're using my version in our CI at work, and it makes it much easier to see what's broken.

greggman commented 5 years ago

@alexlouden that's great. Have you submitted a PR?

alexlouden commented 5 years ago

Just submitted one, @greggman. Thanks for the push 😃

jackfoust commented 5 years ago

@greggman

If you save the output of tasmo's command above to a text file, you can run the following against it to remove the redundant "Getting links from" lines:

sed '/Getting links from/{$!N;/\n.*Getting links from/!P;D}' file

This command removes a line containing "Getting links from" if it is immediately followed by another "Getting links from" line.

alexfornuto commented 5 years ago

> @greggman
>
> If you dump tasmo's suggestion above into a text file you can run the following against it to remove the redundant "Getting links from" noise.
>
> sed '/Getting links from/{$!N;/\n.*Getting links from/!P;D}' file
>
> This command will remove a line containing "Getting links from" if it is immediately followed by a line "Getting links from".

Could you update this command to match the new output format, which also includes a per-page summary line like:

Finished! # links found. # excluded. # broken.
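A line-oriented sed script is awkward here, because whether a page should be kept is only known once its "Finished!" line arrives. A minimal sketch, assuming the summary format quoted above (one per page) and the "─BROKEN─" marker from the sample output: buffer each page's header and broken-link lines, then emit the block only if its summary reports a nonzero broken count. The function name pages_with_broken_links and the script name blc_pages_with_broken.py are hypothetical.

```python
import re
import sys
from typing import Iterable, List

PAGE_PREFIX = "Getting links from:"
BROKEN_MARK = "─BROKEN─"
# Per-page summary in the format quoted in this thread, e.g.
#   Finished! 88 links found. 80 excluded. 1 broken.
SUMMARY = re.compile(r"^Finished! \d+ links? found\. \d+ excluded\. (\d+) broken\.")


def pages_with_broken_links(lines: Iterable[str]) -> List[str]:
    """Buffer each page's header and BROKEN lines; emit them, plus the summary,
    only if the summary reports at least one broken link."""
    kept: List[str] = []
    block: List[str] = []  # header + BROKEN lines of the page being read
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(PAGE_PREFIX):
            block = [line]  # a new page starts; drop any unfinished block
        elif not block:
            continue  # blank lines or other text between pages
        elif BROKEN_MARK in line:
            block.append(line)
        else:
            match = SUMMARY.match(line)
            if match:
                if int(match.group(1)) > 0:
                    kept.extend(block)
                    kept.append(line)
                block = []  # this page is done either way


    return kept


if __name__ == "__main__" and not sys.stdin.isatty():
    # Only read when output is piped in.
    for out_line in pages_with_broken_links(sys.stdin):
        print(out_line)
```

It accepts either blc's raw output or the output of the grep pipeline earlier in the thread, e.g. blc -r https://www.example.com/archives/ | python3 blc_pages_with_broken.py.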
frederickjh commented 3 years ago

> Hey @greggman - I've implemented this in my fork, if you'd like to have a look? https://github.com/alexlouden/broken-link-checker

@alexlouden Thanks for your fork! It works well to reduce the noise. I installed it globally with:

npm install git+https://github.com/alexlouden/broken-link-checker -g