stevenvachon / broken-link-checker

Find broken links, missing images, etc within your HTML.
MIT License

http://www.diamondproducers.com multiple 200 pages marked as 404 #114

Closed mkantautas closed 5 years ago

mkantautas commented 6 years ago

http://www.diamondproducers.com has multiple pages that return 200 but are marked as 404.

Is it a bug or an unusual way of blocking crawlers?

stevenvachon commented 6 years ago

Does the same occur with the v0.8.0 branch?

hakudev commented 6 years ago

This is not a bug.

This website doesn't seem to accept the request method 'HEAD', which is the default value as specified in: https://github.com/stevenvachon/broken-link-checker#optionsrequestmethod

Solution:

let options = {
    /* [...] */
    requestMethod: 'GET',
};

beaulac commented 6 years ago

Agreed, this is not a bug, but can we turn this into a feature request?

It would be nice to allow specifying which error codes should cause a HEAD to be retried with a GET.

The proposed workaround (requestMethod: 'GET') works, but is undesirable when only a small portion of links handle HEAD incorrectly (with non-405 error codes). HEAD is much faster in many other cases, so GET should ideally only be used to retry after an error, to keep things speedy.

By default, this list of error codes would just be 405, as it is now, and we could maintain backwards compatibility by making the retry405Head option modify this list of error codes.
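To make the proposal concrete, here is a minimal sketch of the retry decision. It assumes a hypothetical option named retryHeadCodes (not an existing option of the library) and treats the existing retry405Head option as a switch that adds or removes 405 from that list. The function names are illustrative only.

```javascript
// Hypothetical sketch of the proposed behaviour; `retryHeadCodes` is an
// assumed option name and is NOT part of broken-link-checker's API.

// Build the list of status codes that should trigger a GET retry.
function resolveRetryCodes(options = {}) {
    const codes = new Set(options.retryHeadCodes || []);

    // Backwards compatibility: retry405Head toggles 405 in the list.
    // An unset retry405Head is treated as enabled here (an assumption).
    if (options.retry405Head === false) {
        codes.delete(405);
    } else {
        codes.add(405);
    }
    return [...codes];
}

// Decide whether a failed request should be retried with GET.
// Only HEAD requests are ever retried.
function shouldRetryWithGet(method, statusCode, options = {}) {
    return method.toUpperCase() === 'HEAD'
        && resolveRetryCodes(options).includes(statusCode);
}
```

With this shape, a site like the one in this issue could be handled by passing { retryHeadCodes: [404] }, while every other link would still be checked with the faster HEAD request first.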

mkantautas commented 6 years ago

After forking this repo and merging this PR, I can say that this does fix the issue. Great work! #tested

merav2110 commented 6 years ago

Can I control this from the command line?

beaulac commented 6 years ago

@merav2110 Not yet, unfortunately. It shouldn't be too hard to implement yourself if you need it; fork my fix branch and see #115 for inspiration.