manuzhang / mkdocs-htmlproofer-plugin

A MkDocs plugin that validates URLs in rendered HTML files
MIT License
40 stars, 16 forks

Question: how to handle/ignore `-1` errors #63

Open emmercm opened 1 year ago

emmercm commented 1 year ago

Describe the bug

I'm getting a -1 error that I'm assuming means a timeout:

 WARNING  -  htmlproofer: invalid url - http://www.logiqx.com/DatFAQs/CMPro.php [-1] [dats.md]

I can't seem to ignore this in my mkdocs.yml:

plugins:
  - htmlproofer:
      raise_error_excludes:
        '-1': [ 'http://www.logiqx.com' ]

To Reproduce

Add a link to http://www.logiqx.com/DatFAQs/CMPro.php or some other website that's timing out and then run mkdocs.

Environment (please complete the following information):

Contents of mkdocs.yml:

plugins:
  - htmlproofer:
      raise_error_excludes:
        '-1': [ 'http://www.logiqx.com' ]

manuzhang commented 1 year ago

Do you have another example? This site doesn't time out for me.

emmercm commented 1 year ago

They must have fixed whatever was wrong.

There are websites that can be used to simulate a timeout, e.g. https://httpstat.us/200?sleep=60000, which delays its response by 60 seconds.
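A slow response can also be simulated locally, without depending on an external site. The following is only a sketch using the Python standard library: `SlowHandler`, `request_times_out` and `simulate_slow_site` are hypothetical names, and it uses `urllib` rather than the plugin's own HTTP code, so it says nothing about which status code the plugin reports for a timeout.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that waits before answering any GET request."""

    delay_seconds = 1.0

    def do_GET(self):
        time.sleep(self.delay_seconds)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client gave up before the response was written

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def request_times_out(url, timeout):
    """Return True if fetching `url` takes longer than `timeout` seconds."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()
        return False
    except socket.timeout:
        return True
    except urllib.error.URLError as exc:
        return isinstance(exc.reason, socket.timeout)


def simulate_slow_site():
    """Start a slow local server on a free port and request it with a short timeout."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return request_times_out(url, timeout=0.2)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `simulate_slow_site()` starts the server, issues one request that gives up after 0.2 seconds, and then shuts the server down again.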

manuzhang commented 1 year ago

I still cannot reproduce the issue. Can you make a PR adding your test case to our integration tests?

emmercm commented 1 year ago

I spent some time looking and I couldn't find any website that could simulate TooManyRedirects or ConnectionError. The best I found was https://httpstat.us/ which is just status codes.

I think this will need to be simulated in unit tests, where those exceptions are raised by the mocks. I'm not super familiar with unit testing in Python, so I would need help there.
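A rough sketch of what such a mocked test could look like, using only `unittest.mock`. Everything here is an assumption for illustration: `get_status` is a hypothetical stand-in for the plugin's URL check, and the two exception classes stand in for `requests.exceptions.ConnectionError` and `requests.exceptions.TooManyRedirects` so the example runs without `requests` installed.

```python
import unittest
from unittest import mock


class SimulatedConnectionError(Exception):
    """Stand-in for requests.exceptions.ConnectionError (assumption)."""


class SimulatedTooManyRedirects(Exception):
    """Stand-in for requests.exceptions.TooManyRedirects (assumption)."""


def get_status(session, url):
    """Hypothetical checker: return the HTTP status, or -1 on a transport failure."""
    try:
        return session.get(url, timeout=10).status_code
    except (SimulatedConnectionError, SimulatedTooManyRedirects):
        return -1


class GetStatusTest(unittest.TestCase):
    def test_connection_error_maps_to_minus_one(self):
        session = mock.Mock()
        # side_effect makes the mocked call raise instead of returning.
        session.get.side_effect = SimulatedConnectionError("simulated failure")
        self.assertEqual(get_status(session, "http://example.invalid/"), -1)

    def test_too_many_redirects_maps_to_minus_one(self):
        session = mock.Mock()
        session.get.side_effect = SimulatedTooManyRedirects("simulated loop")
        self.assertEqual(get_status(session, "http://example.invalid/"), -1)

    def test_success_returns_status_code(self):
        session = mock.Mock()
        session.get.return_value.status_code = 200
        self.assertEqual(get_status(session, "http://example.invalid/"), 200)
```

Saved as a test module, this can be run with `python -m unittest`. In a real test the mock would replace whatever session or function the plugin actually calls, and the real `requests` exceptions would be raised.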

manuzhang commented 1 year ago

Can you first add your case to https://github.com/manuzhang/mkdocs-htmlproofer-plugin/tree/main/tests/integration?

emmercm commented 1 year ago

Adding http://www.logiqx.com wouldn't be a good test, as it looks like the website has since recovered and is not producing the error anymore. I think the same would be true of most real websites; I would expect the -1 code to be transient.

I could not find a website used for HTTP testing (e.g. https://httpstat.us/ mentioned above) that could produce a -1 code reliably, and I went searching for a while.
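One way to get a connection failure reliably without a public test site might be to point the request at a local port that nothing is listening on. This is a sketch under assumptions: `unused_local_url` and `status_or_minus_one` are hypothetical helpers built on the standard library, and the mapping of a failed connection to -1 mirrors the behaviour described in this thread rather than the plugin's actual code.

```python
import socket
import urllib.error
import urllib.request


def unused_local_url():
    """Return a URL on 127.0.0.1 whose port was just released, so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        port = sock.getsockname()[1]
    return f"http://127.0.0.1:{port}/"


def status_or_minus_one(url, timeout=2.0):
    """Hypothetical checker: HTTP status on a response, -1 when the connection fails."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, just with an error status
    except (urllib.error.URLError, OSError):
        return -1  # no HTTP response at all, e.g. connection refused
```

There is a small window in which another process could claim the released port, so a test built on this is very likely, but not strictly guaranteed, to see the connection refused.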

fghaas commented 1 year ago

@emmercm Did you ever get to the bottom of this issue? I ask because we are running into the exact same thing, with the exact same circumstances:

  1. www.logiqx.com is a DNS CNAME record; it points to logiqx.com.
  2. logiqx.com. has A records, but no AAAA records.

I see the same intermittent failures that you saw, against a host for which both these criteria are also true.

Now this kind of scenario shouldn't be a problem. But I am beginning to surmise that what we're dealing with may be an issue with the requests module, or beneath it the DNS resolution stack, where some moving part is somehow getting confused when running on a dual-stack (IPv4/IPv6) client. Would you reckon that that might be a possibility?
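To check what the resolver on an affected machine returns for such a host, something like the following sketch could help. `addresses_by_family` is a hypothetical helper name; it only wraps the standard `socket.getaddrinfo` call and does not reproduce or explain the failure itself.

```python
import socket


def addresses_by_family(host, port=80):
    """Group the addresses the local resolver returns for `host` by IP family."""
    found = {"IPv4": set(), "IPv6": set()}
    for family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if family == socket.AF_INET:
            found["IPv4"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            found["IPv6"].add(sockaddr[0])
    # Sorted lists make the result stable and easy to compare between runs.
    return {name: sorted(addresses) for name, addresses in found.items()}
```

Running `addresses_by_family("www.logiqx.com")` on a dual-stack client and on an IPv4-only client, and comparing the results, would show whether the two see different address sets for the same name.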

SeanTAllen commented 4 months ago

We are getting the -1 fairly regularly, but it isn't reproducible for any individual URL.

If anyone has figured out a way to ignore the -1 entries, I'd love to hear about it.

If there is anything I could provide to help with this, please let me know.