urlstechie / urlchecker-python

:snake: :link: Python module and client for checking URLs
https://urlchecker-python.readthedocs.io
MIT License

add --no-check-certs option (use at own risk) #89

Closed vsoch closed 9 months ago

vsoch commented 9 months ago

This will address https://github.com/urlstechie/urlchecker-action/issues/105. Once it has been tested by the person who opened the issue, we will merge, release, and update the action.
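The thread doesn't show the implementation, so here is a minimal sketch of how a `--no-check-certs` flag could be wired up, assuming the checker uses `requests`, whose `verify` keyword disables certificate verification when set to `False`. The parser, the `request_kwargs` and `check_url` helpers, and the 5-second timeout are hypothetical, not urlchecker's actual API.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser: only the flag name comes from the PR title.
    parser = argparse.ArgumentParser(prog="urlchecker-sketch")
    parser.add_argument("files", nargs="*", help="files to scan for URLs")
    parser.add_argument(
        "--no-check-certs",
        dest="no_check_certs",
        action="store_true",
        help="skip TLS certificate verification (use at own risk)",
    )
    return parser.parse_args(argv)


def request_kwargs(no_check_certs, timeout=5):
    # requests.get(url, verify=False) skips certificate checks;
    # the flag is opt-in, so verification stays on by default.
    return {"verify": not no_check_certs, "timeout": timeout}


def check_url(url, no_check_certs=False):
    # Imported lazily so the rest of the sketch runs without requests.
    import requests

    response = requests.get(url, **request_kwargs(no_check_certs))
    return response.status_code
```

With `verify=False`, `requests` accepts expired or self-signed certificates (and urllib3 emits an `InsecureRequestWarning`), which is why an option like this is best kept opt-in.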

vsoch commented 9 months ago

@SuperKogito it looks like one of your previously working escape sequences is now flagged as invalid (Python emits a SyntaxWarning for it):

urlchecker/core/urlmarker.py:53
  /home/runner/work/urlchecker-python/urlchecker-python/urlchecker/core/urlmarker.py:53: SyntaxWarning: invalid escape sequence '\['
    "[^\\s()<>\[\\]]+|\\([^\\s()]*?\\([^\\s()]+\\)[^\\s()]*?\\)",

And then it's not detecting the URLs, and I think that is why some tests fail?
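For reference, a sketch of the usual fix, assuming only the fragment quoted in the warning (the rest of `urlmarker.py` isn't shown here): write the pattern as a raw string, or double the stray backslash. Python keeps an unrecognized escape such as `\[` as a literal backslash followed by the bracket, so the string value is the same either way. The helper names below are illustrative.

```python
import re
import warnings

# The fragment from the warning, written as a raw string so every
# backslash reaches the regex engine unchanged and no escape is invalid.
URL_FRAGMENT = r"[^\s()<>\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)"

# The offending source text: a normal string literal containing "\[".
# It is stored as a raw string here so it is not evaluated on import.
ORIGINAL_LITERAL = r'"[^\\s()<>\[\\]]+|\\([^\\s()]*?\\([^\\s()]+\\)[^\\s()]*?\\)"'


def warns_on_compile(source):
    """Return True if compiling `source` emits an escape-sequence warning."""
    with warnings.catch_warnings():
        # Promote warnings to errors; the compiler then raises SyntaxError.
        warnings.simplefilter("error")
        try:
            compile(source, "<literal>", "eval")
        except (SyntaxError, SyntaxWarning, DeprecationWarning):
            return True
    return False


def literal_value(source):
    """Evaluate the literal with warnings silenced to see its actual value."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)


if __name__ == "__main__":
    print(warns_on_compile(ORIGINAL_LITERAL))
    print(literal_value(ORIGINAL_LITERAL) == URL_FRAGMENT)
    print(bool(re.fullmatch(URL_FRAGMENT, "example.com/path")))
```

Because the evaluated value is unchanged, the warning by itself would not change what the regex matches; the fix mainly silences the warning and keeps the code valid for future Python versions.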

vsoch commented 9 months ago

@SuperKogito could you please open a PR against this PR's branch (so that it can also be tested here)?

SuperKogito commented 9 months ago

Opened https://github.com/urlstechie/urlchecker-python/pull/90, but I am getting the same failure :/

vsoch commented 9 months ago

Okay, doing a bit of work on this.

Going to look at the URL-checking failures next.

vsoch commented 9 months ago

Okay, I found the issues, and they had nothing to do with our regex or with requests. An update to the Selenium WebDriver deprecated some of the logic we were using, so creating the driver failed and it came back as None. Since the driver is the primary way we retrieve many of these URLs (e.g., when the initial requests response is not allowed), many checks were failing. Needing that fallback is becoming more common, which is logical: websites don't want people scraping them. But they can't easily prevent a Selenium WebDriver from doing so.
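A minimal sketch of the request-then-driver fallback described above, with the HTTP fetch and the browser driver injected as plain callables so it runs without Selenium. The function and field names are hypothetical, and this thread does not name the specific Selenium calls that were deprecated; the point is that a `None` driver can be reported as an explicit failure reason instead of surfacing as many unexplained failed URLs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    url: str
    ok: bool
    reason: str


def check_with_fallback(
    url: str,
    fetch_status: Callable[[str], int],
    make_driver: Callable[[], Optional[object]],
    driver_get: Callable[[object, str], bool],
) -> CheckResult:
    # 1. Cheap path: a plain HTTP request (e.g. via requests).
    status = fetch_status(url)
    if 200 <= status < 400:
        return CheckResult(url, True, f"http {status}")

    # 2. Many sites refuse plain requests, so fall back to a browser driver.
    driver = make_driver()
    if driver is None:
        # The failure mode from this thread: driver setup broke, so every
        # URL that needed the fallback was reported as failing.
        return CheckResult(url, False, f"http {status}; driver unavailable")

    # 3. Let the driver load the page; its verdict decides the result.
    if driver_get(driver, url):
        return CheckResult(url, True, "driver")
    return CheckResult(url, False, f"http {status}; driver failed")
```

Keeping the reason string separate from the pass/fail flag makes a broken driver show up as one recognizable cause rather than as a regression in URL detection.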


I'm finishing up local tests now and will push the fixes shortly.

vsoch commented 9 months ago

Note for myself: we will also need to update the driver in the Dockerfile, once we find the version that matches the one used on GitHub Actions.

vsoch commented 9 months ago

That green is sure beautiful :) :green_apple: https://github.com/urlstechie/urlchecker-python/actions/runs/7769722953/job/21189125531?pr=89

Just pushed the update for the container. We should be able to merge and release soon, then test with the action.