urlstechie / urlchecker-action

GitHub action to extract and check URLs in code and documentation.
https://urlchecker-python.readthedocs.io
MIT License

fake_useragent error #99

Closed ricardozanini closed 1 year ago

ricardozanini commented 2 years ago

Hey guys!

Not sure if I'm doing something wrong, but when I run a simple check on a Jekyll website, I get:

WARNING:fake_useragent:Error occurred during loading data. Trying to use cache server https://fake-useragent.herokuapp.com/browsers/0.1.12

In the end, the checker doesn't find any URLs to check and ends in an error, yet the CI passes. Here's the full log: https://github.com/kiegroup/kogito-website/runs/7903901907?check_suite_focus=true

My yaml:

name: Check URLs

on: [pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: ruby/setup-ruby@v1
      with:
        ruby-version: '3.1'
        bundler-cache: true
    - uses: actions/cache@v3
      with:
        path: vendor/bundle
        key: ${{ runner.os }}-gems-${{ hashFiles('**/Gemfile') }}
        restore-keys: |
          ${{ runner.os }}-gems-
    - run: |
          gem install bundler jekyll
          bundle check || bundle install
          bundle exec jekyll build
    - name: urls-checker
      uses: urlstechie/urlchecker-action@0.0.32
      with:
        # subfolder with files to test
        subfolder: _site
        # A comma-separated list of file types to cover in the URL checks
        file_types: .html,.js,.css,.xml
        # Choose whether to include file with no URLs in the prints.
        print_all: false
        # The timeout seconds to provide to requests, defaults to 5 seconds
        timeout: 5
        # How many times to retry a failed request (each is logged, defaults to 1)
        retry_count: 3
        # Choose whether to force a pass or not
        force_pass: false

The cache action can be ignored. I guess the ruby action is already doing this work; I'll review it later.

vsoch commented 2 years ago

@ricardozanini that error is more of a warning that the library is falling back to a cache - it usually still runs after that (did you try a re-run?). That said, the developer of that library has been 404, and likely the first step here is to switch to installing from:

pip install git+https://github.com/danger89/fake-useragent.git

I'll put in a PR tonight to do this, and then we can go from there.

ricardozanini commented 2 years ago

@vsoch thanks for the prompt reply! I wonder why I see the "There were no URLs to check." message. :(

The _site folder has a few HTML files, so maybe it's a configuration error on my side? I thought that WARNING had some kind of relation to this problem.
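
One way to rule out an empty or misnamed folder is to count the files that match the configured types before the checker runs. The snippet below is a minimal, standalone sketch of that idea; the function name `count_files_by_type` is hypothetical, and this is not how urlchecker itself discovers files.

```python
from pathlib import Path


def count_files_by_type(folder, file_types):
    """Count files under `folder` whose names end with one of `file_types`."""
    counts = {suffix: 0 for suffix in file_types}
    root = Path(folder)
    if not root.is_dir():
        # A missing folder yields all-zero counts, which is itself a useful hint.
        return counts
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        for suffix in file_types:
            if path.name.endswith(suffix):
                counts[suffix] += 1
                break
    return counts


if __name__ == "__main__":
    # `_site` and the suffix list mirror the workflow settings above.
    print(count_files_by_type("_site", [".html", ".js", ".css", ".xml"]))
```

If every count is zero, the problem is likely the folder name or the build step rather than the checker.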

vsoch commented 2 years ago

Let me get the fix in this evening and I'll ping you with a branch to test - I think your options look good because I cloned, built, and ran the same command locally (and I saw checks). I'm not allowed to work on personal projects during the work day, but promise I'll get this in for you tonight after the workday finishes!

ricardozanini commented 2 years ago

Thanks @vsoch! Let me know if I can help somehow ;)

vsoch commented 2 years ago

hey @ricardozanini ! Ready for your help! I have a PR branch here https://github.com/urlstechie/urlchecker-action/pull/100 and it would be super helpful if you could test it - instead of urlchecker-action@main use urlchecker-action@update/fakeuser-agent. Let me know if that resolves the issue! If so, I have a PR in to urlchecker-python to fix the underlying issue, and that will propagate to a new container to release here. Thank you!

ricardozanini commented 2 years ago

@vsoch many thanks! Can you please rename the branch to something without a slash (/)? Otherwise, I can't test it:

the `uses' attribute must be a path, a Docker image, or owner/repo@ref

vsoch commented 2 years ago

You actually can - I’ve done it many times! - Did you remember the org name urlstechie first?

ricardozanini commented 2 years ago

Did you remember the org name urlstechie first?

Sure.

I pushed the renamed branch to my fork, but it won't work since my fork is not a GH registered action.

vsoch commented 2 years ago

I don't understand the issue - your fork doesn't need to be hosting anything. You should just be able to do:

    - name: urls-checker
      uses: urlstechie/urlchecker-action@update/fakeuser-agent

I've done this many times - it works because you can use an action from any branch or general ref. What error are you seeing exactly?
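
To illustrate why a slash in the branch name is not a problem, here is a rough Python sketch that splits a `uses` value of the form owner/repo@ref. It is only an illustration under that assumed format, not GitHub's actual parser, and `split_uses` is a hypothetical helper. Everything after the `@` is treated as the ref, so `update/fakeuser-agent` passes through intact, while a value that lacks the org name is rejected.

```python
def split_uses(uses):
    """Split an `owner/repo[/path]@ref` string into its parts."""
    # Split once on the first "@": the right-hand side is the ref, slashes and all.
    action, sep, ref = uses.partition("@")
    if not sep or not ref:
        raise ValueError(f"missing @ref in {uses!r}")
    parts = action.split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected owner/repo before '@' in {uses!r}")
    return {
        "owner": parts[0],
        "repo": parts[1],
        "path": "/".join(parts[2:]),  # empty when the action lives at the repo root
        "ref": ref,
    }


if __name__ == "__main__":
    print(split_uses("urlstechie/urlchecker-action@update/fakeuser-agent"))
```

Calling it with `urlchecker-action@update/fakeuser-agent` (no org) raises a `ValueError`, which may be the same kind of mistake behind the error message quoted earlier.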

carceneaux commented 2 years ago

@vsoch I was experiencing the same issue and tested your code as described. I can confirm that it works great and resolved the issue.

ricardozanini commented 2 years ago

I don't understand the issue - your fork doesn't need to be hosting anything. You should just be able to do:

SORRY.

I didn't understand your first message, it seems. I'll do it and report soon.

vsoch commented 2 years ago

Thanks @carceneaux !

ricardozanini commented 2 years ago

@vsoch worked! https://github.com/kiegroup/kogito-website/runs/7954015645?check_suite_focus=true

Although I'm still seeing a few stack traces, I assume this is normal since it's a fallback to the cache server, right? I think you can release a new patch :)

Many thanks!

vsoch commented 2 years ago

@ricardozanini I checked the (newer) fake-user agent code, and it at least is hitting the right repository! That looks like a one-off failure to hit the main server, as you noted. I'm not loving that error - but it does seem that the rest looks okay. https://github.com/danger89/fake-useragent I'm going to merge here, and let's watch this issue! If it's something regular we can open an issue at https://github.com/danger89/fake-useragent.

vsoch commented 2 years ago

Should be released in a few minutes - if y'all see other libraries to get good user agent strings (or even more active forks of the one I've already picked) please share! It's an important piece of the library and my biggest concern to have working well.

vsoch commented 2 years ago

okay, new action is released - master branch and version 0.0.33.

vsoch commented 2 years ago

This is now doubly fixed to restore the original fake-useragent, which has since had two releases on PyPI. Thanks!

NeilHanlon commented 1 year ago

Hi folks,

I am encountering this issue again in December 2022. e.g. https://github.com/rocky-linux/documentation/actions/runs/3713744520/jobs/6296764864

cc @sspencerwire

vsoch commented 1 year ago

Thanks @NeilHanlon ! I'll open up an issue with the developer. Maybe I can chat with him about other ways to host the data because the server seems unreliable.

NeilHanlon commented 1 year ago

thank you! Please extend my support as well as the Rocky Enterprise Software Foundation's, if anything is needed in terms of infrastructure, etc. We are quite happy with the urlchecker tool and aim to support all open source wherever we can.

vsoch commented 1 year ago

Oh that's incredibly kind, thank you! I'll post that in the issue.

vsoch commented 1 year ago

An update! https://github.com/fake-useragent/fake-useragent/issues/163#issuecomment-1355517090 I don't think we have the fake-useragent pinned, so if you try installing again it might work. I'm also asking the developer about why we need a server, period - if we could not require that it might be more reliable.
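
As a rough idea of what "not requiring a server" could look like on the calling side, the sketch below tries fake-useragent first and falls back to a small bundled list if anything goes wrong. This is an assumption-laden illustration, not what urlchecker-python or fake-useragent actually do: `pick_user_agent` and `FALLBACK_USER_AGENTS` are hypothetical names, and the strings in the list are only examples.

```python
import random

# Hypothetical static pool; a real list would be larger and kept up to date.
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
]


def _from_fake_useragent():
    # Imported lazily so a missing or broken install does not break the caller.
    from fake_useragent import UserAgent

    return UserAgent().random


def pick_user_agent(provider=_from_fake_useragent):
    """Return a user agent string, using the bundled list if the provider fails."""
    try:
        agent = provider()
    except Exception:
        # Any failure (import error, network error, bad data) triggers the fallback.
        agent = None
    if isinstance(agent, str) and agent.strip():
        return agent
    return random.choice(FALLBACK_USER_AGENTS)
```

The `provider` argument is there mainly so the fallback path can be exercised without touching the network.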

melroy89 commented 1 year ago

We don't use the Heroku caching server anymore.

NeilHanlon commented 1 year ago

ah great, thank you! @vsoch, could you release a new version of the action? I believe the bundled container in the latest version has the older package from pypi so we need a new version to work around it.

vsoch commented 1 year ago

Ah good point, let me look into that.

vsoch commented 1 year ago

ok - the action uses the urlchecker-python 0.0.34 release, so I just rebuilt that and checked that the version of fake-useragent is the latest on PyPI, 1.1.1 (at least I hope this is the right one!)

/code# pip freeze | grep fake
fake-useragent==1.1.1

melroy89 commented 1 year ago

Yes, that is the latest version, and it is the right one. Let me know if you experience issues with that release.
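
For anyone who wants to confirm which version ended up in their own environment without `pip freeze`, a minimal Python sketch using the standard library's `importlib.metadata` is below. The helper name `installed_version` is hypothetical; it returns `None` when the package is not installed.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("fake-useragent"))
```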

vsoch commented 1 year ago

Closing issue - @NeilHanlon feel free to ping me if something else comes up.