urlstechie / urlchecker-python

:snake: :link: Python module and client for checking URLs
https://urlchecker-python.readthedocs.io
MIT License

Got "🤔 There were no URLs to check" message but several URLS were checked #80

Closed sturivny closed 2 years ago

sturivny commented 2 years ago

Preconditions:

Steps to reproduce:

  1. Clone the repo: https://gitlab.com/cki-project/documentation
  2. Navigate to the repo: cd documentation
  3. Execute: urlchecker check . --timeout 60 --retry-count 5 --files "content/docs/hacking/contributing/documentation/index.md"

Actual result:

❯ urlchecker check .      --timeout 60     --retry-count 5  --files "content/docs/hacking/contributing/documentation/index.md"
           original path: .
              final path: /home/sturivny/git-cki/documentation
               subfolder: None
                  branch: main
                 cleanup: False
              file types: ['.md', '.py']
                   files: ['content/docs/hacking/contributing/documentation/index.md']
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 5
                    save: None
                 timeout: 60
https://gitlab.com/cki-project/documentation/
https://gohugo.io/
https://www.docsy.dev/
https://gohugo.io/content-management/page-bundles/
https://cki-project.org/
https://www.gnu.org/software/stow/
2022-08-04 12:42:32,421 - urlchecker - ERROR - Error running task
ERROR:urlchecker:Error running task
🤔 There were no URLs to check.
Exception ignored in: <function Pool.__del__ at 0x7fd954e7eee0>
Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/usr/lib64/python3.9/multiprocessing/queues.py", line 377, in put
    self._writer.send_bytes(obj)
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 205, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

Expected result: No Errors

SuperKogito commented 2 years ago

This seems related to #78. @sturivny, can you test this using Python 3.8 or 3.7 so we can confirm whether it is related to the version?

sturivny commented 2 years ago

Same error with Python v3.8.13 and Python v3.7.13:

❯ urlchecker check . --timeout 60 --retry-count 5 --files "content/docs/hacking/contributing"
           original path: .
              final path: /home/sturivny/git-cki/documentation
               subfolder: None
                  branch: main
                 cleanup: False
              file types: ['.md', '.py']
                   files: ['content/docs/hacking/contributing']
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 5
                    save: None
                 timeout: 60
No urls found.
https://direnv.net/
https://gitlab.com/cki-project/documentation
https://documentation.internal.cki-project.org/docs/operations/onboarding/
https://gohugo.io/content-management/page-bundles/
https://gohugo.io/
https://getfedora.org/nl/workstation/
https://gitlab.com/cki-project/cki-lib
https://documentation.internal.cki-project.org/
https://gitlab.com/cki-project/pipeline-definition
https://tox.readthedocs.io/en/latest/
https://www.gnu.org/software/stow/
https://silverblue.fedoraproject.org/
https://gitlab.com/cki-project/cki-lib/-/blob/main/README.md
https://gitlab.com/cki-project/documentation/
https://cki-project.org/
https://gitlab.com/cki-project/cki-lib/-/blob/main/cki_lint.sh
2022-08-04 13:10:19,421 - urlchecker - ERROR - Error running task
ERROR:urlchecker:Error running task
🤔 There were no URLs to check.

vsoch commented 2 years ago

@sturivny can you give us enough files to reproduce the error locally? Specifically, it has to do with the multiprocessing workers that run the checks, e.g.:

ERROR:urlchecker:Error running task

vsoch commented 2 years ago

And yes @SuperKogito, I agree it's the same error! I don't think @crd477 ever got back to us, so hopefully someone can provide enough to reproduce it, because there is definitely a bug!

SuperKogito commented 2 years ago

Yeah, it seems serious now that we have confirmed it is not related to the Python version. No update from @crd477, but I also think this is related to the workers. Maybe the number of workers the code is trying to use is causing an issue, or something related? Either way, hopefully we can manage to replicate this and debug the error.

vsoch commented 2 years ago

Hey @sturivny! I added a new flag for debugging, --serial (to run the checks in serial), and I think I also found the bug causing the above! To test, check out the branch from https://github.com/urlstechie/urlchecker-python/pull/82

And then run each of:

$ urlchecker check . --files "content/docs/hacking/contributing/documentation/index.md" --serial
$ urlchecker check . --files "content/docs/hacking/contributing/documentation/index.md" 

Let me know if the error still shows up for you, and/or if you see a new error when you add serial. Thank you for reporting this!
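
For readers following along, the difference between the two commands above can be sketched roughly as follows. This is only an illustration under assumptions, not urlchecker's actual code: `check_url`, `run_checks`, and the `serial` parameter are hypothetical names, the check is stubbed out so no network request is made, and `multiprocessing.pool.ThreadPool` from the standard library stands in for process workers so the example runs on any platform.

```python
from multiprocessing.pool import ThreadPool


def check_url(url):
    """Hypothetical stand-in for a real check: no network request is made."""
    return url, url.startswith("https://")


def run_checks(urls, serial=False, workers=4):
    """Run check_url over urls, one at a time or through a worker pool."""
    if serial:
        # Serial mode: an exception surfaces directly with a normal traceback,
        # which is what makes this mode useful for debugging.
        return [check_url(url) for url in urls]
    # Pooled mode: the same work spread across workers (threads here, for
    # portability; this is an assumption, not how urlchecker does it).
    with ThreadPool(workers) as pool:
        return pool.map(check_url, urls)


urls = ["https://gohugo.io/", "https://www.docsy.dev/"]
serial_results = run_checks(urls, serial=True)
pooled_results = run_checks(urls, serial=False)
```

When the code is healthy, both modes should return the same results, which is consistent with both commands being expected to pass once the underlying bug is fixed.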

sturivny commented 2 years ago

Both commands work fine for me! Thank you @vsoch :smile:

❯ urlchecker check . --files "content/docs/hacking/contributing/documentation/index.md" --exclude-pattern "https://internal/documentation/repository.git"  --serial 
           original path: .
              final path: /home/sturivny/git-cki/documentation
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: True
              file types: ['.md', '.py']
                   files: ['content/docs/hacking/contributing/documentation/index.md']
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: ['https://internal/documentation/repository.git']
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
https://gohugo.io/
https://gitlab.com/cki-project/documentation
https://gitlab.com/cki-project/documentation/
https://www.gnu.org/software/stow/
https://gohugo.io/content-management/page-bundles/
https://cki-project.org/
https://documentation.internal.cki-project.org/
https://www.docsy.dev/

🎉 All URLS passed!

SuperKogito commented 2 years ago

@vsoch Great work as always :clap: but shouldn't only one of the commands work correctly? O.o If both work, doesn't that mean the driver was at fault and not the multiprocessing? o.o

vsoch commented 2 years ago

Multiprocessing workers can exit like that when the task they run hits a bug. In this case the driver was None. So both commands work now because the bug was fixed, and the --serial flag will help us debug this kind of problem in the future!
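
A minimal sketch of that failure mode, assuming a generic error handler around the worker results. Every name here (`check_with_driver`, `run_tasks`, the `driver` argument and its `get` method) is hypothetical and chosen for illustration, not taken from urlchecker; `ThreadPool` is again used in place of process workers so the example stays portable.

```python
import logging
from multiprocessing.pool import ThreadPool

logger = logging.getLogger("sketch")


def check_with_driver(args):
    """Hypothetical task that needs a driver object to do its work."""
    url, driver = args
    # With driver=None this raises AttributeError inside the worker.
    return url, driver.get(url)


def run_tasks(urls, driver):
    """Collect results, swallowing task errors the way a broad handler would."""
    results = []
    with ThreadPool(2) as pool:
        pending = [pool.apply_async(check_with_driver, ((u, driver),)) for u in urls]
        for task in pending:
            try:
                # get() re-raises whatever exception the worker hit.
                results.append(task.get(timeout=10))
            except Exception:
                # Only a generic message is logged, so the real cause is hidden.
                logger.error("Error running task")
    return results


results = run_tasks(["https://gohugo.io/"], driver=None)
# An empty result list is then easy to misreport as "nothing to check".
message = "There were no URLs to check." if not results else "URLs were checked."
```

Under these assumptions, every task fails, the result list stays empty, and the summary reads as if no URLs had been found even though URLs were printed earlier. Running the same task outside the pool would show the underlying `AttributeError` directly.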

vsoch commented 2 years ago

Fixed with #82