urlstechie / urlchecker-python

:snake: :link: Python module and client for checking URLs
https://urlchecker-python.readthedocs.io
MIT License

adding support for file patterns #47

Closed vsoch closed 3 years ago

vsoch commented 3 years ago

This PR will address #46, namely that it's currently not possible to define patterns of files to match (e.g., dotfiles or similar). This PR will allow it to work as follows:

# Check only html files
urlchecker check --file-types "*.html" .

# Check hidden files
# Note that since the shell would parse the *, we need to use quotes
urlchecker check --file-types ".*" .

# Check hidden files and html files
urlchecker check --file-types ".*,*.html" .

It works by way of using fnmatch, so technically any pattern glob you'd do on the command line should work! I also ran black for formatting, and updated the license dates. Once we update here, we can release and then update the action.
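
A minimal sketch of the matching idea, assuming the comma-separated value is split into individual patterns and each pattern is compared against the file's base name with `fnmatch`. The helper name `matches_file_types` and the split-on-comma step are illustrative assumptions, not the actual urlchecker code:

```python
import fnmatch
import os


def matches_file_types(path, file_types):
    """Return True if the base name of `path` matches any pattern.

    `file_types` is a comma-separated string such as ".*,*.html".
    Hypothetical helper for illustration; not the urlchecker implementation.
    """
    patterns = [p.strip() for p in file_types.split(",") if p.strip()]
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


# Try a few paths against the combined pattern from the example above.
for path in ["docs/index.html", ".foo", "README.md"]:
    print(path, matches_file_types(path, ".*,*.html"))
```

With these inputs, `docs/index.html` and `.foo` match while `README.md` does not.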

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

vsoch commented 3 years ago

@SuperKogito did we never set up CI? I don't see any tests running, and we don't have a GitHub action or anything similar! It looks like we used to have Travis... is it no longer working?

codecov[bot] commented 3 years ago

Codecov Report

Merging #47 (37a583f) into master (b824997) will decrease coverage by 1.11%. The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master      #47      +/-   ##
==========================================
- Coverage   77.88%   76.76%   -1.12%     
==========================================
  Files          20       12       -8     
  Lines         651      383     -268     
==========================================
- Hits          507      294     -213     
+ Misses        144       89      -55     

Impacted Files                  Coverage Δ
urlchecker/__init__.py          100.00% <ø> (ø)
urlchecker/client/__init__.py   74.50% <ø> (-1.42%) :arrow_down:
urlchecker/core/urlmarker.py    100.00% <ø> (ø)
urlchecker/core/whitelist.py    100.00% <ø> (ø)
urlchecker/logger.py            42.85% <ø> (ø)
urlchecker/main/github.py       100.00% <ø> (ø)
urlchecker/main/utils.py        100.00% <ø> (ø)
urlchecker/core/fileproc.py     89.36% <75.00%> (-1.95%) :arrow_down:
urlchecker/client/check.py      24.59% <100.00%> (ø)
urlchecker/core/check.py        83.33% <100.00%> (ø)
... and 6 more
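
The percentages in the coverage diff can be re-derived from the hit and line counts. A small arithmetic check, using only the numbers shown in the report (how Codecov itself rounds is not stated here):

```python
# Hits and lines taken from the coverage diff in this report.
before = 100 * 507 / 651   # master
after = 100 * 294 / 383    # this PR

print(f"{before:.2f}% -> {after:.2f}% (change {after - before:+.2f})")
```

The unrounded change is about 1.118 percentage points, which would explain why the summary line says 1.11% while the diff shows 1.12%.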

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update b824997...37a583f.

vsoch commented 3 years ago

@SuperKogito I struggled with getting codecov to work here - I'm not sure if there is an envar missing or similar. I don't think it's a huge priority or rush, but when you have the time, could you take a look?

rootwork commented 3 years ago

So, strangely, this works when there's one dotfile, but fails when there are multiple dotfiles.

Test:

  1. echo 'https://google.com' > .foo
  2. urlchecker check --file-types .* . [successfully tests google.com]
  3. echo 'https://google.com' > .bar
  4. urlchecker check --file-types .* . [Output: "Done. No urls were collected."]

vsoch commented 3 years ago

Oh strange, let me test that.

rootwork commented 3 years ago

And it looks like it's only a dotfiles thing -- I tried creating foo.html and bar.html with the same contents and it worked fine with one or both.

vsoch commented 3 years ago

Interesting, for me to get it to work (for one or both) I need quotes around the file types, like:

$ urlchecker check --file-types ".*" .
  original path: .
     final path: /tmp/test
      subfolder: None
         branch: master
        cleanup: False
     file types: ['.*']
          files: []
      print all: True
 url whitetlist: []
   url patterns: []
  file patterns: []
     force pass: False
    retry count: 2
           save: None
        timeout: 5

 /tmp/test/.foo 
 --------------
https://google.com

 /tmp/test/.bar 
 --------------
https://google.com

Done. All URLS passed.

vsoch commented 3 years ago

The reason is that argparse receives the unquoted one as just a single period (dot):

$ urlchecker check --file-types .* .
['.']

vsoch commented 3 years ago

And to go up one level, I believe the globbing is done by the shell (not the python client). So we perhaps just need to show using quotes, always.
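
A short sketch of why the quoted form works and the unquoted one collects nothing. It assumes a simplified stand-in parser (the real urlchecker parser has more options) and assumes the pattern is later compared to file names with `fnmatch`; the unquoted value `"."` is the one reported in the output above:

```python
import argparse
import fnmatch

# Hypothetical stand-in for the client's parser, for illustration only.
parser = argparse.ArgumentParser(prog="urlchecker-sketch")
parser.add_argument("--file-types", dest="file_types")
parser.add_argument("path")

# Quoted on the command line: the shell hands the pattern over untouched.
quoted = parser.parse_args(["--file-types", ".*", "."])

# Value observed in this thread for the unquoted form: a single period.
unquoted_value = "."

dotfiles = [".foo", ".bar"]
print([f for f in dotfiles if fnmatch.fnmatch(f, quoted.file_types)])
print([f for f in dotfiles if fnmatch.fnmatch(f, unquoted_value)])
```

The first list contains both dotfiles. The second is empty, because the pattern `.` only matches a name that is exactly a single period.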

rootwork commented 3 years ago

Yep, that makes sense, and works for me.

vsoch commented 3 years ago

Thanks @SuperKogito! I figured that the redundant tests wouldn't hurt - if Travis ever goes away we can easily switch. Is it ok with you to leave as is, or would you like the GitHub tests removed? Since they are different services they run at the same time, so it shouldn't slow anything down.

SuperKogito commented 3 years ago

I just wanted to verify whether they are redundant, but I totally agree: the redundant tests won't hurt, so let's keep them.

vsoch commented 3 years ago

Ok let's merge! I'll have some time this weekend to add the other fixes we talked about to the client/spelling, and then I'll draft a new release and update the action.