victorpopkov / cask-tools

[in migration from Shell to Go] Collection of tools to help maintain the Homebrew-Cask project.
MIT License

New cask-scripts suggestion #1

Closed miccal closed 7 years ago

miccal commented 8 years ago

Hello Victor,

First off, thanks for your very handy cask-appcast and cask-check-updates scripts.

I have an idea for another script that checks for dead Casks, on the premise that if the homepage is dead, the Cask probably is too. I used this approach in a PR to Homebrew-Cask here.

I do not have the coding know-how to create a self-contained script like yours, so I thought it might be something you would be interested in creating.

Basically, I have the following simple url_check.sh script:

while read -r LINE; do
      read -r REP < <(exec curl -sSI "$LINE" | head -n 1)
      echo "$LINE: $REP"
done < "$1"

and I run the command

brew cask _stanza homepage > url_list.txt ; 
sed -i '' 's/https/http/g' url_list.txt ; 
bash url_check.sh url_list.txt > url_check.txt ; 
grep -v 'HTTP' url_check.txt > dead_url.txt

which (eventually) outputs a list of (potentially) dead homepage URLs for each Cask to the file dead_url.txt. The command brew cask _stanza homepage > url_list.txt writes the homepage stanza of every Cask to the file url_list.txt, for example:

http://www.sweetscape.com/
https://play0ad.com/
https://mycodingsucks.com
http://www.suavetech.com/0xed/
https://pc.115.com/
http://1clipboard.io/

which are the first five URLs from the Casks, plus the fake URL https://mycodingsucks.com that I added. The command sed -i '' 's/https/http/g' url_list.txt replaces every https with http (so curl doesn't complain about authentication), like so:

http://www.sweetscape.com/
http://play0ad.com/
http://mycodingsucks.com
http://www.suavetech.com/0xed/
http://pc.115.com/
http://1clipboard.io/
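One caveat with the substitution above: the unanchored pattern s/https/http/g rewrites every occurrence of "https" on a line, including any that appear later in a URL's path. A minimal sketch of an anchored variant follows; the function name to_http is hypothetical and not part of the original script.

```shell
# Rewrite only a leading https:// scheme, leaving the rest of each URL untouched.
# Reads URLs on stdin and writes the rewritten URLs to stdout.
to_http() {
  sed 's|^https://|http://|'
}
```

It could be used as to_http < url_list.txt > url_list_http.txt (the second file name is only an example), which also sidesteps the sed -i '' form, a BSD/macOS spelling of in-place editing.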

The script bash url_check.sh url_list.txt > url_check.txt then checks each URL, and outputs the following file:

http://www.sweetscape.com/: HTTP/1.1 200 OK
http://play0ad.com/: HTTP/1.1 301 Moved Permanently
http://mycodingsucks.com: 
http://www.suavetech.com/0xed/: HTTP/1.1 200 OK
http://pc.115.com/: HTTP/1.1 200 OK
http://1clipboard.io/: HTTP/1.1 200 OK

and also outputs curl: (6) Could not resolve host: mycodingsucks.com to the terminal. Finally, the command grep -v 'HTTP' url_check.txt > dead_url.txt writes every line of url_check.txt that does not contain an HTTP status line (that is, every URL for which curl -sSI "URL" | head -n 1 returned nothing, implying an error) to the file dead_url.txt. This is the required list of potentially dead Cask homepage URLs, like so:

http://mycodingsucks.com:
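Note that grep -v 'HTTP' treats any response as alive, so a homepage that answers with 404 Not Found would not be listed. The sketch below is one possible stricter filter, assuming the "URL: status line" format shown above; the function name filter_dead is hypothetical. It keeps lines with no status line at all, plus lines whose status code is 400 or higher.

```shell
# Read "URL: STATUS-LINE" records on stdin and print the suspicious ones.
filter_dead() {
  awk '
    {
      # Look for a status line such as "HTTP/1.1 404 Not Found" after the URL.
      if (match($0, /: HTTP\/[0-9.]+ [0-9][0-9][0-9]/)) {
        # The last three matched characters are the numeric status code.
        status = substr($0, RSTART + RLENGTH - 3, 3)
        if (status + 0 >= 400) print   # client or server error
      } else {
        print                          # no response at all
      }
    }'
}
```

For example, filter_dead < url_check.txt > dead_url.txt. Some servers answer HEAD requests with 403 or 405 even though the page is fine, so the result is still only a list of candidates to check by hand.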

Using this in my PR above, I obtained a list of approximately 25 URLs. Going through them, I found that 20 actually corresponded to dead Casks (which I removed in 2 separate PRs), and about 5 returned an error because curl requests were not allowed.
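Since a run over every Cask can take a long time, the loop from url_check.sh could also be written with a per-request timeout. This is only a sketch: the function name check_urls and the 10-second limit are assumptions, and curl's error messages are discarded here rather than printed to the terminal.

```shell
# Read URLs on stdin and print "URL: STATUS-LINE" for each one.
check_urls() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    # -I sends a HEAD request; --max-time bounds how long each request may take.
    status=$(curl -sI --max-time 10 "$url" 2>/dev/null | head -n 1 | tr -d '\r')
    printf '%s: %s\n' "$url" "$status"
  done
}
```

Usage would be check_urls < url_list.txt > url_check.txt. The tr call strips the carriage return that ends each HTTP header line, so the output file contains clean text.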

Now, what would be great is if it could output a file with the Cask name and homepage, much like your outdated.csv list, but I have no idea how to do that.
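One way to pair each Cask name with its homepage is to read the Cask files directly. The sketch below assumes a directory of <token>.rb files, each containing a single quoted homepage stanza; the function name list_homepages and the directory argument are hypothetical, and the output is a plain "name,homepage" list that is not guaranteed to match the layout of outdated.csv.

```shell
# Print "cask-name,homepage" for every Cask file in the directory given as $1.
list_homepages() {
  for file in "$1"/*.rb; do
    [ -e "$file" ] || continue            # directory had no .rb files
    token=$(basename "$file" .rb)         # Cask name is taken from the file name
    # Take the first homepage stanza and strip the keyword and the quotes.
    url=$(sed -n "s/^[[:space:]]*homepage[[:space:]]*['\"]\\(.*\\)['\"].*/\\1/p" "$file" | head -n 1)
    if [ -n "$url" ]; then
      printf '%s,%s\n' "$token" "$url"
    fi
  done
}
```

For example, list_homepages /path/to/Casks > homepages.csv, where the path is a placeholder for a local checkout.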

This is certainly not something that would run every day like your cask-check-updates script, but maybe once a week or something?

Apologies for the long post, and please feel free to ignore it if you are not interested.

Thanks!

victorpopkov commented 8 years ago

Thank you @miccal for your suggestion and a very informative description with a great example. I really appreciate that!

This script will definitely be really useful and I will surely add it to the repository as soon as possible with your suggested behaviour and configure the server to run it every week.

You are awesome! 👍

miccal commented 8 years ago

No problem, I just wish I had the know-how to do it myself - my bash and ruby coding skills are quite amateur. I do most of my coding in Maple and Matlab!

victorpopkov commented 7 years ago

The cask-homepage script has been added and includes the suggested behaviour. Basically, it outputs a list of all homepage URLs that might have some issues. The ones with an 'error' status are the (potentially) dead casks.

The script can automatically fix homepages, but I strongly recommend fixing only those casks that definitely need it, by specifying which casks to check.

Thanks again for your suggestion.

miccal commented 7 years ago

Wow thanks Victor, that is fantastic - you made my simple bash script look extremely amateur!

I also had the idea of listing the possible redirects, but the https availability and trailing slashes I had not thought of.
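The https availability and trailing slash checks mentioned here can be sketched roughly as follows. This is not how cask-homepage implements them; the function names and the 10-second timeout are assumptions made for illustration.

```shell
# Print the https:// form of an http:// URL (other URLs pass through unchanged).
to_https() {
  printf '%s\n' "$1" | sed 's|^http://|https://|'
}

# Succeed when the URL is a bare scheme and host with no path, e.g. https://example.com
lacks_trailing_slash() {
  case "$1" in
    *://*/*) return 1 ;;   # already has a path component
    *://*)   return 0 ;;   # scheme and host only
    *)       return 1 ;;   # not a URL with a scheme
  esac
}

# Succeed when the https:// form of the URL returns any response at all.
https_available() {
  curl -sI --max-time 10 -o /dev/null "$(to_https "$1")"
}
```

For instance, lacks_trailing_slash https://mycodingsucks.com succeeds, while the same URL with a trailing slash does not.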

I notice that your list is quite long - I will leave it to you to submit the fixes.

In your doc/cask-homepage.md and bin/cask-homepage files there are a few instances of slush instead of slash.

I hope you and the other Cask maintainers find it useful.

victorpopkov commented 7 years ago

Thank you for noticing the slush typo. Fixed now. 👍

Yes, at the moment the list is definitely long, since the script is still immature and needs a bunch of improvements and bug fixes. The generated list should soon become much shorter after a few tweaks to redirect reporting, since a lot of those reports are not necessary.

Thanks again for your suggestion, I really appreciate that!