coderobe opened 4 years ago
Yes, indeed! This looks pretty awesome, thank you!
I will try to implement something based on this when I have some time for it. Already have an idea that should work pretty well.
Great!
Annotated code, if you're interested in the inner workings:
for dl in $(
  for url in $(
    (
      echo '['  # prepend '[' - jq needs it because we're extracting an incomplete JSON array
      curl -sSL https://support.microsoft.com/en-us/help/4023482/surface-download-drivers-and-firmware |  # get the list of downloads; the page content is actually a JSON structure inside an inline <script>, which is annoying
        hq script data |  # extract all script tags
        sed -n '/prefetchedArticle /{:a;n;p;ba;}' |  # keep all lines following `prefetchedArticle`, which contains the real page HTML as JSON
        sed -n '/"body":/{:a;n;p;ba;}'  # keep everything following the declaration of the `body` element - our article content as JSON strings
    ) |
      jq -r '.[]|.content|.[]' 2>/dev/null |  # emit all of these content elements as raw strings (this is HTML now)
      hq '.faq-panel-body > p > a' attr href |  # extract all relevant hrefs
      grep www.microsoft  # filter for microsoft URLs
  )  # this yields URLs like 'https://www.microsoft.com/download/details.aspx?id=57514'
  do
    curl -sSL "$(sed -re 's/details/confirmation/' <<< "$url")" |  # fetch the URL, replacing details.aspx with confirmation.aspx - the real download dialog page after all redirects
      hq a.failoverLink attr href  # get the failover link hrefs (still not direct links, they point to another sub-page)
  done |
    grep '^/'  # keep only hrefs starting with a `/` (/en-us/download/confirmation.aspx?...), as there are other garbage URLs in here sometimes
)
do
  curl -sSL "https://www.microsoft.com$dl" |  # fetch the real download confirmation page
    hq a.mscom-link attr href  # extract the fallback links
done |
  grep download.microsoft |  # keep only https://download.microsoft.com/ links, as there are irrelevant ones too
  grep '\.msi$' |  # keep only links ending in `.msi`
  grep 'Surface[^ ]'  # keep only `Surface…` links, which prevents this from containing links to `/Wintab_x32_1.0.0.20.zip` and whatnot
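
As an aside, the two `sed -n '/…/{:a;n;p;ba;}'` expressions above use a small sed loop that prints every line *after* the first line matching the pattern (the matching line itself is skipped). A minimal standalone demo, with made-up input:

printf 'one\nMARK\ntwo\nthree\n' \
  | sed -n '/MARK/{:a;n;p;ba;}'
# prints:
# two
# three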
How about this? (Requires curl, sed, grep, jq, and hq; these are all available as official packages on Arch Linux and quite possibly other distributions.) That yields:
Are those the URLs you're looking for? ;)
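
If you want to skip the copy-pasting and download everything directly, the URL list can be fed straight into curl. A minimal sketch, assuming the snippet above is saved as surface-msi-urls.sh (a hypothetical name):

bash surface-msi-urls.sh \
  | xargs -n1 curl -sSLO  # fetch each .msi into the current directory

curl's -O keeps the remote filename, so you end up with the installers side by side.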