hanover-computing / canonicize-url

Get a stable, canonical version of any URL, with DNS and HTTPS checks, redirects, tracker stripping, and canonical link extraction!
GNU Lesser General Public License v3.0

Try to extract links from querystrings and URL fragments (e.g. postmark has pst.mk/$theactuallink/blahblahblah) #13

Open JaneJeon opened 2 years ago

JaneJeon commented 2 years ago

This is a long shot, but for links that don’t automatically redirect you to their destination (think YouTube links) AND don’t get caught by the ClearURLs filter, it would be swell to find out where the ultimate destination is. Often the actual link is embedded within the querystring.

Using something like https://github.com/sindresorhus/get-urls/blob/main/index.js#L5 and https://github.com/niftylettuce/url-regex-safe, we could extract URLs from querystrings.
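To make the idea concrete, here is a rough sketch of the extraction step using only the standard WHATWG `URL` API rather than either of the libraries linked above. The names `extractEmbeddedUrls` and `safeDecode` are hypothetical and not part of canonicize-url. It assumes the embedded link is an absolute http(s) URL sitting in a query parameter value or in the fragment, and it does not look inside the path (so the pst.mk case would need more work).

```typescript
// Hypothetical helper, not part of canonicize-url: collect absolute http(s)
// URLs embedded in the query parameter values or the fragment of a URL.
function extractEmbeddedUrls(input: string): string[] {
  const outer = new URL(input);
  const candidates: string[] = [];

  // URLSearchParams percent-decodes each parameter value for us.
  outer.searchParams.forEach((value) => {
    candidates.push(value);
  });

  // The fragment keeps its leading '#' and is not decoded automatically.
  if (outer.hash.length > 1) {
    candidates.push(safeDecode(outer.hash.slice(1)));
  }

  const found: string[] = [];
  for (const candidate of candidates) {
    try {
      const parsed = new URL(candidate);
      const isWeb = parsed.protocol === "http:" || parsed.protocol === "https:";
      // Keep each normalized URL once, in order of appearance.
      if (isWeb && found.indexOf(parsed.href) === -1) {
        found.push(parsed.href);
      }
    } catch {
      // Not an absolute URL; ignore this value.
    }
  }
  return found;
}

// decodeURIComponent throws on malformed escapes, so fall back to the raw text.
function safeDecode(text: string): string {
  try {
    return decodeURIComponent(text);
  } catch {
    return text;
  }
}

// Example: the "to" parameter carries the real destination.
const embedded = extractEmbeddedUrls(
  "https://example.com/redirect?to=https%3A%2F%2Fexample.org%2Fpage&ref=abc"
);
console.log(embedded); // [ 'https://example.org/page' ]
```

This only produces a list of embedded URLs; it says nothing about whether any of them is actually the destination, which is the open question below.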

The question is, then what? What do we treat it as? A candidate? What if it’s something completely unrelated? How do we know when it is a “candidate” and if so, how do we know if this candidate fits better than others?