hackel closed this issue 7 years ago
There is no generic way of doing redirects that all websites use. To avoid breaking sites, it's better to first check and approve each needed URL individually: how it works and which query parameters it uses.
It'd be nice to have one for Google outgoing URLs, though. Could we put that on the list?
Why not? Please create another issue/pull request for that purpose.
It would be nice if this implemented a generic URL scraper of some kind, so that each individual site didn't have to be coded manually. Case in point, the link to this page from AMO:
https://outgoing.prod.mozaws.net/v1/d6c54b48bd1142d3dee6387e3d3feabc610d77ab48590ae0a43e6c20d93db01e/https%3A//github.com/idlewan/link_cleaner
If there was a way to recognize the actual URL here automatically, that would be wonderful. Similarly for Google:
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwiU4JDYyZvUAhUoxFQKHf5IAEAQFggsMAE&url=https%3A%2F%2Fgithub.com%2Fidlewan%2Flink_cleaner&usg=AFQjCNHLsiLWuJifp8qBynFPaicSw0gLGw&sig2=imkIeC-CN_z-8x5NgFr4TQ
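For the Google case, the target is just a query parameter, so a small sketch along these lines would recover it (the function name and the default parameter name `url` are my own choices, not anything in link_cleaner):

```javascript
// Sketch: pull the real target out of a redirect link whose target
// sits in a query parameter (Google uses "url"; other sites differ).
function extractRedirectTarget(link, paramName = "url") {
  const parsed = new URL(link);
  const target = parsed.searchParams.get(paramName); // already URL-decoded
  if (target === null) return null;
  // Only treat it as a redirect if the value is itself a valid absolute URL.
  try {
    new URL(target);
    return target;
  } catch {
    return null;
  }
}

const googleLink =
  "https://www.google.com/url?sa=t&url=https%3A%2F%2Fgithub.com%2Fidlewan%2Flink_cleaner&usg=AFQjCNH";
console.log(extractRedirectTarget(googleLink));
// -> https://github.com/idlewan/link_cleaner
```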
I'm not sure of the best logic to avoid breakage. It's a very complicated issue. As a start: split the URL into its query parts, URL-decode each one, then match them against the following regex; if one matches, navigate to it instead:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
(From Appendix B of https://www.ietf.org/rfc/rfc3986.txt.) Of course, the first URL I gave doesn't carry its target in a query parameter, so if there's still no match after that, perhaps running the regex against the entire URL would also be appropriate.
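The steps above could be sketched roughly as follows. One caveat: the Appendix B regex is a parser, so it matches any string; to use it as a redirect detector, a value should probably only count when both the scheme and authority groups are non-empty. The fallback for path-embedded targets (the outgoing.prod.mozaws.net style) is my own assumption about how that second case might be handled:

```javascript
// RFC 3986 Appendix B URI-parsing regex.
const RFC3986 = /^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// The bare regex matches everything, so require a scheme (group 2)
// and a non-empty authority (group 4) before calling it a URL.
function looksLikeAbsoluteUrl(s) {
  const m = RFC3986.exec(s);
  return m !== null && m[2] !== undefined && m[4] !== undefined && m[4] !== "";
}

function findEmbeddedUrl(link) {
  const parsed = new URL(link);
  // 1. Check every query parameter value (URLSearchParams decodes them).
  for (const value of parsed.searchParams.values()) {
    if (looksLikeAbsoluteUrl(value)) return value;
  }
  // 2. Fall back to the path: look for a percent-encoded scheme
  //    ("https%3A//...") and decode everything from there on.
  const pathMatch = /https?%3A.*/i.exec(parsed.pathname);
  if (pathMatch !== null) {
    const candidate = decodeURIComponent(pathMatch[0]);
    if (looksLikeAbsoluteUrl(candidate)) return candidate;
  }
  return null;
}
```

Run against the two examples above, the first branch catches the Google link and the fallback catches the AMO outgoing link, both yielding https://github.com/idlewan/link_cleaner.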