milesmcc / librenews

A free and open breaking news notification platform
https://librenews.io
GNU General Public License v3.0

Remove redirects and tracking parameters from the urls #11

Open kisst opened 7 years ago

kisst commented 7 years ago

A single page visit goes through 2 redirects:

curl -s -L -I "$(curl -s https://librenews.io/api | jq -r '.latest[0].link')" | grep -c ^Location
2

Even after that, the final URL still carries another 4 tracking parameters:

curl -s -L -I "$(curl -s https://librenews.io/api | jq -r '.latest[0].link')" | grep "^Location" | tail -n 1 | cut -f2- -d'?' | sed -e $'s/&/\\\n/g' | wc -l
4
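
The same count can be sketched offline in Python, assuming the final URL is already in hand. The function name `count_query_params` and the example URL are made-up placeholders for illustration, not part of LibreNews or the actual feed.

```python
from urllib.parse import urlsplit, parse_qsl

def count_query_params(url: str) -> int:
    """Count the key=value pairs in the query string of url."""
    query = urlsplit(url).query
    # keep_blank_values=True so parameters such as "a=" are still counted
    return len(parse_qsl(query, keep_blank_values=True))

# Placeholder URL for illustration only; not a real feed link.
example = "https://example.com/news/story?a=1&b=2&c=3&d=4"
print(count_query_params(example))
```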

Ideally, clients would receive direct links, with no redirects and no tracking parameters.

milesmcc commented 7 years ago

This would require some complex application logic, as the server would have to de-mask the "tracker" URLs. Currently, the LibreNews Server serves URLs exactly as the BBC provides them, i.e. shortened.

Is this objectionable?

kisst commented 7 years ago

"Objectionable" may be too strong a word, but if a little extra work lets us further reduce user tracking, then why not?

I'm not sure how complex it is. Most of the time something like this would work, at least with well-behaved URLs:

import re
import requests

r = requests.get(url)  # follows redirects; r.url is the final URL
clean_url = re.search('[^?]*', r.url).group(0)  # keep everything before the first '?'

I tested it with all the URLs in the feed and it worked just fine. Given the traffic minimisation discussed in #5, though, I guess this should be optional. That said, a client-side implementation of #6 would do more to reduce data usage.
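
For the stripping step on its own, a minimal sketch using only the standard library could look like this. It assumes the URL has already been resolved (for example, the `r.url` value above). The name `strip_query` is hypothetical. It drops the whole query string and fragment, which would break any site that keeps a content identifier in the query, so it is only suitable for well-behaved URLs.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    """Return url without its query string and fragment."""
    parts = urlsplit(url)
    # Rebuild the URL from scheme, host and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Placeholder URL for illustration only; not a real feed link.
print(strip_query("https://example.com/news/story?a=1&b=2#top"))
```

Parsing the URL rather than matching with a regular expression also removes the fragment and leaves URLs without a query untouched.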