jupyterhub / binderhub

My binder badge fires tracking cookies #379

Closed by Titan-C 6 years ago

Titan-C commented 6 years ago

We recently incorporated mybinder badges into https://github.com/sphinx-gallery/sphinx-gallery. This is a great enhancement for us. But at the same time I noticed that the badges would not load in my browser. The reason is that I use Privacy Badger, which blocks domains that are tracking me, and mybinder behaves like an evil tracker, so it got blocked.

Do you really need all this aggressive tracking of your users' installed badges?

betatim commented 6 years ago

How could I test this? I just installed Privacy Badger and visited https://github.com/scikit-optimize/scikit-optimize, which has a binder badge. As far as I can tell there was no alarm from the extension (green zero on the badger logo). Same for visiting the badge image.

Do I have to somehow configure the extension? (Only just installed it)

We have GA installed on mybinder.org but if you don't visit it you should be fine.

Titan-C commented 6 years ago

Yes, you can test it. We integrated a binder badge for every example in the Sphinx-Gallery docs. Take as an example https://sphinx-gallery.github.io/auto_examples/plot_seaborn.html#sphx-glr-auto-examples-plot-seaborn-py, which is the page the screenshot is from.

[Screenshot: Privacy Badger listing mybinder.org as a potential tracker]

You will see that Privacy Badger says there is a potential tracker and associates it with mybinder.org, so it blocks it. I don't know why it shows up as a tracker on our docs website but is not flagged as a tracker when the page is on GitHub (at least not on the scikit-optimize page you just showed).

> We have GA installed on mybinder.org but if you don't visit it you should be fine.

Indeed, you should not be tracked unless you visit the website. Also, from what I see in Privacy Badger, it is mybinder.org that is considered the third-party tracker, not Google Analytics.

betatim commented 6 years ago

Some ideas why this happens:

We have a custom template that tries to respect the user's DNT preference, so in principle you should not end up with a cookie if DNT is enabled. However, in the webdev console, the requests I send when visiting mybinder.org have a DNT: 1 header, even though I did not explicitly enable DNT in my Firefox preferences ("Send websites a “Do Not Track” signal that you don’t want to be tracked" is set to "Only when using Tracking Protection").

We use navigator.doNotTrack to check whether DNT was enabled by the user. If I enable DNT explicitly, this is set to "1"; if only the DNT header is sent, it is undefined. The MDN docs make me believe that it should be set to "1" whenever the DNT header is set to 1.

Action: should we check the request headers when deciding to include the GA JS?
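
A minimal sketch of what such a server-side check could look like, assuming the page handler can see the request headers as a plain mapping. The names `wants_no_tracking` and `should_include_analytics` are hypothetical and not part of BinderHub; only the `DNT: 1` header comes from the discussion above.

```python
from typing import Mapping


def wants_no_tracking(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries a `DNT: 1` header."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("dnt") == "1"


def should_include_analytics(headers: Mapping[str, str]) -> bool:
    """Decide on the server whether the page template should embed the GA JS."""
    return not wants_no_tracking(headers)
```

With a check like this, the decision no longer depends on what navigator.doNotTrack reports in the browser, which is the case that appears to slip through above.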


For the scikit-optimize example I linked, the badge isn't actually served by mybinder.org but by github.com, which is why there is no tracking alert.

betatim commented 6 years ago

Thanks for reporting this, because we do care about respecting people's privacy. That is why we exclude the GA JS when we think the user doesn't want to be tracked, but apparently the check is incomplete.

choldgraf commented 6 years ago

> Action: should we check the request headers when deciding to include the GA JS?

Is this the main thing to be done here? Agree we should be respecting whatever privacy levels people specify

yuvipanda commented 6 years ago

Thank you for reporting it!

I think the core problem here is that because mybinder.org is our website and the image is also being served from there, any cookies we set (now or in the future) will be sent along to image requests (if I understand it correctly) - making it 'tracking'.

I think the 'real' fix for this is to use a different domain for just serving the image. This should be fairly easy to set up, and should prevent any cookies we set to mybinder.org from being sent when requesting the image.

We can do this in mybinder.org-deploy by:

  1. Adding an additional domain that just points to binderhub (static.mybinder.org, say)
  2. Changing the code that generates badges to point to images from static.mybinder.org rather than mybinder.org
  3. (Optional) Redirecting mybinder.org/badge.svg to static.mybinder.org/badge.svg, if we want; this is not required for the initial fix.

This should separate our badge images from mybinder.org enough to satisfy all privacy-enabling addons I'm aware of. Note that switching to shields.io doesn't solve any of these problems, since they can already track you even more efficiently than we can :)
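
As a rough illustration of step 2, here is a sketch of badge generation that takes the image from the static domain while the link still points at the launch URL. The helper `badge_markdown` and its parameters are hypothetical and are not BinderHub's actual badge code; the repository path in the usage note is a placeholder.

```python
def badge_markdown(launch_url: str, badge_host: str = "static.mybinder.org") -> str:
    """Build a Markdown badge whose image is served from `badge_host`."""
    # Only the image request moves to the static domain; the click target
    # is still the launch URL on the main site.
    image_url = f"https://{badge_host}/badge.svg"
    return f"[![Binder]({image_url})]({launch_url})"
```

For example, `badge_markdown("https://mybinder.org/v2/gh/org/repo/master")` would produce a badge whose image comes from static.mybinder.org/badge.svg.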

Long term I'd also like to move away from Google Analytics to a self-hosted Piwik, but that's not happening anytime soon.

As others have said, we take privacy seriously. Thank you for reporting this, @Titan-C!

yuvipanda commented 6 years ago

@Titan-C you can use badge from URL static.mybinder.org/badge.svg (instead of mybinder.org/badge.svg), it should prevent tracking cookies from being sent.

yuvipanda commented 6 years ago

I made https://github.com/sphinx-gallery/sphinx-gallery/pull/334

choldgraf commented 6 years ago

is there a way for us to direct /badge.svg to static.mybinder.org/badge.svg?

yuvipanda commented 6 years ago

@choldgraf probably, but that won't actually solve the privacy issue (since cookies will still be sent with the original request). I'll open another issue to change our badge generation code to use static.mybinder.org

choldgraf commented 6 years ago

ah ok, sounds good...I'll also open an issue so that we add this to the documentation.

yuvipanda commented 6 years ago

https://github.com/jupyterhub/binderhub/issues/392 for changing our UI

Titan-C commented 6 years ago

> @Titan-C you can use badge from URL static.mybinder.org/badge.svg (instead of mybinder.org/badge.svg), it should prevent tracking cookies from being sent.

Sorry, that does not seem to be the case. Privacy Badger still says it is a potential tracker.

yuvipanda commented 6 years ago

@minrk pointed out that the GA cookies on mybinder.org are set on *.mybinder.org, which would affect this. https://github.com/jupyterhub/binderhub/pull/396 should help fix that.
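
To make the subdomain point concrete, here is a simplified sketch of the cookie domain-matching rule from RFC 6265. The function `cookie_is_sent` is hypothetical and deliberately ignores path, Secure, SameSite, IP-address hosts, and public-suffix checks, so treat it as an illustration rather than what a browser actually implements.

```python
from typing import Optional


def cookie_is_sent(request_host: str, origin_host: str, domain_attr: Optional[str]) -> bool:
    """Simplified check: would a cookie set by `origin_host` go to `request_host`?"""
    request_host = request_host.lower()
    if domain_attr is None:
        # Host-only cookie: sent only to the exact host that set it.
        return request_host == origin_host.lower()
    # A Domain attribute widens the scope to that domain and all its subdomains.
    domain = domain_attr.lower().lstrip(".")
    return request_host == domain or request_host.endswith("." + domain)


# Cookie scoped to the whole domain: it also reaches the static subdomain.
print(cookie_is_sent("static.mybinder.org", "mybinder.org", ".mybinder.org"))
# Host-only cookie set by mybinder.org: it stays off the static subdomain.
print(cookie_is_sent("static.mybinder.org", "mybinder.org", None))
```

Under these assumptions, moving the badge to a subdomain only helps once the analytics cookies stop being scoped to the whole mybinder.org domain.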

yuvipanda commented 6 years ago

We've fixed and deployed this now, and the patch in sphinx-gallery has also been merged!

Thanks a lot for reporting and engaging with us on this, @Titan-C!

Titan-C commented 6 years ago

I opened the issue https://github.com/EFForg/privacybadger/issues/1827 with Privacy Badger about tracking domains remaining blacklisted even after they stop tracking. It is an issue with their software, but you can get onto a permanent whitelist by following the EFF Do Not Track policy, which is also discussed in https://github.com/EFForg/privacybadger/issues/882

yuvipanda commented 6 years ago

Thanks for caring, opening this and following through with it, @Titan-C! @willingc has opened https://github.com/jupyterhub/mybinder.org-deploy/issues/278 to figure out how we can follow the EFF's DNT policy on mybinder.org.