Feature request: anonymous telemetry #2001

qdm12 / gluetun

VPN client in a thin Docker container for multiple VPN providers, written in Go, and using OpenVPN or Wireguard, DNS over TLS, with a few proxy servers built-in.

https://hub.docker.com/r/qmcgaw/gluetun

MIT License

Open qdm12 opened 8 months ago

qdm12 commented 8 months ago

What's the feature 🧐

Generate random id and store it in /gluetun
Send id to qqq.ninja/gluetun over https when tunnel is up the first time only
Optional environment variable TELEMETRY_EMAIL users can set for me to create a mailing list

Since this affects neither performance nor anonymity, it will always be enabled. It is used to measure how many active users there are, and may be extended in the future.

EDIT post discussion:

Always disable it for the custom provider so as not to leak the VPN server IP address, which might be a personal or company one.
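For readers who want to picture the steps listed above, here is a minimal sketch in Go. It is not Gluetun's code: the file name `telemetry-id`, the 16-byte id length, and the helper names `newRandomID`, `loadOrCreateID` and `shouldSend` are all assumptions made for illustration, and the HTTPS request itself is left out.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// idFileName is a hypothetical file name; the issue only says the id is stored in /gluetun.
const idFileName = "telemetry-id"

// newRandomID returns 16 random bytes as a 32-character hex string.
// The length is an assumption, the issue does not specify one.
func newRandomID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// loadOrCreateID returns the id stored in dir, generating and storing one if the file is absent.
func loadOrCreateID(dir string) (string, error) {
	path := filepath.Join(dir, idFileName)
	data, err := os.ReadFile(path)
	if err == nil {
		return strings.TrimSpace(string(data)), nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return "", err
	}
	id, err := newRandomID()
	if err != nil {
		return "", err
	}
	if err := os.WriteFile(path, []byte(id), 0o600); err != nil {
		return "", err
	}
	return id, nil
}

// shouldSend reports whether a ping may be sent for the given provider name.
// The custom provider is always excluded, as decided in the EDIT above.
func shouldSend(provider string) bool {
	return !strings.EqualFold(provider, "custom")
}

func main() {
	// A temporary directory stands in for /gluetun in this sketch.
	dir, err := os.MkdirTemp("", "gluetun-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	first, err := loadOrCreateID(dir)
	if err != nil {
		panic(err)
	}
	second, err := loadOrCreateID(dir)
	if err != nil {
		panic(err)
	}
	fmt.Println("id persisted:", first == second)
	fmt.Println("send for custom provider:", shouldSend("custom"))
}
```

Persisting the id is what makes it possible to count distinct installs rather than container starts, and it is also what the later GDPR comments in this thread take issue with.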


Gylesie commented 8 months ago

I would like to provide feedback. In my opinion, any kind of tracking is unacceptable for a project such as this. I hope this feature can be turned off via an ENV variable.

Stetsed commented 8 months ago

So while I do understand why you want telemetry, I do not think it should be opt-out; it should be opt-in instead. This project can involve sensitive information, as the telemetry reveals who is using Gluetun and therefore who is, or might be, using a VPN of some kind. It can still appear in the standard docker-compose/documentation, but it should be a clear environment variable like ENABLE_TELEMETRY, which is false by default unless explicitly set to true.
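To make the opt-in suggestion concrete, here is a minimal sketch in Go of reading such a flag with a default of false. The variable name `ENABLE_TELEMETRY` and the helper `boolFromEnv` are hypothetical; nothing in this thread confirms how Gluetun would name or parse it.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// boolFromEnv reads key through lookup and parses it as a boolean.
// An unset or empty variable yields def. An unparsable value yields
// false and an error, so a typo never turns telemetry on by accident.
func boolFromEnv(lookup func(string) (string, bool), key string, def bool) (bool, error) {
	value, ok := lookup(key)
	if !ok || value == "" {
		return def, nil
	}
	enabled, err := strconv.ParseBool(value)
	if err != nil {
		return false, fmt.Errorf("environment variable %s: %w", key, err)
	}
	return enabled, nil
}

func main() {
	// ENABLE_TELEMETRY is a hypothetical name; def=false makes it opt-in.
	enabled, err := boolFromEnv(os.LookupEnv, "ENABLE_TELEMETRY", false)
	if err != nil {
		fmt.Println("invalid setting, telemetry stays off:", err)
		return
	}
	fmt.Println("telemetry enabled:", enabled)
}
```

Passing `true` as the default instead would give the opt-out behaviour other commenters ask for, with the same variable acting as the off switch.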

qdm12 commented 8 months ago

@Gylesie @Stetsed I was expecting some users to complain, but what sensitive information does it leak? I understand we've been terrorized by, for example, Microsoft using telemetry as a disguise for spyware, but please read this issue again; you are misunderstanding.

What sensitive information would be leaked!? Especially since the code is open source, you can clearly verify what is sent. As mentioned, it would send a randomly generated id to my own server, to get an idea of how many active users there are. A later step would be to, for example, add the Gluetun version number (to have a count of users per Gluetun version). The worst thing this could leak would be the VPN server IP address, but I really don't care about the VPN server IP address reaching my server, and I cannot understand why a user would care, especially since it's only associated with a random id; it's really anonymous.

And no, I am not evil enough to send all your private keys to my server through telemetry 😆 If I were evil, I would not even have created this issue anyway.

Unless you have a valid argument against it, it will stay enabled by default.

Stetsed commented 8 months ago

I assumed you were going to send the ping over the clear net (as in, not over the VPN tunnel), since you didn’t make this clear in the original post: you only said it would be sent when the tunnel is first established, and nothing about how it would reach the site besides HTTP(S).

Also, even if it’s anonymous (which it’s not: nothing is ever truly anonymous, because if someone gets hold of a user’s “anonymous” id and gets access to your DB, they can see the IPs that id has pinged from, and as such what VPN servers you have used, assuming you store that, and assuming it is sent every time the container restarts and connects to the tunnel, as again you haven’t made it clear if it’s 1 per tunnel up, or 1 per time the container is set up, or 1 time per install), it should still be clear that it exists. You can have it be opt-out; I don’t agree with that, but that’s your choice. It should, however, be in the default compose file so people can turn it off (commented out by default, but people can uncomment it to turn it off). And it most certainly shouldn’t be always enabled with no way to disable it, as that’s how I interpreted your last sentence in the original post.

qdm12 commented 8 months ago

I assumed you were going to send the ping over the clear net (as in, not over the VPN tunnel), since you didn’t make this clear in the original post: you only said it would be sent when the tunnel is first established, and nothing about how it would reach the site besides HTTP(S).

Gluetun is designed to let zero traffic outside the VPN (except through your Docker bridge local network and other LANs if configured to do so), so this wouldn't be possible anyway 😉

such what VPN servers you have used

Not really "you", since it's a random id generated.

The data I'm interested in would be

random id 1 <-> Current gluetun version
random id 2 <-> Current gluetun version
...

If I were greedy or evil, at worst it could be:

random id 1, connection time 1, vpn server ip address 1, gluetun version
random id 1, connection time 2, vpn server ip address 2, gluetun version

And even then, this still is anonymous to me, especially since VPN servers are used by many users. HOWEVER I will disable it for the custom provider since the VPN server IP address might give away some data on your own VPN server, and even if I won't collect VPN server IP addresses, I understand you don't want to take the risk, that makes sense.
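As an illustration of the "random id <-> Current gluetun version" record described above, here is a minimal Go sketch of what such a payload could look like. The `ping` type, its JSON field names and the sample values are assumptions; the thread does not specify a wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ping is a hypothetical payload holding only the two fields of interest.
// It deliberately has no field for the VPN server IP address or a timestamp.
type ping struct {
	ID      string `json:"id"`
	Version string `json:"version"`
}

// encodePing serializes the payload as JSON.
func encodePing(id, version string) ([]byte, error) {
	return json.Marshal(ping{ID: id, Version: version})
}

func main() {
	// Placeholder values, not real ids or releases.
	body, err := encodePing("example-random-id", "v0.0.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Note that a payload like this says nothing about what the receiving server could log on its own, such as the source address of the request, which is the VPN server IP address discussed in this comment.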

you haven’t made it clear if it’s 1 per tunnel up, or 1 per time the container is set up, or 1 time per install

"Send id to qqq.ninja/gluetun over https when tunnel is up the first time only" is clear to me: once on every container start on the first tunnel up (not subsequent reconnect).

And it most certainly shouldn’t be always enabled with no way to disable it, as that’s how I interpreted your last sentence

Yes that's still my idea (except always disabled for the custom provider), unless there is a fair point that I still can't see. Obviously if I add more intrusive telemetry it will be disabled by default, and opt-in. For example your original IP address (although not possible the way it's designed for now) would 100% be opt-in (anyway that would be mega weird to collect this 😆)
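The "once on every container start on the first tunnel up (not subsequent reconnect)" behaviour described above can be sketched in Go with `sync.Once`. This is only an illustration under assumptions: the type `firstUpNotifier` and its `OnTunnelUp` method are hypothetical names, and the real send step is replaced by a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// firstUpNotifier runs send at most once for the lifetime of the process,
// which for a container corresponds to once per container start.
type firstUpNotifier struct {
	once sync.Once
	send func()
}

// OnTunnelUp is meant to be called every time the tunnel comes up.
// Only the first call triggers send; reconnects are ignored.
func (n *firstUpNotifier) OnTunnelUp() {
	n.once.Do(n.send)
}

func main() {
	sent := 0
	n := &firstUpNotifier{send: func() { sent++ }}

	n.OnTunnelUp() // first tunnel up: ping sent
	n.OnTunnelUp() // reconnect: no ping
	n.OnTunnelUp() // reconnect: no ping

	fmt.Println("pings sent:", sent) // prints "pings sent: 1"
}
```

Because the state lives in memory, a container restart resets it, which matches "once on every container start" rather than "once per install".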

Stetsed commented 8 months ago

@qdm12 I understand where you're coming from, but if you do not allow disabling it, without informing the user that it exists and having them accept it, you could possibly be in violation of the European GDPR, as it's not anonymous data (unless that random ID is regenerated every time the container starts up; if it's not, it would be seen as a permanent identifier, like a cookie).

I am not saying this to be like "I'm gonna sue you", but more because it's something you should be aware of. And even if it's not a violation, having a form of telemetry is fine, but a form of telemetry that could technically be used to unmask users' activity (which could happen if somebody accesses your server, starts collecting IP <-> ID combos, and gets a user's ID from their side) should, even if it's not legally required, have a way to disable it, besides forcing a user who wants to opt out to fork it, build it themselves and then deploy it.

rightsaidfred99 commented 8 months ago

I agree, it would indeed violate the European GDPR.

Under the GDPR, a unique identifier for customers, such as an ID number or username, can be considered personal data if it can directly or indirectly identify an individual. Therefore, if the unique ID used by the software can be linked to a specific person, it would be subject to the GDPR's requirements.

To comply with the GDPR, organizations processing personal data, including unique IDs, should ensure they have a lawful basis for processing, such as obtaining explicit consent from the individuals or demonstrating legitimate interests.

qdm12 commented 8 months ago

Not trying to argue, just trying to understand if you don't mind 😉

if it can directly or indirectly identify an individual. Therefore, if the unique ID used by the software can be linked to a specific person

In this case, the ID is for a specific machine since it would be persisted, but it cannot be linked to a specific person directly or indirectly at all. So I don't think this would contradict GDPR? 🤔 Even then, let's say this ID would be generated on every container start (so on every request sent out), would that help and fit your expectations for something on by default?

Apart from the GDPR point of view, please realize that any VPN provider can easily identify you and can very much spy on you as well, and, trust-wise, if I were a random user of Gluetun, I would rather trust Gluetun's author (OSS, you can check what is sent in the code) than a VPN provider (who could possibly spy on everything) 😉 Although yes, you do need to trust here and there to accomplish anything at the end of the day. This whole paragraph does not apply to the custom provider if you own your VPN server, which is another reason why this telemetry would be off in that case.

EDIT: I wish you all a happy new year 🎉

eiqnepm commented 8 months ago

It's not always about whether something can actually be used to track individual users, it's about user trust.

While I believe the likelihood for this to be abused is incredibly small and you may feel it's unreasonable for people to consider this a privacy or security issue, people still will, including myself.

I like to follow the principle of least privilege as much as I can. If something is not required for my desired use of a program to function, I should ideally be able to disable it.

Maintaining trust with your user base is important. Brave, for example, has an option to disable "Automatically send a daily usage ping to Brave". While it is a completely different application, the principle is the same. It's probably not likely that it could be used to track individual users, but they understand that it should be the user's choice nevertheless.

I don't see an issue with implementing this and having it enabled by default, however having an option to disable it is crucial for a user's freedom to choose.