muety / wakapi

📊 A minimalist, self-hosted WakaTime-compatible backend for coding statistics
https://wakapi.dev
MIT License

Telemetry (need your opinion!) #300

Closed muety closed 1 year ago

muety commented 2 years ago

I'd like to gather some anonymized technical statistics about self-hosted Wakapi instances to better understand how it is used. Of course, no personal data, no usernames, no IP address, etc. would be sent / stored and the feature would be ~opt-out~ opt-in.

Data I'd like to gather includes:

Would you, as an operator, be concerned about sharing this data in anonymized form?

YC commented 2 years ago

Normally, I do tend to switch off telemetry if given the option. I think the comments on this Reddit post summarise the concerns behind telemetry, and behind adding it, well, if you want to take a look.

muety commented 2 years ago

Super interesting discussion there, thanks a lot for the good read! I agree that the feature should actually be opt-in rather than opt-out and that the user should be informed in great detail about what exactly is collected. Of course, like the whole of Wakapi, the implementation would be entirely open-source and open for validation.

Further opinions?

YC commented 2 years ago

Some ideas:

We should have more discussions on this, but perhaps new configs could be opt-out while prior users are opt-in.

mawoka-myblock commented 2 years ago

My thoughts about it: All in all it's OK, BUT the data should be collected by a privacy-orientated software, NOT Google Analytics or so. The second thing:

Be transparent with collected data - perhaps some site showing collected data (will be extra work, but likely worthwhile and an incentive for users to switch on telemetry in the first place), e.g. https://data.firefox.com

That would be a nice-to-have but not a must for me. For me, an opt-out is also OK, but, as I said, the service you would use should be privacy-friendly.

muety commented 2 years ago

Good thoughts, thank you!

[...] perhaps some site showing collected data [...]

Should users be able to view only collected data of their own instance or would you want all telemetry data to be open? For the latter, wouldn't that concern people even more, if not only Wakapi.dev maintainers, but anyone could view telemetry data?

[...] the data should be collected by a privacy-orientated software, NOT Google Analytics [...]

This goes without saying. Data would probably be dumped into a database on the same host as wakapi.dev and I'd add some very simple and basic analysis scripts.

mawoka-myblock commented 2 years ago

If I could choose what should be tracked, I would track the following:

If you say the number of heartbeats is very important to you for development, then OK, but I don't see how it would help. The same goes for the other points: if you say they help you, I'm fine with it, but the use case for them isn't clear to me.

muety commented 2 years ago

Total coding time wouldn't actually be of too much interest. Number of heartbeats, though, could be helpful, I think. When it comes to performance optimization (code or queries) it'd be nice to know what amounts of data the average user is dealing with. To that regard, hardware specs play a role as well.

What would be your concern about sharing this information?

mawoka-myblock commented 2 years ago

Total coding time wouldn't actually be of too much interest. Number of heartbeats, though, could be helpful, I think. When it comes to performance optimization (code or queries) it'd be nice to know what amounts of data the average user is dealing with. To that regard, hardware specs play a role as well.

What would be your concern about sharing this information?

None at all, but to gain the trust of the users, an explanation of what you collect and why would be great!

YC commented 2 years ago

Good thoughts, thank you!

[...] perhaps some site showing collected data [...]

Should users be able to view only collected data of their own instance or would you want all telemetry data to be open? For the latter, wouldn't that concern people even more, if not only Wakapi.dev maintainers, but anyone could view telemetry data?

I don't see why it would be a concern, if you can't identify where the telemetry data came from and there's no user specific information.

mainrs commented 2 years ago

Have telemetry functions implemented in a separate file, to make it easy to audit

  • Invite discussion via PR upon implementation

This is actually important for a lot of distros as they patch out telemetry altogether. If most of the code is inside a single file and has minimal binding code to the core it makes it easier to patch and maintain.
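
The single-file idea can be illustrated with a minimal Go sketch. Everything here is hypothetical (the `Reporter` interface, `NewReporter`, and both implementations are made-up names, not Wakapi code); it only shows how the core could depend on one small interface so that patching telemetry out means touching one file.

```go
// telemetry.go: a hypothetical single-file layout, not Wakapi's actual code.
package main

import "fmt"

// Reporter is the only telemetry type the rest of the application would see.
type Reporter interface {
	Report(event string) error
}

// noopReporter discards every event. A distro patch, or a disabled setting,
// could leave only this implementation in place.
type noopReporter struct{}

func (noopReporter) Report(string) error { return nil }

// recordingReporter stands in for a real sender; it only keeps events in memory.
type recordingReporter struct {
	events []string
}

func (r *recordingReporter) Report(event string) error {
	r.events = append(r.events, event)
	return nil
}

// NewReporter is the single binding point between core code and telemetry.
func NewReporter(enabled bool) Reporter {
	if !enabled {
		return noopReporter{}
	}
	return &recordingReporter{}
}

func main() {
	r := NewReporter(false)
	if err := r.Report("startup"); err != nil {
		fmt.Println("report failed:", err)
	}
	fmt.Printf("active reporter: %T\n", r)
}
```

With this shape, core code would call `NewReporter` once and never import anything else from the telemetry file.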

boehs commented 2 years ago

Super interesting discussion there, thanks a lot for the good read! I agree that the feature should actually be opt-in rather than opt-out and that the user should be informed in great detail about what exactly is collected. Of course, like the whole of Wakapi, the implementation would be entirely open-source and open for validation.

Further opinions?

I have no problem with opt in!

If it was opt out by default I would opt in, if it was opt in by default I would opt out

Should users be able to view only collected data of their own instance or would you want all telemetry data to be open? For the latter, wouldn't that concern people even more, if not only Wakapi.dev maintainers, but anyone could view telemetry data?

There is a certain secrecy to collecting telemetry and then hiding it away. Statistics interest people when they're not invasive, and your suggested use cases are not invasive.

MeerBiene commented 2 years ago

It would be decent to get collective stats for users that opt in to share their total coding time

solonovamax commented 2 years ago

To reiterate the points that have already been said:

Honestly, it being opt-in is super important to me. Not only should it be opt-in for the instance itself, it should also be opt-in for each user. (Present the users with a dialogue when signing up, asking if they wish to enable it.)

Further, the following should be shown to the user/the person hosting the instance:

As for what is reported to the central instance, I think the following would be reasonable

mawoka-myblock commented 2 years ago

I disagree. I think that controlling every point by hand is unnecessary work (For @muety). I also think that users don't need to be able to decide on their own, since the total coding time is already shown on the landing page. What I would do, to address this concern, is, I would show an info on the register page or something, where it tells the user whether telemetry is enabled or not. You should also show it in the bottom-left hand corner, next to the version and the database-driver.

solonovamax commented 2 years ago

I disagree. I think that controlling every point by hand is unnecessary work (For @muety). I also think that users don't need to be able to decide on their own, since the total coding time is already shown on the landing page. What I would do, to address this concern, is, I would show an info on the register page or something, where it tells the user whether telemetry is enabled or not. You should also show it in the bottom-left hand corner, next to the version and the database-driver.

I believe users should be able to sign up to any instance of their choice and disable tracking if they so choose.

muety commented 2 years ago

Thanks for your opinion and the elaborate write-up!

To clarify, telemetry is not at all about user tracking. And, as already discussed earlier, no data that is attributable to individual users will ever be included. In fact, no actual "content" will be sent at all, only aggregated metadata.
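
As a rough illustration of what "aggregated metadata only" could look like, here is a minimal Go sketch. The `InstanceReport` struct and its field names are assumptions made up for this example, loosely based on the items mentioned in this thread (heartbeat counts, hardware specs, version, database driver); no actual payload format was ever specified.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
)

// InstanceReport is a hypothetical payload: aggregate counts and coarse host
// facts only, with no usernames, IP addresses, or heartbeat contents.
type InstanceReport struct {
	Version        string `json:"version"`
	DatabaseDriver string `json:"database_driver"`
	HeartbeatCount int64  `json:"heartbeat_count"`
	OS             string `json:"os"`
	Arch           string `json:"arch"`
	CPUCount       int    `json:"cpu_count"`
}

// BuildReport fills in host facts from the Go runtime; the remaining values
// are supplied by the caller.
func BuildReport(version, dbDriver string, heartbeats int64) InstanceReport {
	return InstanceReport{
		Version:        version,
		DatabaseDriver: dbDriver,
		HeartbeatCount: heartbeats,
		OS:             runtime.GOOS,
		Arch:           runtime.GOARCH,
		CPUCount:       runtime.NumCPU(),
	}
}

// EncodeReport serializes the report so operators can inspect exactly what
// would leave their instance.
func EncodeReport(r InstanceReport) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := EncodeReport(BuildReport("0.0.0-example", "sqlite3", 120000))
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Printing the encoded report before anything is sent would also serve the transparency requests raised earlier in the thread.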

When talking about user tracking (using tools like Google Analytics, Mixpanel, Matomo, etc.), I agree with @solonovamax, that users should have full control. There are no plans for Wakapi to employ such tools, though. Telemetry is much different from user tracking, and frankly, I've never seen a software project, where a user is given choices about what telemetry data the server instance reports.

solonovamax commented 2 years ago

Fair point.

Thinking about it a bit more, I believe I may have misinterpreted the scope of what you were proposing.

In which case, I would agree with @mawoka-myblock,

What I would do, to address this concern, is, I would show an info on the register page or something, where it tells the user whether telemetry is enabled or not. You should also show it in the bottom-left hand corner, next to the version and the database-driver.

mawoka-myblock commented 2 years ago

Some news, @muety ?

tgrrr commented 2 years ago

If you go ahead with this, please be uber-transparent about what is collected. Here's a good example of how to do it: https://www.plex.tv/en-au/about/privacy-legal/privacy-preferences/

Secondly, there should be a big "delete my data" button in the settings, especially to deal with GDPR.

Also, if it's possible, then it would be great if the option was included in the settings page, and also the config file. If the user opts out of telemetry in the settings OR dashboard, then it should be off.

[settings]

# Your Wakapi server URL or 'https://wakapi.dev' when using the cloud server
api_url = http://localhost:3000/api/heartbeat

# Your Wakapi API key (get it from the web interface after having created an account)
api_key = 406fe41f-6d69-4183-a4cc-121e0c524c2b

# Telemetry - data collected by wakapi, see how we use the data at https://wakatime.com/privacy
telemetry = on
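
The "off if either place opts out" rule can be sketched in Go as follows. This is only an illustration under stated assumptions: `ParseSwitch` and `TelemetryEnabled` are hypothetical helpers, the `telemetry` key is the one proposed in this comment rather than an existing option, and unrecognized or missing values are treated as off to match the opt-in preference voiced in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseSwitch reads an on/off style value such as the proposed
// "telemetry = on" key. Anything unrecognized or empty counts as off.
func ParseSwitch(value string) bool {
	switch strings.ToLower(strings.TrimSpace(value)) {
	case "on", "true", "yes", "1":
		return true
	default:
		return false
	}
}

// TelemetryEnabled requires every source to opt in, so opting out in either
// the config file or the dashboard switches telemetry off.
func TelemetryEnabled(configValue string, dashboardOptIn bool) bool {
	return ParseSwitch(configValue) && dashboardOptIn
}

func main() {
	fmt.Println("config on, dashboard on: ", TelemetryEnabled("on", true))
	fmt.Println("config on, dashboard off:", TelemetryEnabled("on", false))
	fmt.Println("config missing:          ", TelemetryEnabled("", true))
}
```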

luckydonald commented 2 years ago

To be fair, I think serving the resulting anonymized data publicly is a good idea. So far, the only argument I've seen against it is that people fear it would expose data. But that would mean the tracking itself is already sharing data you would be uncomfortable with other people using.

One project with a similar thought process is https://www.offen.dev, which allows the user to see their collected data, and delete it at any time.

NatoBoram commented 1 year ago

For an example of public telemetry, there's the Minecraft server implementation Paper that uses bStats

https://bstats.org/plugin/server-implementation/Paper

bStats is a Java library for Minecraft server plugins that tracks when the server is online and sends the host's config.

Showing these stats can give more trust in the telemetry. It also creates an implicit understanding between the maintainer and its userbase: If you can't display that stat, then don't track it.

muety commented 1 year ago

Won't implement telemetry.