pulsar-edit / pulsar

A Community-led Hyper-Hackable Text Editor
https://pulsar-edit.dev

Remove Telemetry from Core #39

Closed confused-Techie closed 2 years ago

confused-Techie commented 2 years ago

Summary

Currently, telemetry is collected about users and their usage of this application. As discussed in Discussion #2, this is behavior we should likely disable completely.

Motivation

We have neither the eyes nor the time to review this data, even if we wanted to. Additionally, as expressed in the Discussions, there is no reason for a community-driven volunteer project like ours to collect usage data. We aren't selling a product, and I don't think the data would affect which features we work on, since, again, as volunteers we will likely work on the aspects we care about the most.

Additional context

I'm opening this issue to get a sense of whether there are any obvious points I am missing before I start the work of removing the telemetry from the repo myself.

confused-Techie commented 2 years ago

In the interest of knowing why this should be disabled, I've gone ahead and read through the source of pulsar-edit/metrics, which is the package that actually collects the telemetry. Below is a copy/paste of my findings from our Discord server.


Alright, so I've dug into the source. A helpful note: telemetry-github isn't what's actually responsible for telemetry. The metrics package uses the telemetry-github package to send the metrics it collects, of which here is the full list:

Additionally, while sending many of these metrics, others are collected and sent along with nearly all of them, which are below:

Now, with all this said, there don't seem to be any extreme privacy violations, except maybe the atom.commands API, which on my initial inspection seems to be how many packages communicate with the editor, but I will have to double-check that. Beyond that, I don't see how any of this data would be able to assist us.

meadowsys commented 2 years ago

The only things I see that are potentially useful are the performance-related ones, like startup time and memory usage, so we can optimise and tune performance if needed. If we kept them, it should still be opt-in. Otherwise, yeah, I don't see any use for the collected data.

confused-Techie commented 2 years ago

@autumnblazey You do have a point that some of it could be useful, but to me the question is whether it is useful enough to either implement another API endpoint on our backend and then pay for the storage space needed, or pay for a third-party service to handle all of this for us.

Someone mentioned on the Discord server that we could create a package that's optional to install, which would collect this data and save it locally. Only then could a user upload that data as part of a GitHub Issue, and they would have the chance to remove any PII within it if needed.
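
As a rough illustration of the "collect locally, review, then share by hand" idea, here is a minimal sketch. It is not code from any Pulsar package: the field names in `SENSITIVE_KEYS`, the `redact` and `save_report` helpers, and the report layout are all hypothetical, and Python is used only to keep the sketch short (a real Pulsar package would be written in JavaScript).

```python
import json
import re
from pathlib import Path

# Hypothetical field names: the thread does not say what would be collected.
SENSITIVE_KEYS = {"username", "project_path", "open_files"}

# Matches the user-name segment of common home-directory paths.
HOME_PATTERN = re.compile(r"(/home/|/Users/|C:\\Users\\)[^/\\\s]+")


def redact(record: dict) -> dict:
    """Return a copy with sensitive keys dropped and home-directory names masked."""
    cleaned = {}
    for key, value in record.items():
        if key in SENSITIVE_KEYS:
            continue  # drop the field entirely
        if isinstance(value, str):
            value = HOME_PATTERN.sub(r"\1<user>", value)
        cleaned[key] = value
    return cleaned


def save_report(records: list[dict], path: Path) -> Path:
    """Write redacted records to a local JSON file. Nothing is sent over the network."""
    redacted = [redact(record) for record in records]
    path.write_text(json.dumps(redacted, indent=2), encoding="utf-8")
    return path
```

The point of the design is that the only output is a plain file on disk, so the user can open it, delete anything they are not comfortable with, and attach it to an issue themselves.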

meadowsys commented 2 years ago

Hm, that sounds a little bit more realistic (and also privacy-protecting). Even with opt-in telemetry, I still find it difficult to enable the setting, because I don't know exactly what is being sent, and that solution solves that issue too. It would be difficult for us to get an overall picture though.

Even though it may be useful, I think for now we should just remove it all, then reenable / create a package for it when the editor is in a more complete and stable state?

confused-Techie commented 2 years ago

Totally agree there. We can revisit once things are more stable, and especially once we have a functional location to even send the data. In the meantime I can get started on a package for it. Maybe the package could include more data than was available here, while removing some unneeded bits, which might give us a better picture of where an issue lies. And since the data would be shared on an individual, case-by-case basis, the privacy implications are not as worrisome.

Digitalone1 commented 2 years ago

To me it's not worth adding complexity for managing this data. This is a big project and we should focus on what is really needed. Telemetry isn't one of these things.

Digitalone1 commented 2 years ago

> Someone mentioned on the Discord server that we could create a package that's optional to install, which would collect this data and save it locally. Only then could a user upload that data as part of a GitHub Issue, and they would have the chance to remove any PII within it if needed.

This sounds more like debugging rather than collecting telemetry data.

meadowsys commented 2 years ago

Yeah, I feel the only reason we would want data from users is to debug / maybe improve stuff. None of us (that I know of?) really care about fancy graphs and numbers go brrr, especially not if it violates privacy, so if we added/kept some, it would be nothing like Microsoft's or Google's data collection, that's for sure.

confused-Techie commented 2 years ago

@Digitalone1 @autumnblazey Exactly, it is meant much more for debugging. I don't see any reason we would want telemetry. Like Autumn said, we don't care about graphs; we care about privacy, and we care about the user experience. Collecting endless data and making our system more complex serves neither. That's why I already have a PR to gut the functionality completely.

mauricioszabo commented 2 years ago

:clap:

Flashwalker commented 1 year ago

Why not just let users decide whether they want to share telemetry or not?

  1. Turn off all telemetry by default.
  2. Make an "Enable telemetry" button, and whoever wants to will turn it on.

Daeraxa commented 1 year ago

It is entirely possible that this might come in the future if we need it and decide on it via a vote. Simply put, though, we don't have any infrastructure in place to collect that data for analysis, nor much desire to build it. The original telemetry code reported to the GitHub/Atom team, so we would have needed to replicate their system to collect the same data, and that simply was in no way a priority.

meadowsys commented 1 year ago

And of course, we don't have much desire or manpower to sift through the data, so the data would kinda be collected for nothing.

mknepper commented 1 year ago

> Why not just let the user decide if he wants to share telemetry or not.
>
>   1. Turn off all telemetry by default.
>   2. Make a "Enable telemetry" button and whoever wants to, will turn it on

Wouldn't it be better to not include and disallow any telemetry by default? If users really want telemetry, they could simply install a package that reports it. I'm sure someone could write an installable package that would voluntarily collect telemetry and report it to remote servers for study or development, should a user so desire.

Edit: Incorrectly quoted the wrong person; fixed.

Daeraxa commented 1 year ago

> Wouldn't it be better to not include and disallow any telemetry by default?

That is exactly what we have done though.

mknepper commented 1 year ago

> > Wouldn't it be better to not include and disallow any telemetry by default?
>
> That is exactly what we have done though.

Correct. I just noticed a couple of people suggesting that we keep telemetry. I didn't mean to quote you, I apologize. I quoted the wrong person. I'll edit that.

Daeraxa commented 1 year ago

No problem. Basically, I think everyone was on board that it should be opt-in if we had decided to keep any, but ultimately we set about stripping it out. If we ever did want telemetry in the future, we would have to re-implement things, but I suspect that, if put to a community vote as we always do, it would overwhelmingly come out as opt-in, not opt-out.