grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

Expand usage collection to include extension versions in use #2952

Open javaducky opened 1 year ago

javaducky commented 1 year ago

Feature Description

Currently, k6 reports anonymous usage data as described in the docs. Thanks to #1741 in v0.43.0, we can now access version info for bundled extensions with the k6 version output. To help determine priority for certain extensions, having this version information included in the payload of the usage reports would be nice. I want to discuss bringing this additional metadata to the usage collection payload.

Of course, anonymity and the ability to opt out are paramount. The only scenario I can think of which could break anonymity would be if someone is creating/using an extension within a private repository. Their GitHub organization would be "leaked," but we would not have any insight into the extension other than what could be implied by its name.

Why would we want this information? The number of extensions available is continually increasing. Many are proof-of-concept, experimental, or just "because we can." Some are hosted here in Grafana and will be officially supported, but most are not. The extensions API has had some breaking changes recently, and many extensions no longer work. By having some metrics to show which extensions are "popular," we can proactively submit patches to repository owners as changes are proposed/made to the underlying module API. We could prioritize based on the amount of usage.

imiric commented 1 year ago

I think adding this would be a far greater privacy concern than the current opt-out usage report. Remember that with the current usage report, even though the submitted data is minimal, the IP address of the submitter is still potentially leaked and logged. IP addresses are treated as personal data under GDPR, so the current opt-out telemetry we use is non-compliant in that sense.

Before even considering this, I think we should step back and have a discussion about making the report opt-in instead. Recently, there has been a lot of discussion about the proposal to add telemetry to Go. Thankfully, feedback from the community was taken into account, and this will be opt-in.

Once we make that change, then in principle, I wouldn't be opposed to expanding the data to include the used k6 extensions.

But I'll ask this: for what purpose do we realistically want to use that data, other than it "would be nice"? We should have a clear purpose for it, otherwise it will just sit there, and we'll glance at it every few months out of curiosity, like I think we do with the current usage report data.

na-- commented 1 year ago

I am on the fence about whether the benefits of adding more telemetry would outweigh the costs... I am heavily leaning towards "no", at least if we take the simple approach to just send what we show in k6 version over the network :thinking:

> The only scenario I can think of which could break anonymity would be if someone is creating/using an extension within a private repository. Their GitHub organization would be "leaked," but we would not have any insight into the extension other than what could be implied by its name.

To increase privacy, we can strip everything besides the last part of the module name and version :thinking: e.g. instead of github.com/grafana/xk6-browser@v0.8.1, we can just send xk6-browser@v0.8.1?
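A minimal sketch of that stripping step, assuming the input has the `module@version` shape shown in the example above. `shortModule` is a hypothetical helper name, not an existing k6 function.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// shortModule keeps only the last path element of a "module@version" string.
// Hypothetical helper for illustration; it is not part of k6.
func shortModule(dep string) string {
	modPath, version, hasVersion := strings.Cut(dep, "@")
	name := path.Base(modPath) // drops the host and organization parts
	if !hasVersion {
		return name
	}
	return name + "@" + version
}

func main() {
	fmt.Println(shortModule("github.com/grafana/xk6-browser@v0.8.1"))
}
```

One limitation of this approach: two organizations that publish an extension under the same repository name become indistinguishable, and a module path ending in a major-version suffix such as `/v2` would be reduced to just `v2`.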

Alternatively, we can hash the module name with sha256 and send only the module hash and version over the wire :thinking: This would allow us to easily get usage statistics for well-known public extensions (e.g. by hashing the module names from the list in our docs or even searching public GitHub for xk6 extensions), since we can calculate their hashes and compare. But it will be pretty much impossible for us to guess the name of some private extension without any extra information :thinking:
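The hashing idea could look roughly like the sketch below. The function names and the private module path are made up for illustration, and the hash is a plain unsalted SHA-256 of the module path, which is an assumption about how the proposal would be implemented.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashModule returns the hex-encoded SHA-256 digest of a module path.
// Hypothetical helper; this is what the client would send instead of the name.
func hashModule(modPath string) string {
	sum := sha256.Sum256([]byte(modPath))
	return hex.EncodeToString(sum[:])
}

// buildLookup precomputes hash -> name for publicly known extensions, so the
// receiving side can resolve the hashes it already knows about.
func buildLookup(public []string) map[string]string {
	lookup := make(map[string]string, len(public))
	for _, name := range public {
		lookup[hashModule(name)] = name
	}
	return lookup
}

func main() {
	lookup := buildLookup([]string{"github.com/grafana/xk6-browser"})

	// A public extension resolves back to its name.
	name, ok := lookup[hashModule("github.com/grafana/xk6-browser")]
	fmt.Println(name, ok)

	// A private extension (made-up path) stays an opaque hash.
	_, ok = lookup[hashModule("github.com/example-org/xk6-internal")]
	fmt.Println(ok)
}
```

Because the hash is deterministic and unsalted, anyone who can guess a module path can confirm it by hashing the guess. That is what makes the public lookup possible, and it is also why this only protects names that are hard to guess.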

Finally, to balance out this extra addition, I suggest that we also implement a slightly easier opt-out, by following https://consoledonottrack.com/ and adding k6 to https://github.com/beatcracker/toptout (edit: or, as @imiric suggests, moving to opt-in instead of opt-out).
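As a rough sketch of honoring the Console Do Not Track proposal alongside the existing flag: the function name and the rule that only the value `1` counts as an opt-out are assumptions of this example, not k6 behavior.

```go
package main

import (
	"fmt"
	"os"
)

// usageReportEnabled decides whether a usage report may be sent.
// noUsageReport mirrors the existing --no-usage-report flag; lookupEnv has
// the same signature as os.LookupEnv so it can be replaced in tests.
// Treating only DO_NOT_TRACK=1 as an opt-out is an assumption of this sketch.
func usageReportEnabled(noUsageReport bool, lookupEnv func(string) (string, bool)) bool {
	if noUsageReport {
		return false
	}
	if v, ok := lookupEnv("DO_NOT_TRACK"); ok && v == "1" {
		return false
	}
	return true
}

func main() {
	fmt.Println("usage report enabled:", usageReportEnabled(false, os.LookupEnv))
}
```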

javaducky commented 1 year ago

@imiric, I added a "Why would we want this?" section to the original description.

> the current opt-out telemetry we use is non-compliant in that sense

I agree...I'd like to see this all inverted to opt-in.

dgzlopes commented 1 year ago

> Alternatively, we can hash the module name with sha256 and send only the module hash and version over the wire :thinking: This would allow us to easily get usage statistics for well-known public extensions (e.g. by hashing the module names from the list in our docs or even searching public GitHub for xk6 extensions), since we can calculate their hashes and compare. But it will be pretty much impossible for us to guess the name of some private extension without any extra information :thinking:

I like this idea :+1:

I'm not so sure about inverting the telemetry to opt-in. Who enables telemetry? :sweat_smile:

If the current telemetry could be less intrusive, let's improve it.

eldad commented 1 year ago

> I'm not so sure about inverting the telemetry to opt-in. Who enables telemetry? 😅

@dgzlopes

I just found out that k6 collects data. Personally, this is quite a bad surprise, and I'm disappointed that this dark pattern is employed in this project. The correct approach, if you want to reach out to your user base in that way, is to ask for consent. You can ask the first time k6 is run and record the user's choice for later runs.
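One way the ask-once flow could be sketched, assuming the choice is stored as a small text file. The file name, prompt wording, and function names are all hypothetical, and nothing here reflects how k6 works today.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// loadOrAskConsent returns the stored choice if the file exists; otherwise it
// asks once on out/in, stores the answer, and returns it. Anything other than
// "y" or "yes" (including no input at all) counts as a refusal.
func loadOrAskConsent(file string, in io.Reader, out io.Writer) (bool, error) {
	data, err := os.ReadFile(file)
	if err == nil {
		return strings.TrimSpace(string(data)) == "yes", nil
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}

	fmt.Fprint(out, "Send anonymous usage reports? [y/N] ")
	line, _ := bufio.NewReader(in).ReadString('\n') // EOF leaves line empty
	answer := strings.ToLower(strings.TrimSpace(line))
	consent := answer == "y" || answer == "yes"

	choice := "no"
	if consent {
		choice = "yes"
	}
	if err := os.MkdirAll(filepath.Dir(file), 0o700); err != nil {
		return false, err
	}
	if err := os.WriteFile(file, []byte(choice+"\n"), 0o600); err != nil {
		return false, err
	}
	return consent, nil
}

// demo runs the flow twice against a temporary file: the first call receives
// the given answer, the second call receives no input and must reuse the
// stored choice.
func demo(answer string) (first, second bool, err error) {
	dir, err := os.MkdirTemp("", "consent-example")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "usage-report-consent")
	if first, err = loadOrAskConsent(file, strings.NewReader(answer), io.Discard); err != nil {
		return false, false, err
	}
	second, err = loadOrAskConsent(file, strings.NewReader(""), io.Discard)
	return first, second, err
}

func main() {
	fmt.Println(demo("y\n"))
}
```

Defaulting to "no" on empty input matters for non-interactive runs such as CI, where there is nobody to answer the prompt.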

You may also be found to be in violation of GDPR if you fail to anonymize the data sufficiently, since IP addresses are considered personal data.

Please respect the privacy concerns of the community by inverting this to opt-in ASAP.

dgzlopes commented 11 months ago

Hello! Sorry for the delay (I went on vacation and lost track of some GitHub notifications).

Thanks for your feedback!

At Grafana Labs, this data is one of the tools we use to improve the software we build. We collect it in a very careful way and ensure that the data is anonymous. What we collect and how we collect it is available in this section of our docs.

You can easily disable this using the --no-usage-report flag.