rubberduck-vba / Rubberduck

Every programmer needs a rubberduck. COM add-in for the VBA & VB6 IDE (VBE).
https://rubberduckvba.com
GNU General Public License v3.0

Exploring Telemetry #5106

Open retailcoder opened 5 years ago

retailcoder commented 5 years ago

We have many features, some more discoverable than others. We have memory pressure and performance issues, and only a vague intuitive idea of what's in our users' VBA projects that's based on our own individual experiences. We do have logging, and it does help (a lot) with debugging and diagnosis, but statistically a bug report or log file is nothing but an anecdote.

If Rubberduck had an opt-in setting to enable transparent telemetry (there's no way this is getting implemented without making very explicit what's being sent, where, when, and how), we could collect usage data, aggregate it, and craft a lovely PowerBI dashboard and monthly reports that could shed a lot of light on many, many things.

Some ideas, for usage data:

Other ideas, for various metrics:

The storage format probably requires a number of tables. How would that be best organized?

Anonymity concerns:

No PII or otherwise sensitive information shall be collected; usercode identifier names would only be collected with explicit and specific consent; it should be impossible to look at any given telemetry record and be able to say with 100% certainty "hey that's my record!".

Consuming the data:

The entire database shall be queryable with a public REST API; monthly reports could be emailed to subscribers.


Thoughts? Ideas? Concerns? Let's discuss this inside out.

chrisdaniels commented 5 years ago

I would have no problem with this at all. If it's going to help you guys in any way at all, it's worth doing.

SystemsModelling commented 5 years ago

Typelibs sounds OK, just not actual lib names as some may be commercial libraries that may identify a class of user. I'd be happy to test it, as long as I can see the collected data BEFORE transmitting it.

mansellan commented 5 years ago

That's critical - the collected data should be visible on the client side at any time, in a nice human-readable format. IMO usercode, references, project and component names MUST be excluded, it shouldn't be possible to give consent for any of that.

mansellan commented 5 years ago

Thinking about consent: we could add a page to the installer, giving a synopsis of what would be collected, a description of where the on/off switch is in the main add-in, and a link for further details, with options:

For the top option, the installer could omit installation of the telemetry assembly at all, which should satisfy corporates with a risk-averse posture.

retailcoder commented 5 years ago

@mansellan I like that!

retailcoder commented 5 years ago

Another idea: "Send a frown :frowning_face:" and "Send a smile :smiley:" user feedback features, like Microsoft does with e.g. Excel telemetry?

Hosch250 commented 5 years ago

  • Not right now (pre-selected)

What is the difference between that and "Disabled"? Do we prompt again a week later, or something?

retailcoder commented 5 years ago

@Hosch250 that would be an installer prompt, so "disable completely" would not even install the Rubberduck.Telemetry assembly, while "not right now" would install it, but leave the setting disabled.

Hosch250 commented 5 years ago

That'd be a pain if someone toggled it to Enabled after installing with it Disabled. Alternatively, would we remove the DLL if they installed with it Enabled, then toggled it to Disabled?

mansellan commented 5 years ago

I think that if RD is installed under the "Disable Completely" option, the Telemetry page in the settings dialog should still be visible, but with wording like:

"Telemetry is not currently installed. If you wish to enable telemetry, please go to Control Panel, Programs and Features, then run the Rubberduck installer using Modify".

This:

  1. Provides a route to enable later, but
  2. Gives reassurance to the corporate IT reviewer that telemetry is an install-only option (which they can and will lock out)

retailcoder commented 5 years ago

Just installed Telerik Fiddler, and noticed this in the license agreement:

On startup, the Software anonymously checks for new versions; you may disable this feature if you prefer. You may opt-in to submitting anonymous data about your system configuration and use of the Software to help improve future versions of the Software. If you opt-in, Telerik may collect data related to: certain features and extensions of the Software, identifying trends and bugs, activation information, usage statistics and may track other data related to your use of the Software as further described in the most current version of Telerik’s Privacy Policy (located at: http://www.telerik.com/company/privacy-policy). You may be asked, from time to time, to respond to short survey questions presented within the Software’s user environment. Telerik may use your responses to these questions to serve you with targeted advertising content, to improve the Software, and/or for other purposes as described within the Privacy Policy. By your responding to such questions, opting-in to data collection, and/or acceptance of these terms and/or use of the Software, you authorize the collection, use and disclosure of all responses and data for the purposes provided for herein and/or in the Privacy Policy.

And this prompt on first startup:

Help Improve Progress Telerik Fiddler?

I like this approach... we'll need an explicit "privacy policy" legalese document though.

rubberduck203 commented 5 years ago

Maybe you can reach out to the Software Freedom Law Center. They offer pro bono services for FLOSS projects. Not sure what requirements they have for determining who they're willing to work with.

http://www.softwarefreedom.org/

zspitz commented 5 years ago

@mansellan

IMO usercode, references, project and component names MUST be excluded, it shouldn't be possible to give consent for any of that.


Would it be possible to differentiate between elements and libraries from "standard VBA stuff" -- such as Excel, Access, ADO, DAO, WIA, MSHTML, Regex -- and custom user projects or referenced libraries? Maybe a list of the standard ones, and any non-standard library or element (element from a non-standard library) should not be included?

mansellan commented 5 years ago

Hmm, hadn't considered that... I can't see the harm in having a library whitelist. Another option could be to hash all referenced libraries and send just the hashes, which we could then match up to hashes of known libraries. Either way, no private info is sent.
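The hashing idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: the library identifiers are made-up placeholders (not real type library GUIDs), and `hash_library_id`, `client_payload` and `resolve` are hypothetical names, not anything that exists in Rubberduck.

```python
import hashlib

# Hypothetical allowlist of "standard" libraries. The identifiers are
# placeholders for illustration only, not real type library GUIDs.
KNOWN_LIBRARIES = {
    "Excel": "{PLACEHOLDER-EXCEL}",
    "ADO": "{PLACEHOLDER-ADO}",
}


def hash_library_id(library_id: str) -> str:
    """Return a SHA-256 hex digest of a normalized library identifier."""
    normalized = library_id.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Precomputed lookup: hash -> friendly name of a known library.
KNOWN_HASHES = {hash_library_id(lib_id): name for name, lib_id in KNOWN_LIBRARIES.items()}


def client_payload(referenced_ids):
    """Client side: only hashes are placed in the payload, never names or paths."""
    return sorted(hash_library_id(lib_id) for lib_id in referenced_ids)


def resolve(hashes):
    """Receiving side: name the known libraries, leave everything else opaque."""
    return [KNOWN_HASHES.get(h, "unknown") for h in hashes]
```

One caveat with this approach: a plain hash only hides identifiers that nobody else can guess. Anyone who already knows a commercial library's identifier can hash it and recognize it in the data, so the allowlist variant (dropping non-standard references entirely) is the stricter of the two options.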

Greedquest commented 2 years ago

Rubberduck has a large presence in the VBA ecosystem, which makes data it collects a particularly useful and accurate representation of "people who use VBA and care enough about the developer experience to install extensions". Consequently, the information about who uses RD:

... and how they use VBA:

... (as opposed to information about how they use RD itself) - All this has value beyond just improving Rubberduck and can guide the design of other tools, libraries, and extensions within the VBA ecosystem. For example, I am creating an open-source VBA package manager which I feel has a similar target audience to users of RD and so I'd like to get a better understanding of that market to influence the design decisions I make.

All this is to say, I'm really in favour of this user data being gathered as it has value to both RD and a wider community.

Greedquest commented 2 years ago

As an aside... I know some software downloads like VSCode automatically detect the operating system and bitness of the user in order to suggest an appropriate version of the installer to download (presumably this info is available to the browser). That may be a quick and dirty way to gather some demographic data at the install stage without needing to modify Rubberduck at all.
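As a rough sketch of that kind of detection, a download page could inspect the browser's User-Agent string. The function name `detect_platform` and the token list are assumptions for illustration; real User-Agent strings vary and can be spoofed or reduced by the browser, so this is a heuristic at best.

```python
def detect_platform(user_agent: str) -> dict:
    """Guess OS and bitness from a User-Agent string (heuristic only)."""
    ua = user_agent.lower()

    if "windows" in ua:
        os_name = "Windows"
    elif "mac os x" in ua or "macintosh" in ua:
        os_name = "macOS"
    elif "linux" in ua:
        os_name = "Linux"
    else:
        os_name = "unknown"

    # Tokens that commonly indicate a 64-bit system. Their absence does not
    # prove a 32-bit system, so the fallback is "unknown", not "32-bit".
    tokens_64 = ("win64", "x64", "wow64", "x86_64", "amd64")
    bitness = "64-bit" if any(token in ua for token in tokens_64) else "unknown"

    return {"os": os_name, "bitness": bitness}
```

Note that the browser only exposes information about the operating system; it says nothing about the bitness of the installed Office host.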

Incidentally most of the metrics I'm interested in about who uses extensions like Rubberduck are known at install time, so it might be possible to bolt a simple one time thing onto the installer rather than setting up regular reporting of telemetry data.

Greedquest commented 2 years ago

As another aside, I think this has some degree of urgency as it could help prioritise the large number of issues based on how frequently used a feature is (or underused because it is broken) and motivate decisions for new features which is always a good feeling. I cannot deny my vested interest though 😉...

A9G-Data-Droid commented 2 years ago

I am fully opposed to telemetry both personally and professionally. I dislike it in my personal life and always turn it off. At work I am required to turn off all capabilities to phone home. I would need an installer that doesn't include these features to avoid costly security review.

yuriykaz commented 10 months ago

Hm, not a good idea. In case telemetry is needed, providing 2 different installers would be good. Whoever wants to install Rubberduck with telemetry, would download it with that feature, whoever does not, would be 100% sure they are downloading a private app. Otherwise, some companies will not allow this tool anymore.

retailcoder commented 10 months ago

@yuriykaz thanks for the feedback! You're absolutely correct, this is something that needs complete transparency and has to be explicitly opt-in (as opposed to opt-out).

Telemetry isn't going to happen with 2.x, but is being seriously considered for 3.0, especially since the Language Server Protocol (LSP) defines a specific notification for this.
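For reference, the LSP notification in question is `telemetry/event`, sent from the server to the client. A minimal sketch of framing such a message with the LSP base protocol (a `Content-Length` header followed by a JSON-RPC body) might look like this; the payload fields and the function name `frame_telemetry_event` are purely hypothetical, not a proposed Rubberduck schema.

```python
import json


def frame_telemetry_event(payload: dict) -> bytes:
    """Frame a telemetry/event notification using the LSP base protocol."""
    message = {
        "jsonrpc": "2.0",
        "method": "telemetry/event",
        "params": payload,  # notifications carry no "id" and expect no response
    }
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    # Content-Length counts the bytes of the body, not the characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body
```

Because it is a notification rather than a request, the client never replies to it; what the client does with the payload is left to the client.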

We're still quite far from having an installer for 3.x, but the way it's being envisioned is closer to how Visual Studio does it: you'd be installing the latest version of the Rubberduck Installer / Update Server, and that's where you'd tick a box to have the completely optional telemetry server installed along with the rest of RD3 components; the installer would only download the components that must be installed.

With the telemetry server installed, an explicit configuration will still be needed to enable telemetry events (most will be disabled by default), so nothing is transmitted without having been configured to be, and the idea is to allow all telemetry data to be reviewable before it's transmitted; transmission itself would be manual unless configured otherwise.
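A minimal sketch of that flow, assuming a simple in-memory queue: events are off unless explicitly enabled, everything pending can be inspected as readable JSON, and nothing leaves until `send` is called by hand. The class `TelemetryQueue`, its methods, and the event names are all hypothetical and only illustrate the idea.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TelemetryQueue:
    # Empty by default: no event is recorded unless explicitly enabled.
    enabled_events: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def record(self, name: str, data: dict) -> bool:
        """Queue an event only if its name was explicitly enabled."""
        if name not in self.enabled_events:
            return False
        self.pending.append({"event": name, "data": data})
        return True

    def review(self) -> str:
        """Return everything that would be sent, as human-readable JSON."""
        return json.dumps(self.pending, indent=2, sort_keys=True)

    def send(self, transmit: Callable[[str], None]) -> int:
        """Manually transmit the reviewed payload, then clear the queue."""
        count = len(self.pending)
        if count:
            transmit(self.review())
            self.pending.clear()
        return count
```

The `transmit` callable is injected so that what gets sent is exactly the text `review` shows, and so the transport (HTTP, file export, or nothing at all) stays a separate decision.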