napari / napari

napari: a fast, interactive, multi-dimensional image viewer for python
https://napari.org
BSD 3-Clause "New" or "Revised" License

auto bug reporting? #621

Open tlambert03 opened 4 years ago

tlambert03 commented 4 years ago

❓ Question

Just curious whether you guys have discussed using exception monitoring services, for auto-collecting napari bug reports & stack traces. I have been using sentry with LLSpy and FPbase (it's free for open source) and it's been super helpful to catch bugs that users would sometimes not even notice (or otherwise not report), and to identify how many users are experiencing the issue to help in triage. It would of course be an opt-in checkbox somewhere... but with the growing user base, it might be helpful?

AhmetCanSolak commented 4 years ago

Sentry is great.

I'd love to hear what others think about having such a feature; even as an opt-in, it could raise some concerns.

jni commented 4 years ago

Hi Talley,

We've actually had a conversation about sentry already. The short of it is that we actually have to be even more careful than most with user data collection, because of the perception of napari as coming from Chan Zuckerberg, and all of the surveillance concerns that that entails. There was massive uproar on the ImageJ list when Fiji proposed automatic tracking (I can't find it in the archive now, though, maybe @ctrueden has the link handy), and just now, literally as I was writing this response, I got this email from Gitlab apologising to users for proposing user tracking, after presumably a similar uproar.

So, generally, I agree that it's super super useful, but we might have to play with one hand tied behind our backs a bit, and generally be extremely careful here, both with what we collect (e.g. I don't want file paths in the reports, as they could contain sensitive information), and how we enable collection: probably not at all in the napari package itself, but rather by requiring that users install a completely separate package, e.g. napari-sentry. That is, you should be able to opt out of even having sentry-related code on your machine.
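To make the file-path concern concrete, here is a minimal sketch of how a traceback could be scrubbed before anything is sent. It is not napari or Sentry code: `scrub_paths` is a hypothetical helper, and it only rewrites the `File "..."` lines of a standard CPython traceback. Exception messages (for instance from `FileNotFoundError`) can still contain paths and would need separate handling.

```python
import re

# Matches the file-location part of a standard CPython traceback line, e.g.
#   File "/home/alice/project/script.py", line 3, in <module>
_FILE_LINE = re.compile(r'File "(?P<path>[^"]+)"')


def _basename(path: str) -> str:
    """Return the last path component, splitting on both / and \\ separators."""
    return re.split(r"[\\/]", path)[-1]


def scrub_paths(tb_text: str) -> str:
    """Hypothetical helper: keep only file names in traceback 'File' lines."""
    return _FILE_LINE.sub(
        lambda m: 'File "{}"'.format(_basename(m.group("path"))), tb_text
    )
```

Directory names are dropped and the file name and line number are kept, which is usually enough to locate the failing code in an open-source package.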

tlambert03 commented 4 years ago

Yep, makes lots of sense. I hadn’t considered the completely separate package idea... that’s a good one

No need to keep this open, so I’ll close

ctrueden commented 4 years ago

Starts here:

Click through the "By Topic" Next links to read it all.

And then it continues here:

Similarly, click through the Next links to see all replies.

It was brutal. People are crazy about this. Be very careful.

What really gets my goat, though, is the double standard with web software (all web services inherently violate "privacy" because the API request is a phone home). And also with software that does not bother to disclose when it phones home (e.g. Icy).

tlambert03 commented 4 years ago

yeah, I think that idea is shelved :)

was the program collecting info by default in that case though?

ctrueden commented 4 years ago

was the program collecting info by default in that case though?

Initially, yes. But when we tried to react to community feedback by updating the usage statistics to be opt-in, there continued to be a strong rejection:

https://list.nih.gov/cgi-bin/wa.exe?A2=IMAGEJ;d8dc8355.1408

tlambert03 commented 4 years ago

yeah, quite a response! seems like they all latched on to the concept of collecting usage statistics (to support grant requests, etc...). Too bad that gets lumped in with sending a stack trace when an exception gets thrown. But I get it: it's all way too hot to touch. not worth it.

sofroniewn commented 4 years ago

I'm reopening this now, because it has come up in a couple of recent conversations I've had, and because we are getting closer to releasing a bundled app, where the expectations around telemetry may be different than for a pip-installed package.

For example @0x00b1 recently informed me that CellProfiler has been collecting telemetry data with the sentry.io platform since 2018 and had very good opt-ins (see dialog here). They've also found this data very useful - see this blog post from their team for example.

One thing I think is very important is that, if we do collect telemetry data, we publish blog posts (or similar) explaining how we're using it and making it clear how that data is making napari and the napari community stronger.

As noted above, there are different types of telemetry data and different usages, such as collecting usage statistics and patterns, bug reporting, etc., but there are significant concerns around privacy, what data gets sent back, how it gets shared, etc. Here we want to learn from and listen to those around us and our users / plugin developers, and take a cautious approach so as to build trust.

As @jni points out above, one approach is having all the telemetry code isolated in a separate package - say called napari-sentry if we go with the sentry.io platform.

If you choose to do

pip install napari
pip install napari-sentry

then you will get the telemetry code; if you do not, you will not have the telemetry code.
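As a rough illustration of the separate-package idea, the main application could check whether the telemetry package is present without importing it. This is only a sketch: the import name `napari_sentry` is an assumption (the thread only proposes the distribution name napari-sentry), and `telemetry_installed` is a hypothetical helper.

```python
import importlib.util


def telemetry_installed(package: str = "napari_sentry") -> bool:
    """Return True if the (hypothetical) telemetry package can be found.

    find_spec only locates a top-level module; it does not import it,
    so no telemetry code runs as a side effect of this check.
    """
    return importlib.util.find_spec(package) is not None
```

With this pattern the core package never imports telemetry code unconditionally, so a plain `pip install napari` would carry no such code at all.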

Even after the telemetry code has been installed, there can still be an explicit opt-in dialog in our GUI that turns on the telemetry (potentially to different levels, i.e. some usage statistics but no bug reports, or vice versa). The telemetry could be disabled at any time, and uninstalled entirely or reinstalled at will from a preferences dialog.
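The two-stage gating described here (package installed, then explicit consent per level) could be modelled roughly as follows. All names (`Telemetry`, `TelemetrySettings`, `allows`) are hypothetical and chosen for illustration; nothing here reflects an existing napari API.

```python
from dataclasses import dataclass
from enum import Flag


class Telemetry(Flag):
    """Hypothetical consent levels; they can be combined with |."""
    NONE = 0
    USAGE_STATS = 1
    BUG_REPORTS = 2


@dataclass
class TelemetrySettings:
    # Both default to "off": nothing is sent unless the package is
    # installed AND the user has opted in to that kind of data.
    installed: bool = False
    consent: Telemetry = Telemetry.NONE

    def allows(self, kind: Telemetry) -> bool:
        """True only if the package is installed and the user opted in to `kind`."""
        return self.installed and bool(self.consent & kind)
```

For example, `TelemetrySettings(installed=True, consent=Telemetry.BUG_REPORTS)` would allow bug reports but not usage statistics, and resetting `consent` to `Telemetry.NONE` disables everything without uninstalling.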

If we took this approach, we could then discuss things like - would we ever make napari depend on napari-sentry, would we ever ship it in the bundled app, would we display the opt-in message on the first launch from a pip install, or from the first launch of a bundled app?

We'd also have to think about exactly what we collect/ where it goes/ who has access to it.

While I certainly don't want to rush this discussion, if we did want to eventually include telemetry in our bundled app I can see many reasons for having it ready to go for the very first release of our bundled app as it sets expectations very clearly with our users and doesn't then come as this "surprise" later on.

Looking forward to more discussion!

jni commented 4 years ago

We'd also have to think about exactly what we collect/ where it goes/ who has access to it.

A lot of the discussion around COVID data collection makes me think that, ideally, we should be able to make collected data public. If we can't because of privacy concerns, then that's a strong indication that we are collecting too much. At the very least, I think this should apply to anything collected that doesn't involve the user pressing a "send data" button, e.g. number of launches of the app.

jni commented 4 years ago

Some vague notes from the meeting: