mozilla / data-review

Templates for Firefox data collection review process (https://wiki.mozilla.org/Firefox/Data_Collection)
Mozilla Public License 2.0

The form is too long and discouraging me from adding telemetry probes #2

johannhof opened this issue 6 years ago

johannhof commented 6 years ago

tl;dr This form imposes on everybody the burden of the process necessary for the most dramatic cases, which adds more work to an already complicated procedure.

First of all, I'm proud that we at Mozilla have managed to create an internal culture of seeing user privacy protection as one of our central principles (and in fact my job mostly revolves around protecting user privacy in Firefox). Thank you for your work on data stewardship.

Adding a telemetry probe has never been quick and easy. The simple technical limitation that it can't be done using artifact builds, the (rightfully) short expiration times, and the data steward review process already add some necessary overhead. Anecdotally, I know that overhead has discouraged people from adding some "trivial" telemetry probes.

This form complicates things even further.

I know that the intention behind this is good (if I understand it correctly, it's intended to work like the uplift request comment template on Bugzilla). In fact, I was always a bit uncertain how to properly request data review, so I can get fully behind a more formalized process.

But the questionnaire in its size and tone (and the fact that it's not a Bugzilla comment template) makes me urgently want to do something other than add a new telemetry probe to Firefox.

Excerpts I'm skeptical about (note that these complaints are specifically about adding telemetry to Firefox, this looks like it could be used for other things as well):

All questions are mandatory.

Doesn't set a great mood. "Please fill out all questions"?

  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?

For harmless data, this question feels inappropriate. It should only be asked when the data we are collecting is in fact category 3 or 4 data. For other categories, the honest and correct answer is "we didn't consider alternative methods, why should we?". This kind of question should be left to the data reviewer, IMO.

  4. Can current instrumentation answer these questions?

I probably just misunderstand, but what's the difference from number 3?

  5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

     Measurement Description | Data Collection Category | Tracking Bug #

The table there shows a "Tracking Bug #", what is that supposed to mean?

  6. How long will this data be collected? Choose one of the following:

Firefox telemetry has an expiration version on every probe.

  7. What populations will you measure?

Have data stewards historically had trouble finding this out? (Honest question: this might be a good thing to ask; I'd just like to find out while I'm here.)

  8. Please provide a general description of how you will analyze this data.

Why bother? How would my answer influence the decision? Almost all the data is public, anyone can do any kind of analysis with it after it gets recorded, right?

  9. Where do you intend to share the results of your analysis?

See 8. Is this question necessary for public data?

Osmose commented 6 years ago

note that these complaints are specifically about adding telemetry to Firefox, this looks like it could be used for other things as well

I've used this form multiple times for things like Shield studies and non-Telemetry data collection, and have found the questions useful for planning the collection, and relevant to the collection I was performing.

Maybe a telemetry-probe-specific form could alleviate some of these concerns, assuming the maintenance burden of multiple forms is acceptable to the data stewards?

chutten commented 6 years ago

The simple technical limitation that it can't be done using artifact builds

Not strictly related to the conversation, but you may be interested to follow bug 1425909 where we hope to soon address this specific concern.

+1 to Osmose's plan for a "If you're extending the expiry of an existing Telemetry Histogram or Scalar, use this form (3 questions). If you're adding a new Telemetry Histogram, use this form (4 questions). If you're..."

The form as it stands seems to satisfy three distinct roles:

That may contribute to some of the overlap from question to question.

rjweiss commented 6 years ago

The data stewards met last week. We all agreed that it is worth exploring variants of requests and revising the process (similar to offering both exempt and non-exempt review in an IRB). I've opened #3 to track progress toward that objective.