JeniT closed 11 years ago
@statshero I don't think this is right: the option that talks about 'aggregated' is talking about summarising the data, with appropriate statistical disclosure controls, which is subtly different from anonymisation, which is more about removing personal details from individual-level data. I've made an attempt to add some clarification around the questions.
@JeniT, anonymisation includes techniques such as aggregation. I believe the focus here is on "can people be identified" and not on the nature of the data processing. Indeed, the following question is:
Has your anonymisation process been independently audited?
Aggregation does not include methods such as suppression, sampling or perturbation. Also, with aggregated data there is still a risk that people can be identified.
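To illustrate the distinction, here is a minimal sketch with made-up records. The area codes, conditions and the threshold of 5 are all assumptions for illustration, not values taken from the certificate or from any guidance. It shows that aggregation alone can leave a cell with a count of 1, and that suppression is a separate step applied afterwards.

```python
from collections import Counter

# Hypothetical individual-level records: (area code, condition).
records = [
    ("AB1", "asthma"), ("AB1", "asthma"), ("AB1", "asthma"),
    ("AB1", "asthma"), ("AB1", "asthma"), ("AB1", "diabetes"),
    ("CD2", "asthma"), ("CD2", "asthma"),
]


def aggregate(rows):
    """Summarise individual-level rows into counts per (area, condition)."""
    return Counter(rows)


def suppress_small_cells(counts, threshold=5):
    """Replace counts below the threshold with None.

    The threshold of 5 is an assumed value for this sketch only.
    """
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


counts = aggregate(records)
released = suppress_small_cells(counts)

# Aggregated but unsuppressed: a cell containing a single person remains.
print(counts[("AB1", "diabetes")])
# After suppression, that cell is withheld.
print(released[("AB1", "diabetes")])
```

Even after this step the risk is reduced rather than removed, for example if suppressed cells can be recovered from published totals.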
@statshero If you look at the change that I made (https://github.com/theodi/open-data-certificate/commit/d1aff1b214e30a9f1c319b211611ce8b424af5fd), you'll see that I changed the following question to "Have your statistical disclosure controls been independently audited?". Does this work? If not, can you suggest an alternative? Does there need to be a distinction between the answers:
or do you think the two answers should be combined, and that every dataset about people or their activities should require a PIA for Pilot level? The discussions we had previously indicated that this wasn't required.
@JeniT Perhaps the question is whether you want to distinguish between statistical disclosure control (SDC) and anonymisation. However, I propose not to do so because they achieve the same thing. See also the (shortened) ICO definitions:
Disclosure Control: A technique used to control the risk of individuals being identified from statistical data
Anonymisation: The process of rendering data into a form which does not identify individuals
The difference between the two answers, in my understanding after the privacy workshop, is qualitative. We assume that the person filling out the questionnaire is not an expert on anonymisation. "no" gives the user the option to be confident in their anonymisation process (e.g. through aggregation). "yes" exists if
I would not suggest a PIA for pilot level. Keeping the three options with the change in the wording ("anonymised" instead of "aggregated") and more clarification should achieve this.
My concern is that if people have an option that is "no, the data has been anonymised so individuals can't be identified", then they will think that they can select this option if they have attempted anonymisation, whether it's any good or not (e.g. if they've just removed names & addresses). The point is that aggregated/summarised data has less risk of disclosure than individual-level data.
@JeniT you raise a valid concern. Thus, I suggest the following:
(I know you are aware that less is not zero. A conservative expert would have to choose "yes" if the data is derived from individuals because virtually all aggregation carries a non-zero risk of re-identification.)
Done, thanks.
based on feedback during training delivered by @statshero