lookit / lookit-api

Codebase for Lookit v2 and Experimenter v2. Includes an API. Docs: http://lookit.readthedocs.io/
https://lookit.mit.edu/
MIT License

Inconsistency in labelling data downloads as identifiable #1424

Open: becky-gilbert opened this issue 2 months ago


Summary

On the 'All responses' page (Study -> Study Responses -> All Responses tab), downloading 'all response data' with the default selections under 'participant data to include responses' produces a file labelled 'identifiable'.

On the 'Individual responses' page, downloading a single response with the same checkbox options results in a file that is NOT labelled as 'identifiable'.

AFAIK, these downloads contain the same types of data; the 'all responses' file is simply a concatenation of the individual responses.

Description

According to the help text on the All Responses page:

Files with names, global IDs, birthdates, exact ages at participation, or "additional info" fields are marked as identifiable in the filename.

For both types of response download, the 'Child additional information' checkbox is selected by default, which is why the 'all responses' file is labelled 'identifiable'. Of all the options selected by default, 'Child additional information' is the only one whose removal drops the 'identifiable' label.

[Screenshot, 2024-06-13 11:41:00 AM]
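The help-text rule amounts to a membership check: a download is labelled identifiable if any selected option belongs to the identifiable set. The sketch below is not lookit-api's code. The option keys (`child_additional_info` and the rest) and the `DEFAULTS` selection are hypothetical, chosen only so that 'Child additional information' is the single identifiable default, matching the behaviour reported above.

```python
from typing import Iterable

# Hypothetical option keys; the real lookit-api checkbox values may differ.
# Each maps to a category named in the help text.
IDENTIFIABLE_OPTIONS = frozenset({
    "participant_name",       # names
    "participant_global_id",  # global IDs
    "child_name",
    "child_global_id",
    "child_birthday",         # birthdates
    "child_age_in_days",      # exact ages at participation
    "child_additional_info",  # "additional info" fields
})


def is_identifiable(selected: Iterable[str]) -> bool:
    """Return True if any selected option falls under the help-text rule."""
    return not IDENTIFIABLE_OPTIONS.isdisjoint(selected)


# Assumed default selection: "Child additional information" is the only
# identifiable option among the defaults, as described in the issue.
DEFAULTS = {"child_additional_info", "child_gender", "child_age_rounded"}

print(is_identifiable(DEFAULTS))                              # True
print(is_identifiable(DEFAULTS - {"child_additional_info"}))  # False
```

Under this rule, deselecting 'Child additional information' is enough to remove the label, which is consistent with what the All Responses page does.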

On the 'Individual responses' page, downloading a response with the default settings (i.e., including "Child additional information") does NOT produce a file labelled "identifiable". In fact, that file is never labelled identifiable, even when every PII option is selected.

[Screenshot, 2024-06-13 11:41:38 AM]

Questions

  1. Should the "Child additional information" data be included by default, given that it is considered PII?
  2. Do we want to add an "identifiable" label to individual response downloads, so that the labelling is consistent with "all responses" downloads?
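One possible approach to question 2, sketched here under loose assumptions, is to route both download paths through a single filename helper so the "identifiable" tag cannot diverge between them. `download_filename`, the option keys, and the `<study>_<scope>[_identifiable].<ext>` naming scheme are all hypothetical and do not reflect the filenames lookit-api actually generates.

```python
from typing import Iterable

# Hypothetical option keys (illustrative only; not lookit-api's actual values).
IDENTIFIABLE_OPTIONS = frozenset({
    "participant_name",
    "participant_global_id",
    "child_name",
    "child_global_id",
    "child_birthday",
    "child_age_in_days",
    "child_additional_info",
})


def download_filename(study_slug: str, scope: str,
                      selected: Iterable[str], ext: str = "csv") -> str:
    """Build a download filename using one shared identifiability rule.

    `scope` is "all-responses" for the bulk download or a response
    identifier for a single response; both go through the same check.
    """
    parts = [study_slug, scope]
    if not IDENTIFIABLE_OPTIONS.isdisjoint(selected):
        parts.append("identifiable")
    return "_".join(parts) + "." + ext


chosen = {"child_additional_info", "child_gender"}
print(download_filename("my-study", "all-responses", chosen))
# my-study_all-responses_identifiable.csv
print(download_filename("my-study", "response-1234", chosen))
# my-study_response-1234_identifiable.csv
```

Because the label is derived from the selected options alone, a single response and the concatenated file would always be tagged the same way for the same checkbox choices.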