mrbotcr / py3ClimMob

ClimMob is software for agricultural citizen science
https://climmob.net/
GNU Affero General Public License v3.0

Implement PII filtering #251

Open qlands opened 1 year ago

qlands commented 1 year ago

The underlying architecture of ClimMob (ODK Tools) has provisions for the anonymization and encryption of data. We need to implement PII filtering in ClimMob so that data exports can include or exclude sensitive data such as GPS coordinates.

kauedesousa commented 1 year ago

Part of this is also discussed here https://github.com/BioversityCostaRica/py3ClimMob/issues/124

marieALaporte commented 5 months ago

We discussed having new fields or columns in the data to store this information; only those fields should be exported by default. The fields to be anonymized are:

kauedesousa commented 4 months ago

I think that is fine.

MarManrow commented 4 months ago

@BrandonMrBot This is the issue we talked about this morning.

MarManrow commented 1 month ago

@marieALaporte please take a look at these anonymization techniques. Please approve or suggest changes if necessary: https://docs.google.com/spreadsheets/d/1maS8dhmlVeWhuOShjwty4Jrk5fjBqEPP/edit?gid=1639127399#gid=1639127399

marieALaporte commented 1 month ago

I approve the file and have added my comments to it. Note that we expect that once a field is anonymized, its anonymized value won't change when data are downloaded.
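
One way to meet this stability requirement is a keyed, deterministic hash: the same input and the same key always produce the same pseudonym, so repeated downloads agree. The following is only a minimal sketch under assumptions. The function name `pseudonymize`, the `secret_key` parameter, and the 12-character truncation are hypothetical and are not taken from ClimMob or from the spreadsheet.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for `value` (hypothetical helper).

    HMAC-SHA256 is deterministic for a fixed key, so the pseudonym does not
    change between downloads as long as the key is kept unchanged.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # Truncation is an illustrative choice; shorter values collide more often.
    return digest.hexdigest()[:length]


# Assumed example key; a real deployment would load it from protected config.
key = b"example-project-key"
first_download = pseudonymize("Maria Lopez", key)
second_download = pseudonymize("Maria Lopez", key)
print(first_download == second_download)
```

The stability holds only while the key stays the same, so this approach assumes the key is stored securely and is not rotated between downloads.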

MarManrow commented 1 month ago

@BrandonMrBot any additional questions regarding the implementation of this anonymization?

BrandonMrBot commented 1 month ago

@marieALaporte, @MarManrow

I have added an additional sheet called: Question types.

The idea is that each question type has one or more anonymization techniques available. With this, when users indicate that a question is sensitive, they can choose which anonymization technique they want to apply when downloading the data. Users can only set this parameter for their own questions.

For example, based on the exercise you did on question 199, it is a text-type question (which is stored as type-1 in ClimMob DB).

So, text-type questions can be anonymized using one of two techniques: Pseudonymization or Removal. You need to select one of the two; it will be the technique configured for the question and applied when users download anonymized data.

I know you have worked on this table because these are the questions we need to configure, but we must think about the functionality at a general level for the platform. In this case, that means offering anonymization techniques that trial managers can configure for their own questions.
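
The proposal above can be sketched as a lookup from question type to the techniques allowed for it, plus a check when a question is configured. This is an illustration under assumptions, not ClimMob code: all names are hypothetical, and only the text type (type 1) with Pseudonymization and Removal is taken from this thread. Other question types are deliberately left out.

```python
import hashlib
import hmac
from typing import Optional

# Only this entry is stated in the thread: text questions (type 1) support
# Pseudonymization or Removal. Other types would come from the spreadsheet.
TECHNIQUES_BY_QUESTION_TYPE = {
    1: ("pseudonymization", "removal"),
}


def configure_technique(question_type: int, technique: str) -> str:
    """Validate the technique chosen for a sensitive question (hypothetical)."""
    allowed = TECHNIQUES_BY_QUESTION_TYPE.get(question_type, ())
    if technique not in allowed:
        raise ValueError(
            f"Technique {technique!r} is not available for type {question_type}"
        )
    return technique


def anonymize_value(value: str, technique: str, secret_key: bytes) -> Optional[str]:
    """Apply the configured technique to one answer at download time."""
    if technique == "removal":
        return None  # the value is dropped from the anonymized export
    if technique == "pseudonymization":
        # Keyed hash as a stand-in; the actual method is an assumption here.
        digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:12]
    raise ValueError(f"Unknown technique: {technique!r}")


chosen = configure_technique(1, "removal")
print(anonymize_value("Maria Lopez", chosen, b"example-project-key"))
```

Keeping the allowed techniques in one table means a question type with a single valid option simply has a one-item entry, and adding a new type does not touch the validation logic.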

MarManrow commented 1 month ago

Thanks @BrandonMrBot. This is a good approach. I added some suggestions in orange. Indeed, some fields may not need two techniques/options. @marieALaporte could you please provide feedback on the document?