rapidsurveys / odkr

Open Data Kit R API
http://rapidsurveys.io/odkr/
GNU General Public License v3.0

add function to displace GPS coordinates data #52

Closed: ernestguevarra closed this issue 3 years ago

ernestguevarra commented 3 years ago

Feature request submitted via email on 17 February 2021 at 09:34 GMT:

Hi Ernest

I hope this email finds you well.

I am using your package and I really want to thank you.

I have a suggestion for a potential new function. Most ODK surveys take a GPS location, stored as one variable (lat, long).

It is not easy to extract it and add random noise to protect the data while still being able to use it for reporting.

Do you think that in the future you could add a function that adds random noise to each location in the survey?

Best regards Tristan
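The extraction step described in this request can be sketched as follows. This is a minimal Python illustration, not part of odkr. It assumes the location arrives as a single text field with latitude first and longitude second, separated by whitespace and/or a comma, as in an ODK geopoint value ("latitude longitude altitude accuracy"). The function name `split_geopoint` and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def split_geopoint(value: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from one combined location field.

    Hypothetical helper. Assumes latitude comes first and longitude second,
    separated by whitespace and/or commas; any trailing altitude and
    accuracy values are ignored. Returns None for blank or incomplete values.
    """
    parts = value.replace(",", " ").split()
    if len(parts) < 2:
        return None  # missing or incomplete location
    return float(parts[0]), float(parts[1])


# Made-up example values: a full ODK-style geopoint, a "lat, long" pair, a blank
rows: List[str] = ["-1.2921 36.8219 1650.0 5.0", "14.5995, 120.9842", ""]
coords = [split_geopoint(r) for r in rows]
print(coords)
```

Returning `None` rather than raising keeps rows with a missing location in the dataset, so they can be filtered or reported separately.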

ernestguevarra commented 3 years ago

Response sent via email on 17 February 2021 at 22:34 GMT:

Hi Tristan,

Thanks for your email.

Regarding your feature request, this is actually very easy to do in R as part of the data processing for your specific data and data requirements. Unfortunately, I believe this feature is beyond the scope of the odkr package, whose sole purpose is to provide an R API/interface to ODK-structured data. The furthest odkr goes in data processing is structuring the data so that common, standard statistical analyses can be applied to the dataset. Any direct manipulation or conversion of the data itself is the role of the data owner and/or data analyst, based on their own planned analytical approach and value judgements, including those about data privacy.

There are standard, recognised approaches to de-identifying GPS data. The most common is to displace each sampling point/location in a random direction by a random distance of 0 to 2 km or 0 to 5 km, as is done in DHS surveys. See this link (https://dhsprogram.com/methodology/gps-data-collection.cfm) for a full description.
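The displacement idea above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the DHS procedure itself and not an odkr function (Ernest suggests doing this in R). It assumes a spherical Earth (mean radius 6371 km), a bearing drawn uniformly from 0 to 2π and a distance drawn uniformly from 0 to `max_km`, and it moves the point along a great circle. The names `destination`, `haversine_km` and `displace` are hypothetical.

```python
import math
import random
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def destination(lat: float, lon: float, bearing_rad: float,
                dist_km: float) -> Tuple[float, float]:
    """Point reached by travelling dist_km from (lat, lon) along a great
    circle with the given initial bearing (radians clockwise from north)."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    delta = dist_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(bearing_rad))
    lam2 = lam1 + math.atan2(
        math.sin(bearing_rad) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Wrap longitude back into [-180, 180)
    return math.degrees(phi2), (math.degrees(lam2) + 540.0) % 360.0 - 180.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, used here to check a displacement."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def displace(lat: float, lon: float, max_km: float,
             rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Move a point in a uniformly random direction by a uniformly random
    distance between 0 and max_km (e.g. 2.0 or 5.0)."""
    if rng is None:
        rng = random.Random()
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dist = rng.uniform(0.0, max_km)
    return destination(lat, lon, bearing, dist)


rng = random.Random(2021)  # fixed seed so the example is reproducible
new_lat, new_lon = displace(-1.2921, 36.8219, max_km=5.0, rng=rng)
print(new_lat, new_lon, haversine_km(-1.2921, 36.8219, new_lat, new_lon))
```

Setting `max_km` to 2.0 or 5.0 gives the two ranges mentioned above. Any further rules in the DHS description are not implemented in this sketch, so check the linked page before relying on it for real data.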

Again, the approach specified in the DHS implementation is very straightforward to do in R if you are familiar with spatial analysis in R, but it is well beyond the scope of the odkr package. Also, displacement is only relevant if you intend to make your data available to others; that is the only reason the DHS applies displacement to its data. If you are unable to apply displacement to your GPS coordinates, please consider not sharing your data widely and/or removing the GPS coordinates when sharing the data with others who do not have authority over it.

Under the GDPR rules, you are also personally responsible for protecting data that contain identifying information such as GPS coordinates, especially if you physically store the data in a location within the EU. I remember you telling me that you are not using a server and that you only use Google Sheets to store the data once you have retrieved it via ODK Briefcase. I do not think this is a GDPR-compliant approach for data that contain identifying information. So, you really have to think carefully about what you need the GPS data for and whether you have the capacity and resources to store and protect it. This is the advantage of using an appropriate ODK server for this kind of data collection and data type.

I hope this is helpful.

Best, Ernest