jamescowens / Gridcoin-Research

MIT License

Scraper/new NN audit for GDPR compliance #15

Closed: jamescowens closed this issue 3 years ago

jamescowens commented 5 years ago

The new scraper/NN is designed to provide the maximum possible compliance with GDPR requirements, within the limits imposed by blockchain technology. In particular, here are the salient points:

  1. Once the user stats files are downloaded, they are immediately filtered so that only the minimum fields required for stats computation are retained, and the original files are deleted. In particular, the account name is discarded; only the CPIDs and the pure statistics are kept.
  2. On scraper nodes, statistics are kept on disk for a defined retention period, nominally set to 48 hours. Files older than the retention period are deleted automatically.
  3. On non-scraper nodes, which do not download stats directly, current stats are held in memory only, with no on-disk storage, and the in-memory retention period is also nominally set to 48 hours.
  4. Superblock production is similar to today's: only CPIDs and magnitudes are recorded in the superblock.
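A minimal sketch of the filtering step in point 1, in C++ (the project's language). The `RawUserRecord` and `StatsRecord` types, their fields, and `FilterToMinimumFields` are hypothetical illustrations, not the scraper's actual parser or the real BOINC export schema. The point is only that the account name never survives the projection.

```cpp
#include <string>
#include <vector>

// Hypothetical downloaded row; the field set is an assumption, not the real
// BOINC export schema or the scraper's parser.
struct RawUserRecord {
    std::string name;     // account name: personal data, never retained
    std::string country;  // not needed for stats computation, never retained
    std::string cpid;
    double total_credit = 0.0;
    double expavg_credit = 0.0;
};

// Hypothetical retained record: the CPID plus pure statistics only.
struct StatsRecord {
    std::string cpid;
    double total_credit = 0.0;
    double expavg_credit = 0.0;
};

// Keep only the fields needed for stats computation. Rows without a CPID are
// dropped (an assumption of this sketch), since the stats are indexed by CPID.
std::vector<StatsRecord> FilterToMinimumFields(const std::vector<RawUserRecord>& raw)
{
    std::vector<StatsRecord> filtered;
    filtered.reserve(raw.size());
    for (const RawUserRecord& row : raw) {
        if (row.cpid.empty()) continue;
        filtered.push_back(StatsRecord{row.cpid, row.total_credit, row.expavg_credit});
    }
    return filtered;
}
```

In the design described above, the original downloaded file would then be deleted once the filtered projection has been written out; the sketch leaves file handling out.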

Once the account names are filtered out and discarded, the statistics data indexed by CPID becomes pseudonymized data for the purposes of the GDPR. For stats not preserved in a superblock, this pseudonymized data is deleted within 48 hours, as stated above. For stats recorded in a superblock, we face the same challenge we have today: that information is immutable once it is in the blockchain.
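The 48-hour retention rule in points 2 and 3 amounts to a time-based pruning pass. Below is a minimal C++ sketch over an in-memory cache, as on a non-scraper node; `StatsCache`, `PruneExpired`, and `kRetentionSeconds` are hypothetical names for illustration, not the scraper/NN code, and the real retention setting may be configured differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Hypothetical in-memory cache: download time (Unix seconds) -> stats snapshot.
using StatsCache = std::map<std::int64_t, std::string>;

// Nominal 48-hour retention period from the text (assumed constant name).
constexpr std::int64_t kRetentionSeconds = 48 * 60 * 60;

// Remove every snapshot that has aged beyond the retention period relative to
// `now`, and return how many were removed. An entry exactly `retention`
// seconds old is still within the period and is kept.
std::size_t PruneExpired(StatsCache& cache, std::int64_t now,
                         std::int64_t retention = kRetentionSeconds)
{
    const std::int64_t cutoff = now - retention;
    // std::map is ordered by key, so all expired entries precede the first key >= cutoff.
    const auto first_kept = cache.lower_bound(cutoff);
    const auto removed = static_cast<std::size_t>(std::distance(cache.begin(), first_kept));
    cache.erase(cache.begin(), first_kept);
    return removed;
}
```

On a scraper node the same cutoff could in principle be applied to on-disk file ages (for example via `std::filesystem::last_write_time`) rather than map keys; how the actual scraper tracks file age is not specified here.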

Perhaps we should consider requiring users to acknowledge that superblock (SB) statistics cannot be deleted, as part of the process of advertising a beacon, which is the starting point for Gridcoin statistics collection.