Implement US Privacy flag detection

stanleymarkman commented 3 years ago

We only have a few privacy flags so far, but there will inevitably be more, possibly with complex settings and meanings that go beyond truthy/falsy. We should have an extensible format for these flags so that adding new ones is easy and doesn't break our understanding of existing ones. Something like this (illustrative only, not meant to be actual information about GDPR):

```json
{
  "flag": "gdpr",
  "values": {
    "1": "meanings.protected",
    "0": "meanings.unprotected",
    "": "meanings.unset"
  },
  "variants": ["GDPR", "Gdpr", "gdpr%32"],
  "jurisdiction": "eu",
  "legislation": "General Data Protection Regulation",
  "locations": ["urlparam", "header"],
  "importance": 7
}
```

The numerical 'importance' value might not be a good idea, but once we have a very large number of flags it may help to prioritize some of them. It expresses how binding or useful we consider a given flag for our analysis.

Of course, every flag has a slightly different meaning, but we need to generalize them into as few categories as possible, maybe just "protected data" and "unprotected data".
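A minimal sketch of how a registry in this format could be consumed, assuming flags arrive as raw (name, value) pairs. The gdpr entry mirrors the illustrative example above; `example_flag`, `flagIndex`, and `interpretFlags` are hypothetical names, not existing code. It shows the two properties the proposal is after: variants resolve to one canonical definition, and results come back ordered by 'importance'.

```javascript
// Generalized meaning categories, named after the issue's example.
const meanings = Object.freeze({
  protected: "meanings.protected",
  unprotected: "meanings.unprotected",
  unset: "meanings.unset",
});

// Illustrative registry. The gdpr entry mirrors the example above (not real
// GDPR information); "example_flag" is entirely hypothetical.
const flagDefinitions = [
  {
    flag: "gdpr",
    values: { "1": meanings.protected, "0": meanings.unprotected, "": meanings.unset },
    variants: ["GDPR", "Gdpr", "gdpr%32"],
    jurisdiction: "eu",
    legislation: "General Data Protection Regulation",
    locations: ["urlparam", "header"],
    importance: 7,
  },
  {
    flag: "example_flag",
    values: { "yes": meanings.protected, "no": meanings.unprotected },
    variants: ["Example-Flag"],
    jurisdiction: "us",
    legislation: "(hypothetical)",
    locations: ["header"],
    importance: 3,
  },
];

// Map every accepted spelling (canonical name plus variants) to its definition,
// so adding a flag or a variant never touches the lookup code.
const flagIndex = new Map();
for (const def of flagDefinitions) {
  for (const name of [def.flag, ...def.variants]) flagIndex.set(name, def);
}

// Resolve raw (name, value) pairs seen on a request into generalized meanings,
// most important flag first. Unknown names are skipped; values not listed in the
// definition are reported with meaning null so they can be reviewed.
function interpretFlags(rawPairs) {
  const results = [];
  for (const [name, value] of rawPairs) {
    const def = flagIndex.get(name);
    if (!def) continue;
    const meaning = Object.prototype.hasOwnProperty.call(def.values, value)
      ? def.values[value]
      : null;
    results.push({ flag: def.flag, meaning, importance: def.importance });
  }
  return results.sort((a, b) => b.importance - a.importance);
}
```

For example, `interpretFlags([["Example-Flag", "no"], ["GDPR", "1"], ["utm_source", "x"]])` ignores `utm_source` and returns the gdpr result (protected, importance 7) ahead of the example_flag result (unprotected, importance 3).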

stanleymarkman commented 3 years ago

This should work closely with the data logging mechanism. For example, if while logging traffic we see a flag that we only expect in URL parameters appear in a header instead, the logging mechanism should notice, pull that request out of the stream for us to look at, and re-evaluate the flag.
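A minimal sketch of that location check, assuming the logging code can report each flag it sees together with where it saw it ("urlparam" or "header"). `checkObservedFlag`, `reviewQueue`, and `example_flag` are hypothetical names chosen for illustration; the project's actual logging hook may look quite different.

```javascript
// Hypothetical flag definitions using the proposed schema; "example_flag" is
// invented for illustration and is only expected in URL parameters.
const flagDefinitions = [
  { flag: "example_flag", variants: ["EXAMPLE_FLAG"], locations: ["urlparam"] },
];

// Requests whose flags appeared somewhere unexpected, held for manual review.
const reviewQueue = [];

// Find a definition by its canonical name or any listed variant.
function findDefinition(name) {
  return flagDefinitions.find((d) => d.flag === name || d.variants.includes(name)) || null;
}

// Called by the (hypothetical) logging hook for every flag seen in traffic.
// observation = { name, value, location, url }
function checkObservedFlag(observation) {
  const def = findDefinition(observation.name);
  if (!def) return "not_a_privacy_flag";
  if (def.locations.includes(observation.location)) return "expected";
  // Unexpected placement: pull the request out of the stream and mark the flag
  // for re-evaluation instead of trusting its normal interpretation.
  reviewQueue.push({ ...observation, flag: def.flag, needsReevaluation: true });
  return "unexpected_location";
}
```

Here a header such as `EXAMPLE_FLAG: 1` would return "unexpected_location" and land in `reviewQueue`, while the same flag in the query string would simply return "expected".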

stanleymarkman commented 3 years ago

Right now, the privacy flags are just stored in a small list at the top of backgroundanalysis.js, with some comments explaining what they do. The checkTruthy method will also need to be adapted to work with the JSON format and to return the new "meaningfulness" types, e.g. meanings.protected, meanings.unprotected, etc.
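One possible shape for that change, sketched under assumptions: the hard-coded list is replaced by a JSON string (here an inline constant standing in for a file), definitions are validated once at load time, and a checkTruthy replacement returns a meanings.* category instead of a boolean. The current signature of checkTruthy isn't shown in this issue, so `checkFlagMeaning`, `loadFlagDefinitions`, `FLAGS_JSON`, and the `meanings.unknown` fallback are all hypothetical.

```javascript
// Hypothetical JSON source replacing the hard-coded list at the top of
// backgroundanalysis.js; the gdpr entry is illustrative only.
const FLAGS_JSON = `[
  {
    "flag": "gdpr",
    "values": { "1": "meanings.protected", "0": "meanings.unprotected", "": "meanings.unset" },
    "variants": ["GDPR", "Gdpr"],
    "locations": ["urlparam", "header"],
    "importance": 7
  }
]`;

const meanings = Object.freeze({
  protected: "meanings.protected",
  unprotected: "meanings.unprotected",
  unset: "meanings.unset",
  unknown: "meanings.unknown", // hypothetical fallback for unlisted values
});

const KNOWN_MEANINGS = new Set(Object.values(meanings));

// Parse and validate definitions once, so a typo in the JSON fails loudly at
// load time rather than silently misclassifying traffic later.
function loadFlagDefinitions(json) {
  const defs = JSON.parse(json);
  for (const def of defs) {
    for (const meaning of Object.values(def.values)) {
      if (!KNOWN_MEANINGS.has(meaning)) {
        throw new Error(`Flag "${def.flag}" uses unknown meaning "${meaning}"`);
      }
    }
  }
  return defs;
}

// Sketch of what checkTruthy could become: instead of true/false it returns a
// meanings.* category for the given definition and raw value.
function checkFlagMeaning(def, rawValue) {
  // Treat a missing value the same as an empty one, and ignore stray whitespace.
  const value = rawValue == null ? "" : String(rawValue).trim();
  return Object.prototype.hasOwnProperty.call(def.values, value)
    ? def.values[value]
    : meanings.unknown;
}
```

With `const [gdpr] = loadFlagDefinitions(FLAGS_JSON);`, a value of " 0 " maps to meanings.unprotected and a missing value maps to meanings.unset. Validating at load time is a design choice: a misspelled category in the JSON stops the extension early instead of quietly producing wrong analysis.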