snowplow / enrich

Snowplow Enrichment jobs and library
https://snowplowanalytics.com

Common: encrypt original values in PII Enrichment #33

Open chuwy opened 4 years ago

chuwy commented 4 years ago

This ticket aims to help users of piinguin and piinguin relay better secure the original data held in piinguin, without having to focus on securing access to piinguin itself within an organisation.

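To achieve that, the enrichment will be configured with one or more public keys, and the original values will be encrypted with the key named in each field's `encryptionKeyName`. Fields without an `encryptionKeyName` are left unencrypted. The new configuration will look like this: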

{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/3-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "pii_enrichment_config",
    "emitEvent": true,
    "enabled": true,
    "parameters": {
      "pii": [
        {
          "pojo": {
            "field": "user_id",
            "encryptionKeyName": "other-key"
          }
        },
        {
          "pojo": {
            "field": "user_fingerprint"
            # No encryption
          }
        },
        {
          "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.mailchimp/subscribe/jsonschema/1-*-*",
            "jsonPath": "$.data.['email', 'ip_opt']",
            "encryptionKeyName": "email-key"
          }
        }
      ],
      "strategy": {
        "pseudonymize": {
          "hashFunction": "SHA-1",
          "salt": "pepper123"
        }
      },
      "encryption": [
        {
          "keyName": "email-key",
          "key": "some-rsa-publickey"
        },
        {
          "keyName": "other-key",
          "key": "some-rsa-publickey-2"
        }
      ]
    }
  }
}
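
The key lookup implied by this configuration can be sketched as follows. This is only an illustration, not the enrichment's actual code: the function name `resolve_encryption_keys` is hypothetical, and it assumes that a `pii` entry without `encryptionKeyName` means "do not encrypt" and that a name missing from `encryption` should be treated as a configuration error.

```python
from typing import List, Optional, Tuple


def resolve_encryption_keys(config: dict) -> List[Tuple[str, Optional[str]]]:
    """Pair each configured PII field with its public key, or None if unencrypted.

    Hypothetical helper: the field names mirror the proposed 3-0-0 config,
    but the validation behaviour is an assumption.
    """
    params = config["data"]["parameters"]
    # Index the declared keys by name for quick lookup.
    keys = {k["keyName"]: k["key"] for k in params.get("encryption", [])}

    resolved = []
    for entry in params["pii"]:
        # Each entry wraps its settings in a single "pojo" or "json" object.
        for spec in entry.values():
            key_name = spec.get("encryptionKeyName")
            if key_name is None:
                resolved.append((spec["field"], None))
            elif key_name in keys:
                resolved.append((spec["field"], keys[key_name]))
            else:
                raise ValueError(
                    f"field {spec['field']!r} references undefined key {key_name!r}"
                )
    return resolved
```

Failing fast on an undefined key name seems preferable to silently emitting unencrypted original values, but that choice would need to be confirmed when the schema is finalised.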

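The emitted event will also change: where an encryption key is configured, `originalValue` is encrypted and then base64-encoded, and `encryptionKeyName` records which key was used (the exact implementation still needs to be finalised):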


{
  "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
  "data": {
    "schema": "iglu:com.snowplowanalytics.snowplow/pii_transformation/jsonschema/2-0-0",
    "data": {
      "pii": {
        "pojo": [
          {
            "fieldName": "user_fingerprint",
            "originalValue": "its_you_again!",
            "modifiedValue": "27abac60dff12792c6088b8d00ce7f25c86b396b8c3740480cd18e21068ecff4"
          },
          {
            "fieldName": "user_ipaddress",
            "originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
            "modifiedValue": "dd9720903c89ae891ed5c74bb7a9f2f90f6487927ac99afe73b096ad0287f3f5",
            "encryptionKeyName": "other-key"
          },
          {
            "fieldName": "user_id",
            "originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
            "modifiedValue": "7d8a4beae5bc9d314600667d2f410918f9af265017a6ade99f60a9c8f3aac6e9",
            "encryptionKeyName": "other-key"
          }
        ],
        "json": [
          {
            "fieldName": "unstruct_event",
            "originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
            "modifiedValue": "269c433d0cc00395e3bc5fe7f06c5ad822096a38bec2d8a005367b52c0dfb428",
            "jsonPath": "$.ip",
            "schema": "iglu:com.mailgun/message_clicked/jsonschema/1-0-0",
            "encryptionKeyName": "email-key"
          },
          {
            "fieldName": "contexts",
            "originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
            "modifiedValue": "1c6660411341411d5431669699149283d10e070224be4339d52bbc4b007e78c5",
            "jsonPath": "$.data.emailAddress2",
            "schema": "iglu:com.acme/email_sent/jsonschema/1-1-0",
            "encryptionKeyName": "email-key"
          },
          {
            "fieldName": "contexts",
            "originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
            "modifiedValue": "72f323d5359eabefc69836369e4cabc6257c43ab6419b05dfb2211d0e44284c6",
            "jsonPath": "$.emailAddress",
            "schema": "iglu:com.acme/email_sent/jsonschema/1-0-0",
            "encryptionKeyName": "email-key"
          }
        ]
      },
      "strategy": {
        "pseudonymize": {
          "hashFunction": "SHA-256"
        }
      }
    }
  }
}
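
How a single `pojo` entry of this event might be assembled is sketched below. Since the encryption scheme is not finalised, the RSA step is left as a caller-supplied `encrypt` function rather than guessed at. The name `build_pojo_entry` is hypothetical, and hashing the value concatenated with the salt is an assumption about how pseudonymization is applied.

```python
import base64
import hashlib
from typing import Callable, Optional


def build_pojo_entry(
    field_name: str,
    original: str,
    hash_function: str,
    salt: str,
    key_name: Optional[str] = None,
    encrypt: Optional[Callable[[bytes], bytes]] = None,
) -> dict:
    """Build one entry of the "pojo" array in the emitted event (illustrative)."""
    # "SHA-256" -> "sha256", the algorithm name hashlib expects.
    algorithm = hash_function.replace("-", "").lower()
    # Assumption: the salt is appended to the value before hashing.
    digest = hashlib.new(algorithm, (original + salt).encode("utf-8")).hexdigest()

    entry = {
        "fieldName": field_name,
        "originalValue": original,
        "modifiedValue": digest,
    }
    if key_name is not None:
        if encrypt is None:
            raise ValueError("an encrypt function is required when key_name is set")
        # Encrypt the raw bytes, then base64-encode so the result fits in JSON.
        ciphertext = encrypt(original.encode("utf-8"))
        entry["originalValue"] = base64.b64encode(ciphertext).decode("ascii")
        entry["encryptionKeyName"] = key_name
    return entry
```

In a real implementation `encrypt` would wrap an RSA public-key operation using the key registered under `key_name`. Entries without a key name keep the plain `originalValue`, as with `user_fingerprint` above.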

An incidental benefit is that the original values in the Kinesis PII stream are also encrypted.

chuwy commented 4 years ago

Migrated from https://github.com/snowplow/snowplow/issues/3788 (comments are auto-generated)