intenthq / anon

A UNIX Command To Anonymise Data
MIT License

Add support for salts within the `hash` action. #12

Closed nathankleyn closed 6 years ago

nathankleyn commented 6 years ago

Sometimes you want to make sure the hash action is irreversible and not vulnerable to rainbow table attacks. To support this, it would be useful to be able to optionally add a random salt to the hash (and perhaps this should be the default, for safety).

For example, given the following config and CSV, you'd expect to get the following output:

Config:

{
  "csv": {
    "delimiter": ","
  },
  "actions": [
    {
      // Salt is not given, so is random and on by default.
      "name": "hash"
    },
    {
      "name": "hash",
      // Have no salt.
      "salt": false
    },
    {
      "name": "hash",
      // Have a salt, but one which stays the same for all values.
      "salt": "somesalt"
    }
  ]
}

Input:

foo,bar,lux

Output:

d8b685c1a4b889369299f275d583e34f94831bb6,62cdb7020ff920e5aa642c3d4066950dd1f01f4d,98307a2daa4aa31a9e0b2deeeb98dad737970927

Where the first column is effectively random, the second column is a deterministic hash, and the third is deterministic but with the salt added as a suffix. That is:

sha1(foo<some random noise>),sha1(bar),sha1(luxsomesalt)
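
To make the three modes concrete, here is a minimal Python sketch of the behaviour described above. It is not anon's actual implementation: the function name `hash_value`, the use of `None` to mean "salt not given", and the 16-byte random salt are all assumptions made for illustration. Only SHA-1 and the salt-as-suffix layout come from the description above.

```python
import hashlib
import secrets
from typing import Union


def hash_value(value: str, salt: Union[str, bool, None] = None) -> str:
    """Return the hex SHA-1 of `value` with the salt appended as a suffix.

    Hypothetical helper mirroring the three config cases:
      salt=None (or True) -> fresh random salt on every call (proposed default)
      salt=False          -> no salt, plain deterministic hash
      salt="somesalt"     -> the same fixed salt for every value
    """
    if salt is None or salt is True:
        # Assumption: 16 random bytes, hex-encoded, generated per value.
        suffix = secrets.token_hex(16)
    elif salt is False:
        suffix = ""
    else:
        suffix = salt
    return hashlib.sha1((value + suffix).encode("utf-8")).hexdigest()


# One action per column, matching the example config.
row = ["foo", "bar", "lux"]
salts = [None, False, "somesalt"]
print(",".join(hash_value(v, s) for v, s in zip(row, salts)))
```

Running this twice should give a different first column each time, while the second and third columns stay the same between runs.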
kittsville commented 6 years ago

I'm still concerned about defaulting to a random salt in both the original issue spec and the PR. It's generally expected that something referred to as a hash would be deterministic. While the shape of the data would remain the same, this wouldn't stop a casual observer being confused by the output changing on each run.

There are two separate use cases being smushed together here:

Separating the two would also allow us to force a salt when hashing, keeping the instructions/behaviour simpler.