humio / issues

Issue Tracker for Humio

Feature request - Support hash generation / hash lookups #93

Closed henrikjohansen closed 5 years ago

henrikjohansen commented 5 years ago

I would love to have the ability to generate SHA256 hashes by combining the content of a field with a static salt (the salt should be kept as secret as possible and thus should be pulled from Humio's config).

This would enable us to replace all SSNs and other sensitive / GDPR-related material (names, emails, IP addresses, etc.) with hashes, and thus avoid storing them in plain text while keeping the ability to search for a specific value. Many analyses, like groupby(), would still work on the hashed values, even without hashing a string before searching.

In parsers one could generate a hash and either overwrite the source field, or generate a new field and drop the original one.
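
To illustrate why grouping keeps working on hashed values, here is a minimal Python sketch. It assumes the salt is simply prepended to the UTF-8 bytes of the field before SHA-256 is applied; this thread does not specify how Humio combines them, and `salted_sha256`, `SALT`, and the sample events are made up for the example.

```python
import hashlib
from collections import Counter


def salted_sha256(value, salt):
    """Hex SHA-256 of salt + value (the concatenation order is an assumption)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Illustrative salt only; a real salt would be random and kept secret.
SALT = b"example-salt-not-secret"

events = [
    {"ssn": "1234561234", "action": "login"},
    {"ssn": "1234561234", "action": "logout"},
    {"ssn": "9876549876", "action": "login"},
]

# "Parser" step: overwrite the sensitive field with its salted hash.
hashed = [{**e, "ssn": salted_sha256(e["ssn"], SALT)} for e in events]

# Equal inputs give equal hashes, so a group-by on the hashed field
# yields the same buckets as on the plain values.
counts = Counter(e["ssn"] for e in hashed)
```

Because the hash is deterministic for a given salt, the two events with the same SSN land in the same bucket without the SSN ever being stored.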

A query could look like this:

ssn = hash("1234561234", salt="ssn")

Discussed with Morten Grouleff.

mortengrouleff commented 5 years ago

One possible syntax is to have one function able to either produce the hashes (for the parser step) or match (for the search step) depending on the parameters supplied.

To replace the ssn field with the hashed value of the same field in a parser:

sha(field=ssn, salt="salt1", as="ssn")
// The same but using the shorthand for as:
ssn := sha(field=ssn, salt="salt1")

To match the hash of the text "12345678" against the value of the ssn field in the event in a search:

sha(field="ssn", input="12345678", salt="salt1") 
// The same but using the shorthand for field and unnamed field:
ssn =~ sha("12345678", salt="salt1")
// The same but using the shorthand for the unnamed field and searching @rawstring:
... | sha("12345678", salt="salt1") | ...

input and as are mutually exclusive: as names the output field when producing a hash, while input supplies the text to match.

The salt parameter names a system-wide salt string that is included in the hashes to make them harder to brute-force. The salt itself is a random string kept secret by the system.
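
As a rough Python sketch of the "one function, two modes" idea: the function below produces a hash when no input is given and matches when it is. It is not Humio's implementation. The `SALTS` table, the salt-prepending scheme, and the parameter names `as_` and `input_` (renamed to avoid Python keywords and builtins) are assumptions made for illustration.

```python
import hashlib

# Hypothetical stand-in for the system's secret, named salts.
SALTS = {"salt1": b"illustrative-salt-value"}


def sha(event, field, salt, as_=None, input_=None):
    """Produce mode (no input_): return a copy of the event with the hash stored.
    Match mode (input_ given): return True if the field equals the hash of input_.
    """
    if as_ is not None and input_ is not None:
        raise ValueError("input and as are mutually exclusive")
    salt_bytes = SALTS[salt]  # the salt is selected by name

    def digest(text):
        return hashlib.sha256(salt_bytes + text.encode("utf-8")).hexdigest()

    if input_ is not None:
        # Search step: hash the search term and compare with the stored value.
        return event.get(field) == digest(input_)

    # Parser step: write the hash to `as_`, or overwrite the source field.
    out = dict(event)
    out[as_ or field] = digest(event[field])
    return out


parsed = sha({"ssn": "12345678"}, field="ssn", salt="salt1", as_="ssn")
found = sha(parsed, field="ssn", salt="salt1", input_="12345678")
```

Here `parsed` holds only the hashed ssn, and `found` is true because the search term hashes to the same value under the same named salt.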

Other options: algorithm, to select sha256, sha512, or others; and encoding/format, to select hex, base64, or others, or to output only the leading k bits.
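
The options above could behave roughly like this Python sketch. The function name `encode_hash`, its parameter names, and the choice to zero out the unused trailing bits when k is not a multiple of 8 are all assumptions; the thread does not define these details.

```python
import base64
import hashlib


def encode_hash(value, salt, algorithm="sha256", encoding="hex", bits=None):
    """Salted hash with a selectable algorithm, encoding, and optional truncation."""
    digest = hashlib.new(algorithm, salt + value.encode("utf-8")).digest()

    if bits is not None:
        if not 0 < bits <= len(digest) * 8:
            raise ValueError("bits out of range for this algorithm")
        # Keep the leading `bits` bits; clear any leftover bits in the last byte.
        nbytes = (bits + 7) // 8
        prefix = bytearray(digest[:nbytes])
        extra = nbytes * 8 - bits
        if extra:
            prefix[-1] &= (0xFF << extra) & 0xFF
        digest = bytes(prefix)

    if encoding == "hex":
        return digest.hex()
    if encoding == "base64":
        return base64.b64encode(digest).decode("ascii")
    raise ValueError("unsupported encoding: " + encoding)


salt = b"illustrative-salt-value"
full_hex = encode_hash("12345678", salt)
short_hex = encode_hash("12345678", salt, bits=16)
as_base64 = encode_hash("12345678", salt, encoding="base64")
```

Truncating to the leading bits trades collision resistance for shorter stored values, so the right k depends on how many distinct values need to stay distinguishable.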

henrikjohansen commented 5 years ago

:point_up: is excellent. One small proposal would be to allow for salts to either be specified by the operator or randomly generated by Humio.

This would allow for easier integration with other tools and also allow for generation of hashes in other parts of the ingest pipeline or perhaps even directly on the source systems.

Likewise, auto-generated salts should be stored in global for backup purposes.

mortengrouleff commented 5 years ago

I plan to have salts auto-generated in the first version of this: Humio will make up a 256-bit secure random bitstring and store it in global when you refer to an unknown salt name in a parser context.

Later we can make it possible to store a specific bitstring into global as a salt, for use when hashing happens in the source systems. But that brings the need to output hashes in whatever format the external hashers use, whereas the first version only needs to know the format Humio uses by default.
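
The auto-generation behaviour described above might look like the following Python sketch. The dictionary `GLOBAL_SALTS` merely stands in for Humio's global store, `get_salt` is a made-up helper, and raising an error for an unknown salt name outside a parser context is an assumption, since the thread only describes the parser case.

```python
import secrets

# Hypothetical stand-in for salts persisted in Humio's "global" state.
GLOBAL_SALTS = {}


def get_salt(name, parser_context):
    """Return the salt for `name`, creating it on first use in a parser context."""
    salt = GLOBAL_SALTS.get(name)
    if salt is None:
        if not parser_context:
            # Assumed behaviour: a search cannot create salts.
            raise KeyError("unknown salt: " + name)
        salt = secrets.token_bytes(32)  # 32 bytes = 256 bits of secure randomness
        GLOBAL_SALTS[name] = salt
    return salt


first = get_salt("ssn", parser_context=True)
again = get_salt("ssn", parser_context=False)
```

Once created, the same salt is returned for every later lookup, which is what lets hashes written at parse time be matched at search time.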

henrikjohansen commented 5 years ago

Sounds great ... looking forward to 1.6.2 :innocent:

mortengrouleff commented 5 years ago

:rofl:

mortengrouleff commented 5 years ago

It's part of upcoming 1.6.2 using the hashRewrite and hashMatch functions.