humio / issues

Issue Tracker for Humio

Case Sensitivity: Ingest or Search Options #65

Closed: jdesantis closed this issue 5 years ago

jdesantis commented 5 years ago

Currently, searching with normal syntax can be tedious and error-prone because of case sensitivity. Under most normal circumstances, cyber analysts should not have to spend time worrying about the case of field names or field values, and normalizing the data at the source is not always feasible. Here are some ways the issue could be resolved:

  1. Create either a Humio option or a Humio tag-based setting that ingests a source/repo in lowercase characters only.

  2. Add an option that lets users disable case sensitivity on searches by default, without adding extra commands to each query. The option should apply to both field names and field values, with each selectable independently.

Best Regards,

Joe DeSantis
Security Engineer, Novetta

henrikjohansen commented 5 years ago

This would make a lot of sense, especially for Windows Eventlogs, since most log shippers don't support this on the client side.

mortengrouleff commented 5 years ago

Proposed solution: allow the lowercase() function in Humio to lowercase all fields and values without listing the specific fields; users should then apply it in their parser for performance reasons. That part is easy and efficient.

Allowing it in the query flow would require a separate implementation, as rewriting all events as part of the query flow is too expensive to be useful. It would need to work like a flag that turns on case-insensitivity for all field lookups, which is not easy to do without hurting the performance of other cases. First version could disallow the function in query context, allowing it only in parsing context.
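To make the trade-off concrete, here is a minimal Python model of the "flag" approach, assuming an event is just a dict of field names to string values. The names `lookup`, `matches` and `case_insensitive` are hypothetical and this is not Humio's implementation; it only shows that the flag changes how a field is found and compared while the stored events are never rewritten.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical model: an event is a dict mapping field names to string values.
# This is not Humio's code; it only illustrates a query-time case-insensitivity flag.


def lookup(event: dict[str, str], field: str, case_insensitive: bool = False) -> Optional[str]:
    """Return the value of `field`, optionally ignoring the case of the field name.

    The event itself is never rewritten; only the lookup behaves differently.
    If several names differ only by case, the first one in iteration order wins.
    """
    if not case_insensitive:
        return event.get(field)  # exact match: the default, unchanged behaviour
    wanted = field.lower()
    for name, value in event.items():
        if name.lower() == wanted:
            return value
    return None


def matches(event: dict[str, str], field: str, expected: str, case_insensitive: bool = False) -> bool:
    """Field-equals-value filter; with the flag on, both name and value ignore case."""
    value = lookup(event, field, case_insensitive)
    if value is None:
        return False
    if case_insensitive:
        return value.lower() == expected.lower()
    return value == expected
```

Note the cost hinted at in the comment: the exact-match path stays a single dict lookup, while the case-insensitive path scans every field name of every event, which is one reason such a flag is hard to add without affecting performance.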

henrikjohansen commented 5 years ago

> First version could disallow the function in query context, allowing it only in parsing context.

@mortengrouleff ☝️ That would break a lot of existing queries, at least for us, since we use lowercase() extensively in saved queries, dashboard queries, etc.

mortengrouleff commented 5 years ago

> > First version could disallow the function in query context, allowing it only in parsing context.
>
> @mortengrouleff would break a lot of existing queries, at least for us since we use lowercase() extensively in saved queries, dashboard queries, etc

I'm aware of that. The suggestion was that only the new option to lowercase all fields and values without listing the fields, which the current version of the function does not support, could be limited to parser context.

mortengrouleff commented 5 years ago

Implemented as follows, expected to be in release 1.2.5.

The existing lowercase function now accepts "*" as the name of the field to work on. When "include=both" is also set, the effect is to lowercase all field names and all values, removing the old fields in the process. This is intended for use in the ingest parser pipeline, when receiving data with mixed case and not wanting to preserve the original field names.
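The effect described above can be modelled in a few lines of Python, assuming an event is a plain dict of field names to string values. `lowercase_all` is a hypothetical name, not a Humio function, and the collision rule below is an assumption of this sketch, since the thread does not say what Humio does when two field names differ only by case.

```python
from __future__ import annotations


def lowercase_all(event: dict[str, str]) -> dict[str, str]:
    """Model of lowercasing every field name and every value in one event.

    A new dict is returned, so the original mixed-case field names do not
    survive, mirroring "removing the old fields in the process". If two names
    collide after lowercasing (e.g. "User" and "user"), the later one wins in
    this sketch; Humio's actual behaviour in that case is an open assumption.
    """
    return {name.lower(): value.lower() for name, value in event.items()}


# Example: a Windows-style event with mixed-case field names and values.
raw = {"EventID": "4624", "TargetUserName": "ADMINISTRATOR", "Channel": "Security"}
normalized = lowercase_all(raw)
```

Doing this once per event at ingest time is what makes it cheap at search time: every later query can assume lowercase names and values without any extra commands.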

The existing lowercase function is now marked as deprecated in query context, with a suggestion to use the new "lower" function instead. "lower" acts only on field values, not field names, and follows the normal parameter semantics, so it can be used inside eval and other functions and in the "foo:=lower(bar)" style.

Also added an "upper" function for symmetry.
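The difference between the deprecated whole-event lowercase and the new value-only functions can be sketched in Python, again assuming events are plain dicts. `lower`, `upper` and `assign` here are hypothetical stand-ins modelled on the description above, not Humio's API; `assign` plays the role of the `foo:=...` assignment, and `user_lc` is an illustrative field name.

```python
from __future__ import annotations


def lower(value: str) -> str:
    """Value-only transform: returns a lowercased copy and never touches field names."""
    return value.lower()


def upper(value: str) -> str:
    """The symmetric uppercase counterpart."""
    return value.upper()


def assign(event: dict[str, str], target: str, value: str) -> dict[str, str]:
    """Model of `target := expression`: copy the event and set a single field."""
    updated = dict(event)
    updated[target] = value
    return updated


event = {"User": "Admin"}
# Roughly analogous to `user_lc := lower(User)`: the original field is kept as-is.
event = assign(event, "user_lc", lower(event["User"]))
```

Because `lower` just returns a value, it composes with other expressions, which is the property that lets the real function be used inside eval and assignments.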

jdesantis commented 5 years ago

Awesome. Thanks guys!