Tenzir lacks a mechanism for dealing with opaque strings that contain a lot of structure, such as URLs, domains, and user agents. We need functionality that transforms such strings into a record of values.
Elastic's Logstash has a `dissect` filter for this purpose. (There's also `grok`, which is regex-based and slower, but can accommodate more input variation.) Here are some details on the `dissect` filter:

A new `dissect` pipeline operator takes a "dissect expression" to add a new (or replace an existing) record field. Here's an example using Elastic syntax:
```
%{name},%{addr1},%{addr2},%{addr3},%{city},%{zip}
```
Let this be the string value:
```
Jane Doe,4321 Fifth Avenue,,,New York,87432
```
Then the dissection should transform it into a record like the following:
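To make the expected result concrete, here is a minimal Python sketch of the dissection mechanics, assuming each `%{key}` becomes a non-greedy capture group and everything between keys matches literally. The `dissect` helper is illustrative only, not the proposed operator:

```python
import re

def dissect(pattern: str, value: str) -> dict:
    # Translate a dissect expression into a regex: each %{key} becomes
    # a named, non-greedy capture group; everything else matches literally.
    regex = ""
    pos = 0
    for m in re.finditer(r"%\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.*?)"
        pos = m.end()
    regex += re.escape(pattern[pos:]) + "$"
    match = re.match(regex, value)
    return match.groupdict() if match else {}

record = dissect(
    "%{name},%{addr1},%{addr2},%{addr3},%{city},%{zip}",
    "Jane Doe,4321 Fifth Avenue,,,New York,87432",
)
print(record)
# {'name': 'Jane Doe', 'addr1': '4321 Fifth Avenue', 'addr2': '',
#  'addr3': '', 'city': 'New York', 'zip': '87432'}
```

Note how the two empty positions between the commas dissect into empty-string fields (`addr2`, `addr3`) rather than being dropped, which keeps the output schema stable across inputs.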
### Definition of Done
- [x] Define the UX of the operator.
- [x] We have validated that this addresses our URL normalization use case.
- [x] Implement the `parse` operator
- [x] Implement the `kv` parser
- [x] Implement the `grok` parser
- [x] Implement the `time` parser
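For illustration, invoking one of the implemented parsers in a pipeline might look like this (hypothetical invocation; the exact field name and argument syntax depend on the shipped `parse` operator):

```
parse message kv
```

Here `message` stands in for whatever field holds the opaque string, and `kv` is the key-value parser from the checklist above.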