terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0

jsonparse post_process #969

Closed kstaken closed 5 years ago

kstaken commented 5 years ago

If the output of an extraction is a stringified JSON document, the jsonparse post_process would parse it and store the parsed object in target_field.

The extracted value provided as input might look something like this.

{ \"field\": \"value\" }

Then this rule

{ "follows": "some_id", "post_process": "jsonparse", "target_field": "myjson", "tag": "myjson_id" }

Would generate

{
    myjson: {
        field: "value"
    }
}

And then you could chain rules. (Note the usage of output:false)

{ "follows": "some_id", "post_process": "jsonparse", "target_field": "myjson", "tag": "myjson_id", "output": false }
{ "follows": "myjson_id", "post_process": "extract", "source_field": "myjson.field", "target_field": "myfield" }

which would produce this final output:

{
   myfield: "value"
}
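
To make the chaining concrete, here is a rough sketch of the two steps in plain JavaScript. This is not the actual teraslice implementation: `jsonParseStep`, `getPath`, `extractStep`, and the `some_field` input name are all hypothetical and exist only to illustrate the intended behavior.

```javascript
// Hypothetical helpers, not the actual teraslice implementation.

// Parse the stringified JSON in `sourceField` and store the object in `targetField`.
function jsonParseStep(record, sourceField, targetField) {
    return { ...record, [targetField]: JSON.parse(record[sourceField]) };
}

// Read a dotted path such as "myjson.field" from a nested object.
function getPath(obj, path) {
    return path
        .split('.')
        .reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

// Copy the value at `sourcePath` into `targetField` of a new output record.
function extractStep(record, sourcePath, targetField) {
    return { [targetField]: getPath(record, sourcePath) };
}

// `some_field` is an assumed name for the field holding the extracted string.
const extracted = { some_field: '{ "field": "value" }' };

// First rule: jsonparse into `myjson` (intermediate only, like output: false).
const parsed = jsonParseStep(extracted, 'some_field', 'myjson');

// Second rule: extract `myjson.field` into `myfield`, the only field emitted.
const finalOutput = extractStep(parsed, 'myjson.field', 'myfield');
console.log(finalOutput);
```

Because the second step builds a new record holding only `myfield`, the intermediate `myjson` object never reaches the output, which is the effect `"output": false` is meant to have.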

kstaken commented 5 years ago

It's also critical that an error in the parsing not break the process on deployed jobs. In that case the field should just be excluded, but we'll need to think about surfacing error scenarios like that so that people building rules have some insight into what's going on while they're testing.
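
One possible shape for that behavior, as a minimal sketch: wrap the parse in a try/catch, leave the target field off the record on failure, and collect the failure somewhere a rule author can inspect. The `safeJsonParse` name and the `errors` array are assumptions made for illustration, not an existing teraslice API.

```javascript
// Hypothetical sketch: `safeJsonParse` and the `errors` collector are assumed names.
function safeJsonParse(record, sourceField, targetField, errors) {
    try {
        // Happy path: attach the parsed object under `targetField`.
        return { ...record, [targetField]: JSON.parse(record[sourceField]) };
    } catch (err) {
        // Failure path: do not throw, just record what went wrong and
        // return the record without `targetField`.
        errors.push({ sourceField, targetField, message: err.message });
        return { ...record };
    }
}

const errors = [];
const good = safeJsonParse({ raw: '{ "field": "value" }' }, 'raw', 'myjson', errors);
const bad = safeJsonParse({ raw: '{ not json' }, 'raw', 'myjson', errors);

console.log(good.myjson);     // the parsed object
console.log('myjson' in bad); // false, the field is simply excluded
console.log(errors.length);   // 1, available for inspection while testing rules
```

A deployed job could ignore or log the `errors` array, while a rule-testing tool could print it so the author sees which inputs failed to parse.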