siehar closed this issue 2 years ago
The JSON you provided does not have a JSON object in it. A JSON object would be something like an array inside the JSON itself. I would suggest going to our community page or slack and provide a sample JSON with what you are trying to do and then see how your config can be changed to better parse your JSON.
I am going to close this as this isn't a bug in Telegraf.
@powersj
> The JSON you provided does not have a JSON object in it. A JSON object would be something like an array inside the JSON itself.
`{"time": "2022-01-01T21:57:10", "test": 123}` is a JSON object. From the JSON docs (https://www.json.org/json-en.html):
> An object is an unordered set of name/value pairs. An object begins with `{` (left brace) and ends with `}` (right brace). Each name is followed by `:` (colon) and the name/value pairs are separated by `,` (comma).
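To make the point concrete, a quick check (Python here, purely illustrative) confirms that the payload parses as a JSON object:

```python
import json

# The payload from the issue is itself a JSON object (a name/value mapping),
# even though it contains no nested array or object.
payload = '{"time": "2022-01-01T21:57:10", "test": 123}'
parsed = json.loads(payload)

print(type(parsed).__name__)  # dict: Python's representation of a JSON object
print(parsed["test"])         # 123
```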
The given JSON can be parsed by json_v2 if it is used as the `data_format` in the input block itself.
That's what I wanted to say with this line:
> When I use data_format = "json_v2" in the input.file section and place the same config there, data is parsed correctly, so I think the json_v2 config should be OK
I probably should have worded it more clearly, so let me clarify here:
If you use the configuration below, the metric is parsed as expected and printed to stdout.
Note that the `json_v2` config is exactly the same as in the issue above, except that it now sits in the `inputs.file` block and there is no `processors.parser` block.
```toml
[agent]
  interval = "1s"
  flush_interval = "1s"

[[inputs.file]]
  files = ["./foo.json"]
  data_format = "json_v2"
  [[inputs.file.json_v2]]
    [[inputs.file.json_v2.object]]
      path = "@this"
      timestamp_key = "time"
      timestamp_format = "2006-01-02T15:04:05"
      timestamp_timezone = "Local"
      disable_prepend_keys = true
      [[inputs.file.json_v2.object.field]]
        path = "test"
        type = "uint"

[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
```
Output:

```
2022-01-04T21:25:04Z I! Starting Telegraf 1.21.1
2022-01-04T21:25:04Z I! Loaded inputs: file
2022-01-04T21:25:04Z I! Loaded aggregators:
2022-01-04T21:25:04Z I! Loaded processors:
2022-01-04T21:25:04Z I! Loaded outputs: file
2022-01-04T21:25:04Z I! Tags enabled: host=bar
2022-01-04T21:25:04Z I! [agent] Config: Interval:1s, Quiet:false, Hostname:"bar", Flush Interval:1s
file,host=bar test=123i 1641070630000000000
file,host=bar test=123i 1641070630000000000
^C2022-01-04T21:25:06Z I! [agent] Hang on, flushing any cached metrics before shutdown
2022-01-04T21:25:06Z I! [agent] Stopping running outputs
```
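The timestamp in those metric lines can be reproduced by hand. Here is a small sketch (Python, not part of Telegraf) of how the Go reference layout maps to a `strptime` format; the UTC+1 offset is an assumption inferred from the printed output, since `timestamp_timezone = "Local"` depends on the host:

```python
from datetime import datetime, timedelta, timezone

# Telegraf's timestamp_format uses Go's reference time; the layout
# "2006-01-02T15:04:05" corresponds to strptime's "%Y-%m-%dT%H:%M:%S".
ts = datetime.strptime("2022-01-01T21:57:10", "%Y-%m-%dT%H:%M:%S")

# timestamp_timezone = "Local" interprets the naive time in the host's zone.
# Assuming the host above ran in UTC+1 (a guess from the printed output),
# the epoch matches the 1641070630000000000 ns in the metric line.
local = timezone(timedelta(hours=1))
epoch_ns = int(ts.replace(tzinfo=local).timestamp()) * 1_000_000_000
print(epoch_ns)  # 1641070630000000000
```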
Also, with the setup described in the issue above, Telegraf never even gets to parsing the file. As the pasted error message suggests (`Error loading config file telegraf.conf`), it already fails while parsing the config file, before it starts any processing.
> I would suggest going to our community page or slack and provide a sample JSON with what you are trying to do and then see how your config can be changed to better parse your JSON.
I already have a workaround for my use case that involves the starlark processor. I still think this is either a bug, or there is documentation missing about how to correctly configure `processors.parser` with data formats that are more complex to configure than the ones currently described in the README.
> `{"time": "2022-01-01T21:57:10", "test": 123}` is a JSON object. From the JSON docs (https://www.json.org/json-en.html):
Sorry, you are right that I was not clear enough. If you are working on the root JSON object received by Telegraf, there is no need to specify an object using `path = "@this"`. The parser knows what the root object is, and you can instead reach directly for the field you need:
```toml
[[inputs.file]]
  files = ["./foo.json"]
  data_format = "json_v2"
  [[inputs.file.json_v2]]
    timestamp_key = "time"
    timestamp_format = "2006-01-02T15:04:05"
    timestamp_timezone = "Local"
    disable_prepend_keys = true
    [[inputs.file.json_v2.field]]
      path = "test"
      type = "uint"
```
Which produces:

```shell
./telegraf --config config.toml --test
2022-01-04T22:28:55Z I! Starting Telegraf 1.22.0-d8cc3551
2022-01-04T22:28:55Z I! Loaded inputs: file
2022-01-04T22:28:55Z I! Loaded aggregators:
2022-01-04T22:28:55Z I! Loaded processors:
2022-01-04T22:28:55Z W! Outputs are not used in testing mode!
2022-01-04T22:28:55Z I! Tags enabled: host=ryzen
> file,host=ryzen test=123i 1641335335000000000
```
Hopefully, the simpler, more straightforward configuration makes more sense now. Applying a parser directly to an input, as shown above, is the advertised and preferred way of using the parser. As alluded to in my original post, the json_v2 `object` option is more useful when there is an array embedded inside the root object you pass.
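For illustration, here is the embedded-array case sketched outside Telegraf (Python; the payload and field names are invented): each element of the embedded array becomes its own metric, which is what the `object` configuration is for.

```python
import json

# Hypothetical payload with an array embedded in the root object -- the case
# where json_v2's object/path configuration earns its keep.
payload = (
    '{"time": "2022-01-01T21:57:10",'
    ' "sensors": [{"id": "a", "test": 1}, {"id": "b", "test": 2}]}'
)
root = json.loads(payload)

# Rough emulation of an object section with path = "sensors":
# one metric per array element, each stamped with the root timestamp.
metrics = [{"fields": el, "time": root["time"]} for el in root["sensors"]]
for m in metrics:
    print(m["fields"]["id"], m["fields"]["test"])
```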
Thanks for the tip with `@this`. But the main point of this issue still holds: `processors.parser` doesn't work with `data_format = "json_v2"`.
My use case is a bit more complex than the dumbed-down example I constructed for this issue for the sake of reproducibility.
I have an MQTT consumer input that's subscribed to multiple topics where data comes in with different JSON formats so I can't configure the parser in the input directly.
I'm aware of #10072, which is similar to my use case. But mangling different JSON formats into one `json_v2` config is IMO a recipe for disaster (name clashes between the different JSON formats being just one problem). Also, the referenced issue is still open, so it currently wouldn't work anyway.
My idea was to use the MQTT input's new topic-parsing feature together with namepass/tagpass to route incoming data to the correct processors.parser
instance. That didn't work because of the issue described above.
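The routing idea can be sketched outside Telegraf like this (Python; topic names and field layouts are made up for illustration): each topic gets its own parser instead of one combined config.

```python
import json

# One parser per MQTT topic, mirroring the idea of routing metrics to
# separate processors.parser instances via namepass/tagpass.
PARSERS = {
    "sensors/temperature": lambda raw: {"temp_c": json.loads(raw)["temp"]},
    "sensors/power":       lambda raw: {"watts": json.loads(raw)["w"]},
}

def parse(topic: str, raw: str) -> dict:
    """Dispatch a raw JSON payload to the parser registered for its topic."""
    return PARSERS[topic](raw)

print(parse("sensors/temperature", '{"temp": 21.5}'))
```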
So now I have two options: either mangle everything into one `json_v2` config as described above, or use `data_format = "value"` plus a starlark processor that does the actual JSON processing and transformation into metrics (that's what I'll probably do, but I would prefer to use `processors.parser`).

On a side note: if I try to use your simplified config in `processors.parser`, it still doesn't work, but the error message is different.
Maybe this helps if someone stumbles across this later on:
```toml
[agent]
  interval = "1s"
  flush_interval = "1s"

[[inputs.file]]
  files = ["./foo.json"]
  data_format = "value"
  data_type = "string"

[[processors.parser]]
  parse_fields = ["value"]
  data_format = "json_v2"
  [[processors.parser.json_v2]]
    timestamp_key = "time"
    timestamp_format = "2006-01-02T15:04:05"
    timestamp_timezone = "Local"
    disable_prepend_keys = true
    [[processors.parser.json_v2.field]]
      path = "test"
      type = "uint"

[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
```
```
2022-01-04T23:10:07Z I! Starting Telegraf 1.21.1
2022-01-04T23:10:07Z E! [telegraf] Error running agent: Error loading config file telegraf_simplified.conf: plugin processors.parser: line 12: configuration specified the fields ["field" "timestamp_key" "disable_prepend_keys"], but they weren't used
```

Interestingly, `timestamp_timezone` and `timestamp_format` are not listed among the fields that supposedly weren't used.
I still think it's a bug, but I don't see any interest in finding the root cause, so I won't waste more time on this discussion. I'd just suggest reopening this issue to at least document that this is an open problem.
Reopened, as it seems to be a configuration parsing issue rather than a problem with actually running the plugin.
I have a similar problem. With a "basic" configuration, the parser ignores the field to parse, while adding additional configuration gives me the same error as the posts above:

```
2022-08-31T07:55:16Z E! [telegraf] Error running agent: Error loading config file /etc/telegraf/telegraf.conf: plugin processors.parser: line 9: configuration specified the fields ["field"], but they weren't used
```

The input is fetched from Docker by the `docker_log` plugin.
```toml
[agent]
  interval = "10s"

[[inputs.docker_log]]
  endpoint = "unix:///var/run/docker.sock"
  timeout = "5s"
  container_name_include = ["test"]

[[processors.parser]]
  parse_fields = ["message"]
  drop_original = false
  merge = "override"
  data_format = "json_v2"

[[outputs.file]]
  files = ["stdout"]
```
Latest version of Telegraf: 1.23.4

```shell
docker run --user telegraf:$(stat -c '%g' /var/run/docker.sock) -v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro -v //var/run/docker.sock://var/run/docker.sock --rm telegraf
```

with the config from above. Telegraf should parse the field. For example, given the following input:
```
docker_log,container_image=test, ... ,message="{\"old_random\":12720,\"metric\":false,\"new_random\":7493}" 1661522426070238817
```

the output is

```
docker_log,container_image=test, ... ,message="{\"old_random\":12720,\"metric\":false,\"new_random\":7493}"
```

Instead, with `data_format = "json"` the output is correct:

```
docker_log, ... ,message="{\"old_random\":12720,\"metric\":false,\"new_random\":7493}",old_random=12720,new_random=7493
```
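For reference, what `merge = "override"` is expected to do here can be sketched outside Telegraf (Python, purely illustrative): parse the JSON string in the `message` field and merge the resulting fields into the original metric.

```python
import json

# The metric's fields as emitted by docker_log: one string field "message"
# holding a JSON document.
fields = {"message": '{"old_random":12720,"metric":false,"new_random":7493}'}

# Parse the embedded JSON and merge its keys into the original field set,
# keeping "message" since drop_original = false.
parsed = json.loads(fields["message"])
merged = {**fields, **parsed}

print(merged["old_random"], merged["new_random"])  # 12720 7493
```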
With additional configuration to `json_v2`:

```toml
[[processors.parser]]
  parse_fields = ["message"]
  drop_original = false
  merge = "override"
  data_format = "json_v2"
  [[processors.parser.json_v2]]
    [[processors.parser.json_v2.field]]
      path = "new_random"
```
the output is

```
2022-08-31T07:55:16Z I! Using config file: /etc/telegraf/telegraf.conf
2022-08-31T07:55:16Z E! [telegraf] Error running agent: Error loading config file /etc/telegraf/telegraf.conf: plugin processors.parser: line 9: configuration specified the fields ["field"], but they weren't used
```
The same error appears when adding `optional = true`.
This issue might have been fixed in v1.24.0, probably as a side effect of https://github.com/influxdata/telegraf/pull/11343, though I can't really be sure about that. At least my original config now works with 1.24.0 but didn't with 1.23.4.

@Oghma: You may want to try your config with the new version too.
Great to hear.
@siehar I confirm v1.24.0 fixes the issue. Thank you!
**Relevant telegraf.conf**

**Logs from Telegraf**

**System info**

Telegraf 1.21.1 (git: HEAD 7c9a9c17) (both on Debian 11 and Ubuntu 20.04, amd64); installed via Debian package downloaded from the GitHub Releases page

**Docker**

Not applicable.

**Steps to reproduce**

1. Place `foo.json` into the working directory with content e.g. `{"time": "2022-01-01T21:57:10", "test": 123}`
2. Run `telegraf --config telegraf.conf` with the config from above

**Expected behavior**

Telegraf should print the one metric from the JSON file to stdout in influx format every second.

**Actual behavior**

Telegraf exits with an error because of an invalid config. It seems it doesn't recognize that the `object` section must be handed to the json_v2 parser, or something similar.

**Additional info**

- When I use `data_format = "json_v2"` in the `inputs.file` section and place the same config there, data is parsed correctly, so I think the json_v2 config should be OK.
- If I replace `data_format` in the `processors.parser` section with `influx`, remove the additional `json_v2` config, and place an influx-formatted metric into the input file, then it also works.
- I couldn't find any example of how to configure json_v2 inside `processors.parser`. If this is a user error on my part, it would at least be good to add a correct example to the `processors.parser` README.