Open jordansissel opened 9 years ago
I'm hitting this bug as well and was trying to think about the "right" way to fix it. The problem is that some of the SNMP values are going to contain arbitrary binary strings. As far as I can tell, there's not a way to convert arbitrary binary strings to UTF-8 in a reversible way.
I can see where @jordansissel is saying that adding codec support would help with the encoding, but I'm not convinced it would help for arbitrary binary strings as it would not guarantee that you can still recover the same binary on the other side of Redis, for example.
So I'm thinking the best solution is to change the behavior of the snmptrap input so that binary fields are converted to hex strings or base64 strings or something (e.g. using String#unpack and Array#pack), so that they can be manipulated more easily without causing problems.
Obviously, this would break compatibility with anyone using the binary strings directly, but would alleviate the problem where people expect it to 'just work' and it currently doesn't in some cases.
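For illustration, here is a minimal sketch of what that conversion could look like with String#unpack and Array#pack. The helper names are hypothetical and not part of the snmptrap input; the sample value assumes the MAC address FC:F8:AE:3C:2E:18 discussed in the original report.

```ruby
# Hypothetical helpers (not part of the snmptrap input): reversible text
# encodings for a raw binary SNMP value.

# Hex-encode: "H*" unpacks every byte as two lowercase hex digits.
def to_hex(raw)
  raw.unpack("H*").first
end

# Reverse of to_hex: pack the hex digits back into bytes.
def from_hex(hex)
  [hex].pack("H*")
end

# Base64-encode: "m0" packs without inserting line breaks.
def to_base64(raw)
  [raw].pack("m0")
end

# Reverse of to_base64 (strict decoding, no line breaks expected).
def from_base64(b64)
  b64.unpack("m0").first
end

# Example value: the six bytes of the MAC address FC:F8:AE:3C:2E:18.
raw = "\xFC\xF8\xAE\x3C\x2E\x18".b
puts to_hex(raw)                        # fcf8ae3c2e18
puts from_hex(to_hex(raw)) == raw       # true
puts from_base64(to_base64(raw)) == raw # true
```

Both forms are plain ASCII, so they should survive JSON encoding, and either one can be turned back into the original bytes on the other side.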
Does this seem like a terrible idea? (mentioning @nvx and @simmel from the old thread to bring them back into the discussion here)
On Wed, 2016-03-16 at 13:55:28 -0700, Greg Mefford wrote:
> I'm hitting this bug as well and was trying to think about the "right" way to fix it. The problem is that some of the SNMP values are going to contain arbitrary binary strings. As far as I can tell, there's not a way to convert arbitrary binary strings to UTF-8 in a reversible way.
Possible to store them in two fields? One UTF-8 'replace' encoded and one binary?
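A rough sketch of the two-field idea, in plain Ruby. The helper name and the field names "text" and "raw_base64" are hypothetical. It also assumes Ruby 2.1 or later for String#scrub, and that the lossless copy is stored as Base64 rather than raw bytes, since raw bytes would hit the same JSON problem.

```ruby
# Hypothetical helper: split one raw SNMP value into a lossy, UTF-8-safe
# rendering and a lossless Base64 copy.
def split_binary_field(raw)
  # Reinterpret the bytes as UTF-8 and replace any invalid sequences
  # with U+FFFD (the default replacement for UTF-8 strings).
  lossy = raw.dup.force_encoding(Encoding::UTF_8).scrub

  # Keep an exact copy of the bytes in an ASCII-only form.
  { "text" => lossy, "raw_base64" => [raw].pack("m0") }
end

fields = split_binary_field("\xFC\xF8\xAE\x3C\x2E\x18".b)
puts fields["text"].valid_encoding?  # true
puts fields["raw_base64"]
```

The "text" field is only good for display or rough searching; anything that needs the real bytes would decode "raw_base64".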
> I can see where @jordansissel is saying that adding codec support would help with the encoding, but I'm not convinced it would help for arbitrary binary strings as it would not guarantee that you can still recover the same binary on the other side of Redis, for example.
IIRC Redis can handle binaries? I'm not sure though.
> So I'm thinking the best solution is to change the behavior of the snmptrap input so that binary fields are converted to hex strings or base64 strings or something (e.g. using String#unpack and Array#pack), so that they can be manipulated more easily without causing problems.
How would you manipulate that from Logstash without having to resort to the ruby filter to unpack and convert them?
I guess the other question is how to properly represent this data once it's in Elasticsearch anyway? Perhaps the real solution is to convert early (like in the input plugin) from binary MACs to, say, hex or whatnot, possibly as a configurable option?
Yeah, I think the only generic solution is to convert early in the input plugin, because the MIB only specifies that it's a binary field, with prose to describe how to decode the bytes in the binary. It's not just about MAC addresses and it will vary from one trap to the next, so I don't think an automatic solution is possible.
As far as how to process it downstream, it seems like a ruby filter is the only option today, but I can imagine a binary-manipulation filter similar to mutate as a possibility.
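As a sketch of what such downstream processing might look like, here is plain Ruby that turns a hex-encoded field back into something readable. The helper names are hypothetical, and the code is not wired into any Logstash event API; it only shows the kind of logic a ruby filter would need to carry, assuming the input plugin had already hex-encoded the value.

```ruby
# Hypothetical helpers for decoding hex-encoded binary fields downstream.

# Format a hex string such as "fcf8ae3c2e18" as a colon-separated MAC.
def hex_to_mac(hex)
  hex.scan(/../).join(":").upcase
end

# Format a hex string such as "c0a80001" as a dotted-quad IPv4 address.
def hex_to_ipv4(hex)
  [hex].pack("H*").unpack("C*").join(".")
end

puts hex_to_mac("fcf8ae3c2e18")  # FC:F8:AE:3C:2E:18
puts hex_to_ipv4("c0a80001")     # 192.168.0.1
```

Which decoder applies to which field still has to come from the MIB's prose description, so this part cannot be automated generically.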
On Thursday, March 17, 2016, NV notifications@github.com wrote:
> I guess the other question is how to properly represent this data once it's in Elasticsearch anyway? Perhaps the real solution is to convert early (like in the input plugin) from binary MACs to, say, hex or whatnot, possibly as a configurable option?
(This issue was originally filed by @nvx at https://github.com/elastic/logstash/issues/1636)
I have a simple Logstash configuration that reads SNMP traps and outputs them, JSON-encoded, to Redis. Some messages fail to make it to Redis, with the Logstash log reporting "Failed to convert event to JSON. Invalid UTF-8, maybe?". Looking at the code, this appears to originate from within the redis output.
I have tried without specifying a codec, as well as explicitly setting the charset to BINARY. The SNMP traps do contain some non-ASCII characters (binary representations of MAC addresses and IP addresses), but they appear to be properly escaped with \xHH style notation in the output log. The only difference I can spot between messages that make it to Redis and ones that fail is the MAC address field. An example of a failing message has this field in the message part (note that it appears to be doubly escaped, as this is from the error log, which itself appears to be JSON-encoded as well):
And the same value again as parsed by the MIBs:
Other MAC addresses that start with FC:F8:AE work, so I can only assume it is the latter half (3C:2E:18) that is breaking the encoding.
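One way to probe that assumption is to check each half of the address for UTF-8 validity in plain Ruby. This is only a sketch: the helper name is hypothetical, the full address FC:F8:AE:3C:2E:18 is inferred from the two halves mentioned above, and the check does not reproduce the Logstash or redis output code path.

```ruby
# Hypothetical helper: report whether a byte string is valid UTF-8.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

first_half  = "\xFC\xF8\xAE".b  # FC:F8:AE
second_half = "\x3C\x2E\x18".b  # 3C:2E:18

puts valid_utf8?(first_half)                # false: 0xFC cannot start a UTF-8 sequence
puts valid_utf8?(second_half)               # true: all three bytes are 7-bit ASCII
puts valid_utf8?(first_half + second_half)  # false
```

In isolation the second half is plain ASCII ("<", "." and a control character), so if addresses sharing the FC:F8:AE prefix really do get through, the difference may come from how the value is escaped or decoded earlier in the pipeline rather than from these bytes alone.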