influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

SNMP Trap input does not properly handle non-string data #9113

Closed: loganmc10 closed this issue 9 months ago

loganmc10 commented 3 years ago

System info:

Telegraf 1.18.1 running in Docker

Steps to reproduce:

When using the snmp_trap input plugin, some devices send traps with non-UTF-8-encoded strings, for instance:

snmpTrapEnterprise.0="jnxProductNameEX4500" sysUpTimeInstance="3297275066" jnxCmRescueChgTime.0="3297275065" jnxCmRescueChgDate.0="�  &+" jnxCmRescueChgSource.0="2" jnxCmRescueChgUser.0="user" jnxCmRescueChgState.0="2" 

Expected behavior:

See https://github.com/influxdata/telegraf/issues/9113#issuecomment-829585208

Actual behavior:

See https://github.com/influxdata/telegraf/issues/9113#issuecomment-829585208

Additional info:

ivorybilled commented 3 years ago

Hi, thanks for the issue. To clarify my understanding, if a field is going to be given a value that contains invalid characters, is the request that the field is dropped in this instance? Or another form of sanitization?

loganmc10 commented 3 years ago

I guess sanitize would be good. Perhaps this is something that would fit best in a processor, giving the option to sanitize inputs via a processor before they are sent to the output plugin.

Go's strings package has a ToValidUTF8 function that allows you to replace any invalid bytes with a replacement string (which could just be empty). Some kind of UTF-8 sanitizer processor would be very helpful.

ivorybilled commented 3 years ago

I agree that a processor would make a lot of sense; there is currently a strings processor that we could likely place this functionality into.

Not sure if it would work at all, but there is also a starlark processor that lets you write your own processor script, if there's some easy way for you to identify when your data should be replaced or removed. I'm not sure whether Starlark supports non-UTF-8 characters, though.

loganmc10 commented 3 years ago

OK, I've submitted a PR to add an option to the strings processor to sanitize strings using Go's ToValidUTF8 function: https://github.com/influxdata/telegraf/pull/9118

robcowart commented 3 years ago

This issue is based on an incorrect assumption. Let's look at the OID in question, jnxCmRescueChgDate. This OID is from JUNIPER-CFGMGMT-MIB and has a syntax/type of DateAndTime:

jnxCmRescueChgDate OBJECT-TYPE
        SYNTAX     DateAndTime
        MAX-ACCESS read-only
        STATUS     current
        DESCRIPTION
                "The date and time when the rescue configuration was last
                changed."
        ::= { jnxCmRescueChg 2 }

DateAndTime is imported from SNMPv2-TC, where we find its definition as:

DateAndTime ::= TEXTUAL-CONVENTION
    DISPLAY-HINT "2d-1d-1d,1d:1d:1d.1d,1a1d:1d"
    STATUS       current
    DESCRIPTION
            "A date-time specification.

            field  octets  contents                  range
            -----  ------  --------                  -----
              1      1-2   year*                     0..65536
              2       3    month                     1..12
              3       4    day                       1..31
              4       5    hour                      0..23
              5       6    minutes                   0..59
              6       7    seconds                   0..60
                           (use 60 for leap-second)
              7       8    deci-seconds              0..9
              8       9    direction from UTC        '+' / '-'
              9      10    hours from UTC*           0..13
             10      11    minutes from UTC          0..59

            * Notes:
            - the value of year is in network-byte order
            - daylight saving time in New Zealand is +13

            For example, Tuesday May 26, 1992 at 1:30:15 PM EDT would be
            displayed as:

                             1992-5-26,13:30:15.0,-4:0

            Note that if only local time is known, then timezone
            information (fields 8-10) is not present."
    SYNTAX       OCTET STRING (SIZE (8 | 11))

You can see here that this field is not a "string" at all, but rather an array of bytes that encodes a representation of a timestamp. For example, the first two bytes, in network-byte order, are not two ASCII or UTF-8 characters; they are a 16-bit integer that represents the year.

This brings us to the incorrect assumption. While the syntax (or type) OCTET STRING has the word string in it, it is an array of bytes. Despite what the name suggests, an OCTET STRING is not limited to string values but can store any byte-based data type, including binary data. To remove or replace values in an OCTET STRING because they are not printable would destroy the data itself, rendering it useless. The OID might as well be thrown away entirely.

The real issue here isn't figuring out how to print things that are non-printable. It is that Telegraf's SNMP Trap input doesn't properly handle OCTET STRINGs (and some other things) as they are intended to be handled. Deleting or replacing chunks of data that Telegraf can't print isn't going to fix that.

loganmc10 commented 3 years ago

@robcowart yes you are totally correct, I've updated the issue title and description (pointing to your explanation).

In my case, it's still useful for the time being to just sanitize the whole thing as UTF-8, because the system I'm sending the data to (Grafana Loki) fails to parse anything if any part of any message in a search isn't UTF-8. However, a proper solution would involve parsing this type of data into readable text.

Hipska commented 3 years ago

For the time being, you could opt to drop that specific field from the measurement, so you can use it with Grafana and Loki.

Hipska commented 3 years ago

Closing since the linked PR got merged.

robcowart commented 3 years ago

#9118 doesn't actually fix this issue. It just makes it possible to pass along mishandled data.

Hipska commented 3 years ago

Okay, could you clarify what is still missing then?

robcowart commented 3 years ago

I already did that above... https://github.com/influxdata/telegraf/issues/9113#issuecomment-829585208

robcowart commented 3 years ago

If you are not going to handle the data properly, at least don't destroy it by replacing non-printable characters with junk. It would be better to transform the raw byte slice into a printable form using something like hex.EncodeToString() https://pkg.go.dev/encoding/hex#EncodeToString. This way you at least preserve the option for another application to properly transform the data at a later stage of processing.

Hipska commented 2 years ago

Where do those non-printable characters get replaced? I'm willing to add functionality to transform bytes to hex encoding, but it's difficult to determine when to do it and when not to. I don't think it is always the case when the SYNTAX is OCTET STRING?

adrianhardkor commented 2 years ago

Any update on this? We still cannot decode SystemClock fields from SNMP traps from standard routers like Cisco and Juniper: "systemClock": "\u0007\ufffd\u0003\u0015\u000b-*\u0001-\u0004\u0000",

olebowle commented 1 year ago

Hi, we were running into the same issue, in this case jnxCmCfgChgLatestDate (1.3.6.1.4.1.2636.3.18.1.3.0), which is also an SNMP DateAndTime object. Using snmpwalk we got:

$ snmpwalk -v2c -cpublic 10.0.0.1 1.3.6.1.4.1.2636.3.18.1.3.0
SNMPv2-SMI::enterprises.2636.3.18.1.3.0 = Hex-STRING: 07 E6 0A 04 10 2A 30 00 2D 04 00

We are using the json serializer which gave us:

\u0007\ufffd\n\u0004\u0010*0\u0000-\u0004\u0000

Comparing the two outputs, somehow the first two bytes 07 E6 are mangled by telegraf and replaced with \u0007\ufffd (maybe part of an internal data structure of telegraf?). This corresponds well to the observation of @adrianhardkor.

\u0007  ???
\ufffd  ???
\n  0A
\u0004  04
\u0010  10
*   2A
0   30
\u0000  00
-   2D
\u0004  04
\u0000  00

We now worked around the issue by base64 encoding the value using json_transformation, basically from:

json_transformation = '''
metrics{
  tags.agent_host: {
    "LastConfChange": fields.InfoLastConfChange,
  }
}
'''

to:

json_transformation = '''
metrics{
  tags.agent_host: {
    "LastConfChange": $base64encode(fields.InfoLastConfChange),
  }
}
'''

and base64 decoding the value in our application.

JuhaKS commented 10 months ago

Looking at this case from the point of view of json_test.go in the repo, I observed the following.

The JSON serialiser prepares to handle all incoming data through an interface. Byte array data first appears as interface {}([]uint8), but after calling metric.New the uint8 array gets translated into an interface {}(string). This step still retains all the original data, though it does get translated strangely in the case of 0x07, which is common in year 2000+ timestamps. The second half of the year number, 0xe8, becomes \xe8, which still contains the original data.

Later, Go's json.Marshal is called to actually serialise that data, and that code follows this documented behaviour:

// String values encode as JSON strings coerced to valid UTF-8,
// replacing invalid bytes with the Unicode replacement rune.
// So that the JSON will be safe to embed inside HTML <script> tags,
// the string is encoded using HTMLEscape,
// which replaces "<", ">", "&", U+2028, and U+2029 are escaped
// to "\u003c","\u003e", "\u0026", "\u2028", and "\u2029".
// This replacement can be disabled when using an Encoder,
// by calling SetEscapeHTML(false).

so the 0xe8 becomes the Unicode replacement character 0xFFFD, since 0xe8 on its own is considered an invalid UTF-8 byte. This is the stage where the data is lost.

So unless these byte-array timestamps are meant to be unpacked inside Telegraf into their textual representation, the byte-array content should be passed through as-is without attempting character conversion: just dump the bytes with hex.EncodeToString() before calling metric.New.