logstash-plugins / logstash-input-beats

Apache License 2.0

Incorrect de-/encoding of umlauts #318

Open Chimaine opened 6 years ago

Chimaine commented 6 years ago

This problem may extend to other UTF-8 characters as well.

We are sending Metricbeat (6.2.3) events to the beats input of our Logstash (6.2.3). Some fields in the message, such as system.process.username, can contain umlauts because the machine's locale is German.

If we configure metricbeat to output to a file, it correctly encodes those (e.g. "username":"NT-AUTORITÄT\\Netzwerkdienst").

However, if we send the events to a Logstash beats input and have Logstash write them to a file, the encoding is wrong: "username":"NT-AUTORITÄT\\Netzwerkdienst". This is a problem for the GELF output, which fails to send those messages.

Codec settings for the beats input and the file output are the defaults. Changing them to another encoding such as ASCII does not help; the encoding is still wrong.

Minimum configuration that reproduces the problem

input { 
    beats {
        port => 5044
    }
}

output {
    file {
        path => "C:\logging-stack\logstash\logs\beats.log"
    }
}

TempyMcTempface commented 6 years ago

I have exactly the same problem. This is what I have tried in my Logstash configuration:

input {
    beats {
        port => 5044
        codec => plain {
            charset => "ISO-8859-1"
        }
        host => ["localhost"]
    }
}

I don't have any more logs with umlauts, so I can't tell if it actually works. My recommendation is to ask this question on https://discuss.elastic.co or https://webchat.freenode.net/#logstash .

Chimaine commented 6 years ago

Any news on this? This is quite a serious problem for us, since quite a few error messages get swallowed by this.

parora1701 commented 6 years ago

Hello Chimaine,

I had the same problem with umlauts. After some trial and error with different encodings, it was finally solved with the following combination:

I added "encoding: ISO-8859-1" to my Filebeat prospector and "codec => plain { charset => "UTF-8" }" to my Logstash beats input plugin. You can also leave Logstash as is, because UTF-8 is already its default.

Try with metricbeat console output first.

Let me know if you need any more info about my use case.

regards,

urso commented 6 years ago

Indeed, this is a file encoding problem, not a Logstash or transfer problem. The Beats-to-Logstash transfer uses UTF-8, and in-memory strings inside Filebeat are UTF-8 as well. Unless an encoding is set in Filebeat, it assumes all input is UTF-8. Trying to fix the encoding in Logstash is too late, because by then Filebeat has already had to deal with some invalid input.
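
To illustrate why a charset setting on the Logstash side cannot repair the text, here is a minimal Python sketch. It is not how Beats or Logstash are implemented. It only assumes the source bytes are ISO-8859-1 (the encoding mentioned earlier in this thread) and shows what happens when a reader treats them as UTF-8; the variable names are made up for the example.

```python
# Illustrative sketch only: plain Python, not Beats or Logstash code.
# Assumption: the source text is stored as ISO-8859-1 bytes.
original = "NT-AUTORITÄT\\Netzwerkdienst"
latin1_bytes = original.encode("iso-8859-1")  # 'Ä' becomes the single byte 0xC4

# A reader that assumes UTF-8 sees 0xC4 as the start of a two-byte sequence.
# The next byte ('T') is not a valid continuation, so the byte is replaced
# with U+FFFD and the original character is lost.
assumed_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# A reader that is told the real encoding recovers the text.
declared_latin1 = latin1_bytes.decode("iso-8859-1")

# Trying to fix it afterwards: the already-garbled string is valid UTF-8,
# so reinterpreting its bytes with another charset cannot bring 'Ä' back.
late_fix = assumed_utf8.encode("utf-8").decode("iso-8859-1")

print(repr(assumed_utf8))
print(repr(declared_latin1))
print(repr(late_fix))
```

The point of the sketch is the ordering: the information is lost at the first wrong decode, which is why the encoding has to be declared where the bytes are first read.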