logstash-plugins / logstash-output-elasticsearch

https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
Apache License 2.0

When event JSON data contains non UTF-8 invalid bytes, replace with replacement characters. #1169

Closed mashhurs closed 4 months ago

mashhurs commented 4 months ago

Description

Current buggy behaviours:

This PR introduces an immediate fix and opens a discussion about the general, long-term use case. Verified with Apache HTTP client trace logs that the bytes being sent do not change.
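As an illustration of the replacement named in the title, here is a minimal plain-Ruby sketch. It is not the plugin's actual code: `scrub_invalid_utf8` is a hypothetical helper, and the only assumption is that invalid bytes should become U+FFFD (the Unicode replacement character), which is what Ruby's built-in `String#scrub` can do.

```ruby
# Hypothetical helper, not taken from the plugin: replace every byte sequence
# that is not valid UTF-8 with U+FFFD, leaving valid strings untouched.
def scrub_invalid_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? utf8 : utf8.scrub("\uFFFD")
end

message = "\xAC"                        # same byte as in the reproduction config
cleaned = scrub_invalid_utf8(message)

puts message.valid_encoding?  # false: 0xAC alone is not valid UTF-8
puts cleaned.valid_encoding?  # true
puts cleaned == "\uFFFD"      # true
```

Scrubbing before serialization keeps the payload valid UTF-8, at the cost of losing the original byte value.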

# Use the following config in the config/encoding_test.conf file
input { generator { count => 1 } }
filter { ruby { code => 'str = "\xAC"; event.set("message", str)' } }
output {
 elasticsearch {
   cloud_id => "cloud_id"
   cloud_auth => "elastic:{pwd}"
   http_compression => "${HTTP_COMPRESSION}"
 }
 stdout { }
}

# BEFORE the fix: see the logs in https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/1168
# AFTER the fix: the behavior is the same at any HTTP compression level (0-9), and the data is indexed
[2024-03-18T09:06:31,891][DEBUG][org.apache.http.impl.conn.PoolingHttpClientConnectionManager][main][999000c22ac1744372923039d3bee405a92df01b3dafcd64f0830a24ad60acc6] Connection released: [id: 0][route: {s}->https://68f825b174ea43b5986baabce7534163.ca-central-1.aws.elastic-cloud.com:443][total available: 1; route allocated: 1 of 100; total allocated: 1 of 1000]
{
          "host" => {
        "name" => "localhost"
    },
      "@version" => "1",
    "@timestamp" => 2024-03-18T16:06:31.686611Z,
         "event" => {
        "sequence" => 0,
        "original" => "Hello world!"
    },
       "message" => "\xAC"
}

jsvd commented 4 months ago

For future reference, this is the difference in behaviour we see through Manticore:

> string_entity = Java::org.apache.http.entity.StringEntity.new("\xAC")
=> #<Java::OrgApacheHttpEntity::StringEntity:0x2f930be7>
> Java::org.apache.http.util.EntityUtils.toString(string_entity)
=> "?"
> byte_array_entity = Java::org.apache.http.entity.ByteArrayEntity.new("\xAC".to_java_bytes)
=> #<Java::OrgApacheHttpEntity::ByteArrayEntity:0x27c243a3>
> Java::org.apache.http.util.EntityUtils.toString(byte_array_entity)
=> "¬"
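One plausible reading of the session above, sketched in plain Ruby with no Java classes involved. The explanation is an assumption, not something confirmed in this thread: on the byte path the raw byte 0xAC is decoded as ISO-8859-1, while on the string path the invalid byte is first replaced by U+FFFD, which has no ISO-8859-1 mapping and so falls back to "?". The method names are hypothetical.

```ruby
# Byte path (assumed): interpret the raw bytes as ISO-8859-1.
def decode_bytes_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# String path (assumed): invalid UTF-8 becomes U+FFFD first, then the text is
# encoded as ISO-8859-1, where unmappable characters turn into "?".
def encode_string_as_latin1(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD")
  text.encode(Encoding::ISO_8859_1, undef: :replace, replace: "?")
end

raw = "\xAC".b  # the single byte 0xAC, tagged as binary

puts decode_bytes_as_latin1(raw)   # ¬
puts encode_string_as_latin1(raw)  # ?
```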

yaauie commented 4 months ago

Travis is failing in the 7.x integration tests because the logs for the job are too verbose, and Travis terminates the build when the logs exceed the job's maximum length:

The job exceeded the maximum log length, and has been terminated.

-- Job Output

I have run both of these dockerized jobs locally and they are green (successful).

mashhurs commented 4 months ago

The following CI jobs failed, but I confirm they pass locally; the failures appear to be related to Travis, so they are not blockers.

2733.3 | INTEGRATION=true ELASTIC_STACK_VERSION=7.x | Linux | errored
2733.4 | INTEGRATION=true ELASTIC_STACK_VERSION=7.x SNAPSHOT=true LOG_LEVEL=info | Linux | errored