logstash-plugins / logstash-output-lumberjack


Event sent between two LS with Lumberjack containing accented chars makes all fields escaped. #2

Closed. jordansissel closed this issue 9 years ago.

jordansissel commented 9 years ago

(This issue was originally filed by @plarivee at https://github.com/elastic/logstash/issues/1807)


A log line with an accented character messes up all tags and fields.

Setup is host (logstash-forwarder) => logstash (shipper) => logstash (receiver) => redis

So here is the case:

On the server:

logstash-forwarder config: 
#################### 
{
  "network": {
    "servers": [ "XXX.XXX.XXX.254:5043" ],
    "ssl ca": "mycert.crt",
    "timeout": 15
  },

  "files": [
    { "paths": [
        "/var/log/syslog",
        "/var/log/messages",
        "/var/log/*.log"
      ],
      "fields": { "type": "syslog", "host": "app1.mydomain.com" }
    }
  ]
}
######################## 

Sending a test log line:

app1: # logger TEST TEST TEST ÉÉÉÉÉÉÉ ééééééé 

Now on the Shipper receiving the log from the server:

Shipper config:

###############
input {
  lumberjack {
    port => 5043
    ssl_certificate => "mycert.crt"
    ssl_key => "mycert.key"
    add_field => {
      "domain" => "mydomain.com"
      "log-type" => "app-production"
    }
    tags => ["production"]
  }
}
output {
  lumberjack {
    hosts => "xxx.xxx.xxx.xxx"
    port => 5043
    ssl_certificate => "theothercert.crt"
    codec => "json"
  }
}
################

stdout debug:

{ 
       "message" => "Aug 12 09:32:33 app1 root: TEST TEST TEST ÉÉÉÉÉÉÉ 
ééééééé", 
      "@version" => "1", 
    "@timestamp" => "2014-08-12T13:32:34.386Z", 
          "tags" => [ 
        [0] "production" 
    ], 
        "domain" => "mydomain.com", 
      "log-type" => "app-production", 
          "file" => "/var/log/syslog", 
          "host" => "app1.mydomain.com", 
        "offset" => "13833", 
          "type" => "syslog" 
} 

Now, the shipper sends it to the receiver: shipper => receiver (lumberjack (logstash) => lumberjack (logstash))

Receiver config:

################### 
input {
  lumberjack {
    port => 5043
    ssl_certificate => "receiver01.crt"
    ssl_key => "receiver01.key"
    codec => "json"
  }
}
output {
  redis {
    host => ["XXX.XXX.XXX.101", "XXX.XXX.XXX.102"]
    shuffle_hosts => true
    data_type => "list"
    key => "logstash"
  }
}
###################### 

stdout debug:

{ 
       "message" => "{\"message\":\"Aug 12 09:32:33 app1 root: TEST TEST TEST ÉÉÉÉÉÉÉ ééééééé\",\"@version\":\"1\",\"@timestamp\":\"2014-08-12T13:32:34.386Z\",\"tags\":\"production\"],\"domain\":\"mydomain.com\",\"log-type\":\"app-production\",\"file\":\"/var/log/syslog\",\"host\":\"app1.mydomain.com\",\"offset\":\"13833\",\"t", 
          "@version" => "1", 
        "@timestamp" => "2014-08-12T13:32:35.008Z" 
} 

All fields / tags get their double quotes escaped; they are no longer treated as fields and instead become part of the message. If there is no 'é', 'É', 'à', etc., everything works fine. I tried without the codec config and with json_lines instead.

Without codec => json, I lose all fields / tags from the message. With codec => json_lines, it's the same behavior: the fields get escaped.

Looks like an encoding / accented-character issue not being handled well.
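
For illustration only, here is a minimal sketch (in Python, not the plugin's actual Ruby code) of one failure mode that would match these symptoms: if the sender frames the serialized JSON payload using its character count instead of its byte count, every multibyte UTF-8 character (like 'é') shortens the frame by one byte, the receiver's json codec can no longer parse the truncated payload, and the raw escaped JSON ends up inside the "message" field. The frame layout below is simplified and hypothetical, not the real Lumberjack protocol, and the cause is an assumption on my part.

import json
import struct

event = {"message": "TEST ÉÉÉÉÉÉÉ ééééééé", "type": "syslog"}
payload = json.dumps(event, ensure_ascii=False)   # str (character-based length)
data = payload.encode("utf-8")                    # bytes (byte-based length)

# Hypothetical, simplified framing: a 4-byte length prefix followed by the payload.
# Buggy sender: uses the character count, which undercounts multibyte UTF-8 chars.
buggy_frame = struct.pack(">I", len(payload)) + data
# Correct sender: uses the byte count.
good_frame = struct.pack(">I", len(data)) + data

def receive(frame: bytes) -> dict:
    """Read the declared number of bytes and try to parse them as JSON.
    On failure, wrap the raw text in a 'message' field, roughly like a
    json codec that cannot decode its input."""
    (length,) = struct.unpack(">I", frame[:4])
    body = frame[4:4 + length].decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return {"message": body}

print(receive(buggy_frame))  # truncated JSON -> escaped text inside "message"
print(receive(good_frame))   # parses back into proper fields

Running the sketch, the buggy frame is cut short by exactly one byte per accented character, which is consistent with the receiver output above where the JSON string is truncated mid-field and dumped into "message".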

ph commented 9 years ago

This was fixed and released.