uken / fluent-plugin-elasticsearch

Apache License 2.0

json logs are not sent to ES after parser filter with json format #521

Open shipilovds opened 5 years ago

shipilovds commented 5 years ago

Problem

JSON logs are not sent to Elasticsearch after the parser filter with format json.

I see the following messages in fluentd logs:

2018-12-26 13:55:56 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'failed to parse [msg]'" location=nil tag="docker.test.msgs" time=2018-12-26 13:55:21.233044973 +0000 record={"level"=>"info", "ts"=>1545751556.9878204, "msg"=>"Started", "version"=>"HEAD", "build_date"=>"2018-12-12T14:31:50+0000"}

@log_level debug is enabled for the elasticsearch plugin.

Steps to replicate

fluentd Dockerfile:

FROM debian:buster-slim
LABEL Description="Fluentd docker image" Vendor="MyTeam" Version="1.0"
#ENV GOSU_VERSION=1.10-1+b2

# Do not split this into multiple RUN!
# Docker creates a layer for every RUN-Statement
# therefore an 'apt-get purge' has no effect
RUN apt-get update \
 && apt-get upgrade -y \
 && apt-get install -y --no-install-recommends \
            ca-certificates \
            ruby \
            gosu \
            #=${GOSU_VERSION} \
 && buildDeps=" \
      make gcc g++ libc-dev \
      ruby-dev \
      wget bzip2 \
    " \
 && apt-get install -y --no-install-recommends $buildDeps \
 && update-ca-certificates \
 && echo 'gem: --no-document' >> /etc/gemrc \
 && gem install oj \
 && gem install json \
 && gem install fluentd -v 1.3.0 \
 && fluent-gem install fluent-plugin-elasticsearch \
 && wget -O /tmp/jemalloc-4.5.0.tar.bz2 https://github.com/jemalloc/jemalloc/releases/download/4.5.0/jemalloc-4.5.0.tar.bz2 \
 && cd /tmp && tar -xjf jemalloc-4.5.0.tar.bz2 && cd jemalloc-4.5.0/ \
 && ./configure && make \
 && mv lib/libjemalloc.so.2 /usr/lib \
 && apt-get purge -y --auto-remove \
                  -o APT::AutoRemove::RecommendsImportant=false \
                  $buildDeps \
 && rm -rf /var/lib/apt/lists/* \
 && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem

RUN useradd fluent -d /home/fluent -m -U
RUN chown -R fluent:fluent /home/fluent

# for log storage (maybe shared with host)
RUN mkdir -p /fluentd/log
# configuration/plugins path (default: copied from .)
RUN mkdir -p /fluentd/etc /fluentd/plugins

RUN chown -R fluent:fluent /fluentd

USER fluent
WORKDIR /home/fluent

# Tell ruby to install packages as user
RUN echo "gem: --user-install --no-document" >> ~/.gemrc
ENV PATH /home/fluent/.gem/ruby/2.5.0/bin:$PATH
ENV GEM_PATH /home/fluent/.gem/ruby/2.5.0:$GEM_PATH

ENV FLUENTD_OPT=""
ENV FLUENTD_CONF="fluent.conf"

ENV LD_PRELOAD="/usr/lib/libjemalloc.so.2"

EXPOSE 24224 5140

CMD exec fluentd -c /fluentd/etc/$FLUENTD_CONF -p /fluentd/plugins $FLUENTD_OPT                                                                                                                                                                                                             

fluentd conf:

<source>
  @type forward
  port 24224
</source>

<filter docker.**>
  @type parser
  @log_level debug
  format json # apache2, nginx, etc...
  key_name log
#  reserve_data true
</filter>
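Conceptually, with key_name log and reserve_data disabled, the parser filter above takes the record's log string, parses it as JSON, and uses the parsed fields as the new record root. A minimal plain-Ruby sketch of that behavior (illustrative only, not the plugin's actual code; the record shape follows what Docker's fluentd logging driver emits):

```ruby
require 'json'

# Record roughly as delivered by Docker's fluentd logging driver:
# the application's JSON line arrives as a plain string under "log".
record = {
  "container_name" => "/test.msgs",
  "source"         => "stdout",
  "log"            => '{"level":"info","ts":1545751556.9878204,"msg":"Started"}'
}

# With key_name log (and reserve_data false), the filter parses the
# "log" value and replaces the record with the parsed fields.
parsed = JSON.parse(record["log"])
puts parsed["msg"]   # the parsed fields now sit at the root of the record
```

With reserve_data true, the original fields (container_name, source, ...) would be kept alongside the parsed ones.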

<match fluent.**>
  @type null
</match>

<match **>
  @log_level debug
  @type elasticsearch
  logstash_format true
  host example_es_host
  user example_es_user
  password example_es_pass
  index_name fluentd
  type_name fluentd
</match>

And a service that echoes JSON to stdout every N seconds, with this docker-compose file:

version: '2'
services:
  test_json:
    container_name: test.msgs
    image: test_json:latest
    logging:
      driver: fluentd
      options:
        fluentd-address: fluentd
        tag: docker.test.msgs

The fluentd service and test_json are on the same network.

The string with JSON:

{"level":"info","ts":1545751556.9878204,"msg":"Started","version":"HEAD","build_date":"2018-12-12T14:31:50+0000"}

Expected Behavior or What you need to ask

I want to see formatted logs in Kibana (with ES as storage): the JSON log should be parsed and its fields inserted at the ROOT of _source. Instead I get the error above and the log is not sent. If I remove "msg":"Started", from the log string, the warning messages disappear from the fluentd log, but the test logs still do not show up in Kibana. I have tried different options and nothing helped :(( The documentation says it should work. Tell me if the above is not enough.

P.S. Yes, I have seen https://github.com/uken/fluent-plugin-elasticsearch/issues/320 before, and it does not help. It is not the same problem.

P.P.S. Sorry for my English. I am not a native speaker.


cosmo0920 commented 5 years ago

Thank you for your report. It is a very good report. :100: I'm glad to receive such a detailed report with everything needed to reproduce this issue.

Sorry for my English. I am not a native speaker.

No problem. I'm also not a native speaker.

cosmo0920 commented 5 years ago

Is your log in the following style (a ToString()-ed JSON object inside the log field)?

{"source":"stdout","log":"{\"level\":\"info\",\"ts\":1545751556.9878204,\"msg\":\"Started\",\"version\":\"HEAD\",\"build_date\":\"2018-12-12T14:31:50+0000\"}","container_name":"/w1"}

If not, the parser plugin will not work....

shipilovds commented 5 years ago

Yes, the JSON document in Kibana is as follows:

{
  "_index": "logstash-2018.12.27",
  "_type": "fluentd",
  "_id": "xvkr72cBchWqobSTatzm",
  "_version": 1,
  "_score": null,
  "_source": {
    "container_name": "/my_container",
    "source": "stderr",
    "log": "{\"level\":\"info\",\"ts\":1545905775.6339545,\"msg\":\"Started\",\"version\":\"HEAD\",\"build_date\":\"2018-12-12T14:31:50+0000\"}",
    "container_id": "f52c73fed5b2360d23b73d3c72d84cb6fefe0581996a01da9aa717c909c9ffd0",
    "@timestamp": "2018-12-27T10:16:15.000000000+00:00"
  },
...
}

cosmo0920 commented 5 years ago

Umm..., I guess that your JSON-like log is not valid JSON....

How about using the following configuration?

<filter docker.**>
  @type parser
  @log_level debug
  <parse>
    @type multi_format
    <pattern>
      format json
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
  key_name log
#  reserve_data true
</filter>

Note that the above configuration needs to install https://github.com/repeatedly/fluent-plugin-multi-format-parser.
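The multi_format pattern list tries each parser in order and uses the first one that succeeds, so lines that are not valid JSON fall through to format none instead of raising an error. A rough plain-Ruby sketch of that fallback (illustrative only; the real logic lives in fluent-plugin-multi-format-parser, and the none parser stores the raw line under "message" by default):

```ruby
require 'json'

# Sketch of the pattern-list fallback: try JSON first, and if parsing
# fails, keep the raw line under "message" (what format none does).
def parse_with_fallback(line)
  JSON.parse(line)                  # <pattern> format json
rescue JSON::ParserError
  { "message" => line }             # <pattern> format none
end

puts parse_with_fallback('{"level":"info","msg":"Started"}')
puts parse_with_fallback('not json at all')
```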

shipilovds commented 5 years ago

Umm..., I guess that your JSON-like log is not valid JSON.... How about using the following configuration? [...]

Thank you! I will try this. About "not valid JSON": the same logs are processed by Logstash without problems, so I don't think validity is the issue.

cosmo0920 commented 5 years ago

About "not valid JSON": the same logs are processed by Logstash without problems, so I don't think validity is the issue.

Hmm.... Logstash uses the Jackson JRuby binding, whereas our JSON deserializer is the Yajl CRuby binding.
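For what it's worth, the exact sample line from the report parses cleanly with Ruby's standard JSON library (stdlib JSON is used below only for illustration; the plugin itself goes through Yajl). That the string parses suggests the 400 may stem from something other than invalid JSON, e.g. a mapping conflict on the msg field in the existing index, rather than the deserializer:

```ruby
require 'json'

# The sample line from the report; it parses without errors, so the
# string itself appears to be valid JSON.
line = '{"level":"info","ts":1545751556.9878204,"msg":"Started",' \
       '"version":"HEAD","build_date":"2018-12-12T14:31:50+0000"}'
parsed = JSON.parse(line)
puts parsed.keys.join(",")
```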

nizarayari commented 5 years ago

I have the same issue. Any update on this please?

husseinraoouf commented 5 years ago

+1

aspekt112 commented 5 years ago

Any progress?

shipilovds commented 5 years ago

Sorry, guys. The project where this issue occurred was closed some time ago, and I didn't research it further. I have nothing to add. If I run into this problem again, I will let you know.