uken / fluent-plugin-elasticsearch

A question about consolidated timestamps from different sources #980

Open bozho opened 2 years ago

bozho commented 2 years ago

Problem

Hi,

We are running a Docker swarm and I'm working on implementing log shipping to ES+Kibana. Fluentd also runs as a service in the swarm. Individual Docker services will redirect their logs to fluentd, and each service has its own fluentd tag.

I will probably end up using logstash_format, but as an exercise, I'm trying to consolidate all timestamps from different logs into a single timestamp field to be used by Kibana as a time field on an index pattern.
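For completeness, the logstash_format route would just mean enabling it on the output (a minimal sketch, reusing the same placeholder host/credentials as in the config below); the plugin then writes to daily logstash-* indices and adds an @timestamp field that Kibana can use directly:

<match **>
  @type            elasticsearch
  host             <host>
  port             <port>
  user             <user>
  password         <password>
  # logstash_format makes the plugin target logstash-YYYY.MM.DD indices
  # and add an @timestamp field, so Kibana gets a single time field for free.
  logstash_format  true
  logstash_prefix  logstash
</match>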

I'm currently focusing on nginx logs. I've configured access logs to be in JSON format:

 log_format main escape=json '{'
    '"remote_addr": "$remote_addr",'
    '"remote_user": "$remote_user",'
    '"timestamp": "$time_iso8601",'
    '"request": "$request",'
    '"request_time": $request_time,'
    '"status": $status,'
    '"body_bytes_sent": $body_bytes_sent,'
    '"http_referer": "$http_referer",'
    '"http_user_agent": "$http_user_agent",'
    '"ssl_protocol": "$ssl_protocol",'
    '"ssl_cipher": "$ssl_cipher"'
  '}';

The nginx error log format is not configurable, so we need to parse those entries with a regexp.
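For reference, a typical nginx error log entry looks something like this (the message itself is made up; the leading date/time and the bracketed level are the parts the regexp below relies on):

2022/06/14 07:41:19 [error] 23#23: *1 open() "/usr/share/nginx/html/missing.html" failed (2: No such file or directory), client: 10.0.0.5, request: "GET /missing.html HTTP/1.1"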

Steps to replicate

This is our current configuration:

<system>
  log_level debug
</system>

<source>
  @type forward
  port  24224
  bind  0.0.0.0
</source>

<match nginx>
  @type copy
  <store>
    @type  relabel
    @label @nginx
  </store>
</match>

<match **>
  @type stdout
</match>

<label @nginx>
  # Split nginx access and error log entries.
  <match nginx>
    @type rewrite_tag_filter
    <rule>
      key     log
      pattern /^\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}.*$/i
      tag     ${tag}.error
    </rule>
    <rule>
      key     log
      pattern .*
      tag     ${tag}.access
    </rule>
  </match>

  <filter nginx.access>
    @type        parser
    key_name     log
    reserve_data true
    <parse>
      @type    json
    </parse>
  </filter>

  <filter nginx.error>
    @type        parser
    key_name     log
    reserve_data true
    <parse>
      @type       regexp
      expression  /^(?<logtime>\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) \[(?<log_level>[^\s]+)\] (?<message>.*)$/
      time_key    logtime
      time_type   string
      time_format "%Y/%m/%d %H:%M:%S"
      keep_time_key true
    </parse>
  </filter>

  <match nginx.access>
      @type copy
      <store>
          @type  relabel
          @label @output
      </store>
  </match>

  <match nginx.error>
      @type copy
      <store>
          @type  relabel
          @label @output
      </store>
      <store>
          @type  relabel
          @label @output_stdout
      </store>
  </match>
</label>

<label @output>
  <match **>
    @type              elasticsearch
    host               <host>
    port               <port>
    user               <user>
    password           <password>
    index_name         fluentd.%Y%m%d
    logstash_format    false
    suppress_type_name true
    include_timestamp  false
    include_tag_key    true

    <buffer tag, time>
      @type        memory
      timekey      1m
      timekey_wait 5s
      flush_interval 5s
    </buffer>
  </match>
</label>

<label @output_stdout>
  <match **>
    @type stdout
  </match>
</label>

Expected Behavior or What you need to ask

The above configuration parses the log timestamps in both logs fine, and they are recognised as date types. The nginx $time_iso8601 variable is formatted like 2022-06-14T07:30:53+00:00, while the error log timestamp is formatted like 2022/06/14 07:41:19.

Then I realised I end up with two different log time fields: timestamp (parsed from nginx access log entries) and logtime (parsed from nginx error log entries). Kibana requires a single timestamp field to be selected for time-based index patterns.

Trivial, I thought: I'll just rename the regexp group logtime to timestamp when parsing the nginx error log (the modified parse block is sketched below). The problem is that Elasticsearch then rejects nginx error log entries with:

dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch [error type]: mapper_parsing_exception [reason]: 'failed to parse [timestamp]'" location=nil tag="nginx.error" time=2022-06-14 07:41:20.000000000 +0000 record={....}

I can see the timestamp field in the offending record, formatted as 2022/06/14 07:41:20. If I use any other field name (like logtime), the parsed data is the same, ES accepts the entries, and the field shows up as a date field in Kibana.
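For completeness, the rename is nothing more than changing the capture group and time_key in the nginx.error parse block above:

  <filter nginx.error>
    @type        parser
    key_name     log
    reserve_data true
    <parse>
      @type       regexp
      expression  /^(?<timestamp>\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) \[(?<log_level>[^\s]+)\] (?<message>.*)$/
      time_key    timestamp
      time_type   string
      time_format "%Y/%m/%d %H:%M:%S"
      keep_time_key true
    </parse>
  </filter>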

Is timestamp a reserved/special field name in ES? nginx.access entries with a timestamp field are accepted by ES, with the format 2022-06-14T07:30:53+00:00.

Is the timestamp issue even a plugin issue, or a fluentd issue?

Thank you!

Using Fluentd and ES plugin versions

Our image is based on the official fluent/fluentd:v1.14-1 Docker image (we replace the ES libs with v7.7 and install this plugin).

$ fluentd --version
fluentd 1.14.6
$ fluent-gem list
*** LOCAL GEMS ***

async (1.30.1)
async-http (0.54.0)
async-io (1.33.0)
async-pool (0.3.9)
benchmark (default: 0.1.0)
bigdecimal (1.4.4)
bundler (2.3.11)
cgi (default: 0.1.0.1)
concurrent-ruby (1.1.10)
console (1.15.0)
cool.io (1.7.1)
csv (default: 3.1.2)
date (default: 3.0.3)
dbm (default: 1.1.0)
delegate (default: 0.1.0)
did_you_mean (default: 1.4.0)
elasticsearch (7.7.0)
elasticsearch-api (7.7.0)
elasticsearch-transport (7.7.0)
etc (default: 1.1.0)
excon (0.92.3)
ext_monitor (0.1.2)
faraday (1.10.0)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (1.1.0)
faraday-httpclient (1.0.1)
faraday-multipart (1.0.4)
faraday-net_http (1.0.1)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
faraday-rack (1.0.0)
faraday-retry (1.0.3)
fcntl (default: 1.0.0)
fiber-local (1.0.0)
fileutils (default: 1.4.1)
fluent-config-regexp-type (1.0.0)
fluent-plugin-elasticsearch (5.2.2)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluentd (1.14.6)
forwardable (default: 1.3.1)
getoptlong (default: 0.1.0)
http_parser.rb (0.8.0)
ipaddr (default: 1.2.2)
irb (default: 1.2.6)
json (2.4.1)
logger (default: 1.4.2)
matrix (default: 0.2.0)
msgpack (1.5.1)
multi_json (1.15.0)
multipart-post (2.2.0)
mutex_m (default: 0.1.0)
net-pop (default: 0.1.0)
net-smtp (default: 0.1.0)
nio4r (2.5.8)
observer (default: 0.1.0)
oj (3.10.18)
open3 (default: 0.1.0)
openssl (default: 2.1.3)
ostruct (default: 0.2.0)
prime (default: 0.1.1)
protocol-hpack (1.4.2)
protocol-http (0.21.0)
protocol-http1 (0.13.2)
protocol-http2 (0.14.2)
pstore (default: 0.1.0)
psych (default: 3.1.0)
racc (default: 1.4.16)
readline (default: 0.0.2)
readline-ext (default: 0.1.0)
reline (default: 0.1.5)
rexml (default: 3.2.3.1)
rss (default: 0.2.8)
ruby2_keywords (0.0.5)
serverengine (2.2.5)
sigdump (0.2.4)
singleton (default: 0.1.0)
stringio (default: 0.1.0)
strptime (0.2.5)
strscan (default: 1.0.3)
timeout (default: 0.1.0)
timers (4.3.3)
tracer (default: 0.1.0)
tzinfo (2.0.4)
tzinfo-data (1.2022.1)
uri (default: 0.10.0)
webrick (default: 1.6.1)
yajl-ruby (1.4.2)
yaml (default: 0.1.0)
zlib (default: 1.1.0)

ES version: 5.4.1

bozho commented 2 years ago

Huh, this appears to be an Elasticsearch issue. When I change both timestamp field names to logtime (i.e. for both nginx access and error logs), I get the same ES parsing error for the logtime field.

It's as if ES automatically picks up the date string format from the initial entries (nginx access log entries are much more frequent) and then uses that format to parse all future date strings. Once an nginx error log entry arrives with a differently formatted date/time in its logtime field, ES complains.
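If that is what's happening, it should be visible in the index mapping: dynamic mapping fixes the date format for a field when the field is first seen. Something like this should show the format ES inferred (assuming the index name expands to fluentd.20220614; adjust to the actual index):

$ curl -s -u <user>:<password> 'http://<host>:<port>/fluentd.20220614/_mapping?pretty'

If the mapping confirms it, one way out might be an index template that declares the field as a date accepting both formats explicitly. A sketch using ES 7.x legacy template syntax (the template name and the exact format strings are assumptions; older ES versions use a different template body):

PUT _template/fluentd
{
  "index_patterns": ["fluentd.*"],
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "strict_date_time_no_millis||yyyy/MM/dd HH:mm:ss"
      }
    }
  }
}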