logstash-plugins / logstash-filter-multiline

Apache License 2.0

Documentation is unclear what allow_duplicates actually does #15

Open bradvido opened 9 years ago

bradvido commented 9 years ago

I'm using logstash 1.5.0 on Windows.

This may just call for a documentation improvement explaining what allow_duplicates really does, because all the docs currently say is: "Allow duplcate values on the source field."

I'm getting unexpected behavior with allow_duplicates => false.

My filter block looks like this:

filter {
    ruby {
        code => "
            start_time_ms = (Time.now.to_f*1000).to_i

            event['logstash'] = {}
            event['logstash']['host'] = ENV['COMPUTERNAME'].dup.downcase

            event['logstash']['filter_processing_time'] = {}
            event['logstash']['filter_processing_time']['start_ms'] = start_time_ms
        "
    }

    multiline {
        pattern => "\d\d:\d\d:\d\d .+?:( )\s.+"
        what => "previous"
        periodic_flush => true
        allow_duplicates => false
    }

    # ... a bunch of other filter plugins ...

    ruby {  # measure how long the filter took to process
        code => "
            end_time_ms = (Time.now.to_f*1000).to_i
            event['logstash']['filter_processing_time']['end_ms'] = end_time_ms

            start_time_ms = event['logstash']['filter_processing_time']['start_ms']
            if start_time_ms.kind_of?(Array)  # happens with multi-line inputs
                start_time_ms = start_time_ms[0]
            end

            elapsed_ms = (end_time_ms - start_time_ms)
            event['logstash']['filter_processing_time']['elapsed_ms'] = elapsed_ms
        "
    }
}
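
The array handling in the second ruby block can be tried outside Logstash. This is only a sketch: the helper name elapsed_ms is made up for illustration, and plain integers stand in for the event fields. It uses the first array entry, as the filter code above does.

```ruby
# Hypothetical helper, not part of Logstash or the multiline plugin.
# start_ms may be a single integer or, after a multiline merge, an array.
def elapsed_ms(start_ms, end_ms)
  # Array() wraps a lone integer and leaves an array untouched,
  # so .first yields the first recorded start time in both cases.
  first_start = Array(start_ms).first
  end_ms - first_start
end

multi  = elapsed_ms([1433275212688, 1433275212703], 1433275212719)  # => 31
single = elapsed_ms(1433275210641, 1433275210656)                   # => 15
```

The timestamps are the ones from the sample output in this post.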

The strange thing is that my logstash.host field is always a single value in the output, but the logstash.filter_processing_time.start_ms field is an array if and only if the event had multiple lines.

I expected allow_duplicates => false to solve this, so that I would not get an array of values when the source lines merged by the multiline filter each carry the same field.
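
One merge behavior that would be consistent with what I am seeing is sketched below. This is a guess, not the plugin's actual code: merge_fields is a hypothetical function, and plain hashes stand in for events. Under this assumption, equal values collapse into one and differing values pile up in an array, which would explain why host stays single while start_ms does not.

```ruby
# Hypothetical illustration only; NOT taken from logstash-filter-multiline.
# Merges src into dst: nested hashes are merged recursively, and for any
# other field the values are combined with a set union.
def merge_fields(dst, src)
  src.each do |key, value|
    if !dst.key?(key)
      dst[key] = value
    elsif dst[key].is_a?(Hash) && value.is_a?(Hash)
      merge_fields(dst[key], value)
    else
      combined = Array(dst[key]) | Array(value)  # union drops equal values
      dst[key] = combined.length == 1 ? combined.first : combined
    end
  end
  dst
end

first_line = {
  'logstash' => {
    'host' => 'dev-util1',
    'filter_processing_time' => { 'start_ms' => 1433275212688 }
  }
}
second_line = {
  'logstash' => {
    'host' => 'dev-util1',
    'filter_processing_time' => { 'start_ms' => 1433275212703 }
  }
}

merged = merge_fields(first_line, second_line)
```

If the real merge works like this, the two start_ms values are not duplicates at all, since each line gets its own timestamp, so a setting that only drops duplicates would leave the array in place.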

Sample output: When the event is multi-line:

{
  "logstash": {
      "host": "dev-util1",
      "filter_processing_time": {
        "start": [
          1433275212688,
          1433275212703
        ],
        "end": 1433275212719,
        "elapsed_ms": 31
      }
    }
}

When the event is single-line:

{
  "logstash": {
      "host": "dev-util1",
      "filter_processing_time": {
        "start": 1433275210641,
        "end": 1433275210656,
        "elapsed_ms": 15
      }
    }
}

johnarnold commented 8 years ago

+1