logstash-plugins / logstash-codec-csv

This codec takes care of decoding and encoding csv data
Apache License 2.0

Codec plugin not regenerating headers #6

Open thekofimensah opened 4 years ago

thekofimensah commented 4 years ago

I'm using ELK 7.6 and trying to export to an S3 bucket using the csv codec. So far things work, but there's an issue: I'm uploading files every 5 minutes, and only the first "part" has the headers attached to it. Ideally every part would have the headers. Am I missing something? I've tried adding codec => csv { include_headers => true } and codec => csv { include_headers => true autodetect_column_names => true }, but neither adds the headers to any part after the first.

Thanks for the plugin btw!

ciurlaro commented 4 years ago

I am experiencing the very same problem.

My configuration is:

input {
  syslog { port => 42 }
}

filter {
  grok {
    match => {
      "message" => '\very_cool_regex\'
    }
  }
}
output {
  webhdfs {
    host => "very_cool_host"
    path => "/very/cool/path.csv"
    user => "hdfs"
    codec => csv {
      autodetect_column_names => true
      autogenerate_column_names => true
      include_headers => true
    }
  }
}

But the generated CSV files still do not have headers.

yaauie commented 3 years ago

:thinking: There are two parts to this.

As reported, for encoding operations, the output needs to be able to instantiate a new codec per destination, because otherwise, the codec has no way of knowing that an encode operation is the first aimed at that destination (and therefore has no way of knowing that it needs to also output the headers).
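To make the encoding half concrete, here is a minimal Ruby sketch of the behaviour described above. It is not the plugin's code: `SketchCsvEncoder`, the `dest` field, and the part names are hypothetical, and the encoder joins fields with commas without any quoting. It only illustrates that an encoder which remembers "headers already written" emits them once per instance, so a single shared instance gives headers to the first destination only, while one instance per destination gives headers to each.

```ruby
# Hypothetical stand-in for a header-emitting CSV encoder (not the real plugin class).
class SketchCsvEncoder
  def initialize(columns)
    @columns = columns
    @headers_written = false # state lives on the instance, not on the destination
  end

  # Returns the text to append to a destination for one event (a Hash).
  def encode(event)
    out = +""
    unless @headers_written
      out << @columns.join(",") << "\n"
      @headers_written = true
    end
    out << @columns.map { |c| event[c].to_s }.join(",") << "\n"
    out
  end
end

columns = %w[host status]
events = [
  { "dest" => "part-0", "host" => "a", "status" => 200 },
  { "dest" => "part-1", "host" => "b", "status" => 404 },
]

# One encoder shared by all destinations: only the first part gets headers.
shared = SketchCsvEncoder.new(columns)
shared_files = Hash.new { |h, k| h[k] = +"" }
events.each { |e| shared_files[e["dest"]] << shared.encode(e) }

# One encoder per destination: every part starts with headers.
per_dest = Hash.new { |h, k| h[k] = SketchCsvEncoder.new(columns) }
per_dest_files = Hash.new { |h, k| h[k] = +"" }
events.each { |e| per_dest_files[e["dest"]] << per_dest[e["dest"]].encode(e) }
```

In the shared case `shared_files["part-1"]` contains only the data row; in the per-destination case it starts with the header row. Whether a given output can create codecs this way depends on that output plugin, which is the point made above.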

Separately, for decoding operations, even inputs that do instantiate a codec per source must not reuse the codecs, since a codec with autodetect_column_names enabled is stateful. For this problem, I have found a way to reliably hook the eviction of the codec from the file input's codec identity map, but I need to come up with a more graceful solution: https://gist.github.com/yaauie/0bbcb0ef92c14c060c0d7175af1399fb
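The decoding half can be sketched the same way. Again, this is an illustration under assumptions rather than the plugin's implementation: `SketchAutodetectDecoder` is a hypothetical class that treats the first line it ever sees as the header row, and the two "files" are made-up in-memory arrays. Reusing one instance across sources makes the second source's header row come out as data, keyed by the first source's columns.

```ruby
# Hypothetical decoder that auto-detects column names from the first line it sees.
class SketchAutodetectDecoder
  def initialize
    @columns = nil # detected once, then kept for the life of the instance
  end

  # Returns a Hash for a data line, or nil when the line was consumed as headers.
  def decode(line)
    fields = line.chomp.split(",")
    if @columns.nil?
      @columns = fields
      return nil
    end
    @columns.zip(fields).to_h
  end
end

file_a = ["host,status", "a,200"]
file_b = ["name,size", "report,12"]

# Reused decoder: file_b's header row is decoded as an event with file_a's columns.
reused = SketchAutodetectDecoder.new
reused_events = (file_a + file_b).map { |l| reused.decode(l) }.compact

# Fresh decoder per source: each file's own header row is detected.
fresh_events = [file_a, file_b].flat_map do |lines|
  decoder = SketchAutodetectDecoder.new
  lines.map { |l| decoder.decode(l) }.compact
end
```

With the reused decoder, three events come out and two of them carry the wrong keys; with a fresh decoder per source, each file yields one correctly keyed event.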