logstash-plugins / logstash-filter-csv

Apache License 2.0

whitespace causes parse failure with "Illegal quoting in line" error #44

Open PhaedrusTheGreek opened 8 years ago

PhaedrusTheGreek commented 8 years ago

When whitespace appears outside the quotes (between quoted fields, or leading or trailing the CSV line), parsing fails.

Seems to happen in all versions of the plugin.

input {
 stdin {}
}

filter {
 csv {
 columns => [ "screentype", "devicetype" ]
 }
}

output {
 stdout {
  codec => rubydebug {}
 }
}

Note the leading space in the 2nd record and the space between the fields in the 3rd record:

[2016-11-29T11:54:39,123][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
"test1","test2"
{
    "@timestamp" => 2016-11-29T16:54:45.725Z,
      "@version" => "1",
    "screentype" => "test1",
       "message" => "\"test1\",\"test2\"",
    "devicetype" => "test2"
}
 "test1","test2"
[2016-11-29T11:54:57,222][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>" \"test1\",\"test2\"", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
{
    "@timestamp" => 2016-11-29T16:54:56.479Z,
      "@version" => "1",
       "message" => " \"test1\",\"test2\"",
          "tags" => [
        [0] "_csvparsefailure"
    ]
}
"asdf3",   "234"
[2016-11-29T12:00:18,634][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>"\"asdf3\", \"234\"", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
{
    "@timestamp" => 2016-11-29T17:00:17.933Z,
      "@version" => "1",
       "message" => "\"asdf3\", \"234\"",
          "tags" => [
        [0] "_csvparsefailure"
    ]
}
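
The same failure can probably be reproduced outside Logstash, since the filter relies on Ruby's standard `csv` library (as noted later in this thread). The following is a minimal sketch under that assumption; `try_parse` is a hypothetical helper introduced here for illustration and is not part of the plugin.

```ruby
require "csv"

# The three sample lines from the report, as they appear in the logged :source.
samples = [
  '"test1","test2"',   # no stray whitespace
  ' "test1","test2"',  # leading space before the first quote
  '"asdf3", "234"'     # space between the delimiter and the next quote
]

# Hypothetical helper: returns the parsed fields, or a description of the
# error when the standard library rejects the line.
def try_parse(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  "#{e.class}: #{e.message}"
end

samples.each do |line|
  puts "#{line.inspect} => #{try_parse(line).inspect}"
end
```

Only the first line should come back as an array of fields; the other two should be reported as `CSV::MalformedCSVError`, matching the `_csvparsefailure` tags above.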

jordansissel commented 8 years ago

I don't think there's any practical standard for what constitutes "CSV", but my rough understanding was that fields are comma-delimited and that values are sometimes quoted. Spaces outside of values seem invalid to me.

If I look outward a bit, what should the expected parsing result be for the following probably-not-valid-csv:

jordansissel commented 8 years ago

I tested loading csv into LibreCalc with a few variants of "one","two","three" with spaces in various places, and things seem like they are loaded successfully.

jordansissel commented 8 years ago

After the research above, I agree this is a bug. We use the Ruby standard library CSV parser for this filter, and I don't see any mechanism in the CSV library to make it work with the whitespace-filled data you provide. This means we'll probably have to find (or write) a replacement library. I have no ETA on that effort.

pavelnikolov commented 7 years ago

I have the same issue and I have no idea how to fix it

SHSauler commented 6 years ago

I can third this issue. Is there a workaround?

kikaragyozov commented 3 years ago

The original spec (RFC 4180) from 2005 doesn't say what to do in this case, but a draft from 2016 or so suggests simply trimming any leading/trailing whitespace outside a quoted segment.

Source
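
As a rough illustration of the trimming idea, the line can be pre-processed before it is handed to Ruby's standard `csv` library. This is only a sketch, not the plugin's behaviour: `split_outside_quotes`, `trim_outside_quotes` and `parse_lenient` are hypothetical names, and it assumes single-line records, a comma delimiter and double-quote quoting. Unquoted fields are deliberately left untouched, so their whitespace is preserved.

```ruby
require "csv"

# Hypothetical helper: split one CSV line on separators that sit outside
# double quotes. Every quote toggles the state, so an escaped quote ("")
# toggles twice and leaves the scanner inside the quoted segment.
def split_outside_quotes(line, sep = ",")
  fields = [String.new]
  in_quotes = false
  line.each_char do |ch|
    if ch == sep && !in_quotes
      fields << String.new
    else
      in_quotes = !in_quotes if ch == '"'
      fields.last << ch
    end
  end
  fields
end

# Hypothetical helper: drop whitespace around fields that turn out to be
# quoted once the padding is removed; leave every other field as it is.
def trim_outside_quotes(line, sep = ",")
  cleaned = split_outside_quotes(line.chomp, sep).map do |raw|
    stripped = raw.strip
    quoted = stripped.length >= 2 && stripped.start_with?('"') && stripped.end_with?('"')
    quoted ? stripped : raw
  end
  cleaned.join(sep)
end

# Parse with the standard library after the cleanup pass.
def parse_lenient(line)
  CSV.parse_line(trim_outside_quotes(line))
end

p parse_lenient(' "test1","test2"')
p parse_lenient('"asdf3",   "234"')
```

With this pre-pass, both failing records from the original report should parse into two fields. Inside Logstash, something similar could presumably run ahead of the csv filter, though that wiring is not shown here.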

@jordansissel can this get a fix now in that direction?