ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License

Parser throws "TODO: Meaningful message" rather than a meaningful error #222

Closed unikitty37 closed 2 years ago

unikitty37 commented 2 years ago

When I try processing the attached CSV file, CSV.foreach throws a very unhelpful error:

[image: screenshot of the error message]

  def import_file(csv_file, &block)
    CSV.foreach(csv_file, headers: column_names, strip: true, return_headers: false) do |row|
      block.yield row # CSV.foreach throws before reaching this point
    end
  end

  def column_names
    %w[name address1 address2 address3 city county postcode contact_phone contact_name contact_mobile emails fax]
  end

contractor_validation_test.csv

This is occurring with Ruby 2.7, but it still appears to be in the current codebase.

What causes this, and would it be possible to change the error message for something a bit more helpful?

olleolleolle commented 2 years ago

Here's a look at the data in the file:

[image]

- starts with a BOM https://unicode-table.com/en/FEFF/
- the column_names don't have the same casing as the .csv file column headers
- there seem to be a few undescribed columns (a list of commas after the named columns)

unikitty37 commented 2 years ago

Right — should it not be handling a BOM, though? This file is exactly as Excel saved it.

Even if I specifically tell it that the input file has one, by adding encoding: 'rb:BOM|UTF-8', to the options for foreach, I get the same message.
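For reference, here is a minimal sketch of reading a BOM-prefixed file, assuming the BOM is a UTF-8 one. The sample contents and column names are invented for illustration. It opens the file through Ruby's `bom|utf-8` encoding so that the BOM is consumed before CSV sees the data; it does not reproduce or explain the error reported above.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data: a UTF-8 BOM followed by a header row and one record.
sample = "\uFEFFName,City\nBob,Leeds\n"

file = Tempfile.new(["bom_sample", ".csv"])
file.write(sample)
file.close

# "r:bom|utf-8" asks Ruby to skip a leading BOM, if present, and read as UTF-8.
rows = File.open(file.path, "r:bom|utf-8") do |io|
  CSV.new(io, headers: true).map(&:to_h)
end

file.unlink

p rows
```

Opening the file explicitly and handing the IO to `CSV.new` keeps the encoding handling in plain Ruby, independent of how a given csv version forwards options to `File.open`.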

olleolleolle commented 2 years ago

@unikitty37 Would it be more pedagogical to offer:

unikitty37 commented 2 years ago

Ah, is this indicating that I have misunderstood the documentation on the headers option? My understanding was that it would allow me to refer to that column as row[:address1] regardless of what the sender has put in the header column (we can't rely on uploaders using consistent column names, unfortunately).

The Data Conversion section of the doc seems to imply this, too:

# Headers are part of data
data = CSV.parse(<<~ROWS, headers: true)
  Name,Department,Salary
  Bob,Engineering,1000
  Jane,Sales,2000
  John,Management,5000
ROWS

data.class      #=> CSV::Table
data.first      #=> #<CSV::Row "Name":"Bob" "Department":"Engineering" "Salary":"1000">
data.first.to_h #=> {"Name"=>"Bob", "Department"=>"Engineering", "Salary"=>"1000"}

# Headers provided by developer
data = CSV.parse('Bob,Engineering,1000', headers: %i[name department salary])
data.first      #=> #<CSV::Row name:"Bob" department:"Engineering" salary:"1000">

Or does this only work with CSV.parse and not CSV.foreach?
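A small sketch may help here. It assumes a recent csv gem, and the file contents and the `names` list are invented for illustration. It passes the same developer-supplied `headers:` Array to both `CSV.foreach` and `CSV.parse`, and shows that the file's own first line then comes through as an ordinary data row.

```ruby
require "csv"
require "tempfile"

# Hypothetical upload: the sender's own header row, then one record.
file = Tempfile.new(["headers_sample", ".csv"])
file.write("Name,Address 1\nAcme Ltd,1 High Street\n")
file.close

# Developer-supplied headers; symbols, so fields can be read as row[:name].
names = %i[name address1]

rows = []
CSV.foreach(file.path, headers: names) { |row| rows << row.to_h }

# The same option given to CSV.parse, for comparison.
parsed = CSV.parse(File.read(file.path), headers: names).map(&:to_h)

# With an Array for headers:, the file's first line is parsed as data,
# so the sender's header row appears as rows[0] and is dropped by hand here.
data_rows = rows.drop(1)

file.unlink

p rows.first
p data_rows
```

Note that `column_names` in the original post is built with `%w`, which yields strings, so under that setup a field would be looked up as `row["address1"]` rather than `row[:address1]`.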

olleolleolle commented 2 years ago

(My guesses are bottoming out, and I encourage you to continue researching this!)

kou commented 2 years ago

This is already fixed in the latest version. Could you add gem "csv" to your Gemfile?

unikitty37 commented 2 years ago

Thanks — unfortunately this doesn't change the installed version from 1.0.0.
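As a quick check of which CSV library a process actually loads, something like the following could be run inside the application (for example via its console). This is only a diagnostic sketch; how the result should be interpreted on this particular Rails 4 setup is an assumption.

```ruby
require "csv"

# Version of the CSV library that was actually loaded.
loaded_version = CSV::VERSION

# Path(s) the library was loaded from. A path inside a gems directory suggests
# the gem from the Gemfile is in use; a path under Ruby's own lib directory
# suggests the copy shipped with Ruby.
loaded_from = $LOADED_FEATURES.grep(%r{/csv\.rb\z})

puts "csv #{loaded_version}"
puts loaded_from
```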

I suspect the codebase still being on Rails 4 with Ruby 2.5.7 is the cause of this, so I'll have to work around the issue until we can get the upgrade project finished…