ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License

CSV.filter(input, output, headers: true) does not write the header row to output! #308

Closed jeropaul closed 2 months ago

jeropaul commented 2 months ago

I'm having issues with the behaviour of filter and wanted to see if I'm missing something in the docs. The observed behaviour does not match the recipe, where the headers are written to the output!

Goal: filter a CSV file using headers AND have the headers written to the output!

ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux]
CSV::VERSION is "3.3.0"

irb(main):008> require 'csv'
=> true
irb(main):009> input = "Date and Time,Some other headers\n2024-06-01T13:59:59.9500000+10,othervalue"
=> "Date and Time,Some other headers\n2024-06-01T13:59:59.9500000+10,othervalue"
irb(main):010> output = ''
=> ""
irb(main):011> CSV.filter(input, output, headers: true) { |row| row['Date and Time'] = "#{row['Date and Time']}+HUH"}
=> nil
irb(main):012> output
=> "2024-06-01T13:59:59.9500000+10+HUH,othervalue\n"

So why is the header row not being written to the output?

kou commented 2 months ago

Could you add out_write_headers: true option?

jeropaul commented 2 months ago

No dice on that one. It results in the header row being sent to the block, which raises the following:

irb(main):006> CSV.filter(input, output, headers: true, out_write_headers: true) { |row| row['Date and Time'] = "#{row['Date and Time']}+HUH"}
(irb):6:in `[]': no implicit conversion of String into Integer (TypeError)

CSV.filter(input, output, headers: true, out_write_headers: true) { |row| row['Date and Time'] = "#{row['Date and Time']}+HUH"}
                                                                                                        ^^^^^^^^^^^^^^^
    from (irb):6:in `block in <top (required)>'
    from /usr/local/bundle/gems/csv-3.3.0/lib/csv.rb:1231:in `filter'
    from (irb):6:in `<main>'
    from <internal:kernel>:187:in `loop'
    from /usr/local/bundle/gems/irb-1.12.0/exe/irb:9:in `<top (required)>'
    from /usr/local/bundle/bin/irb:25:in `load'
    from /usr/local/bundle/bin/irb:25:in `<top (required)>'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/cli/exec.rb:58:in `load'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/cli/exec.rb:58:in `kernel_load'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/cli/exec.rb:23:in `run'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/cli.rb:451:in `exec'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/vendor/thor/lib/thor.rb:527:in `dispatch'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/cli.rb:34:in `dispatch'
    from /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
    ... 6 levels...

kou commented 2 months ago

The first row is the header and it's an Array:

CSV.filter(input, output, headers: true, out_write_headers: true) { |row| row['Date and Time'] = "#{row['Date and Time']}+HUH" unless row.is_a?(Array)}
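
For anyone landing here, a self-contained sketch of the same workaround as a script. It assumes csv 3.3.0 as reported above, where the header row is passed to the block as an Array before the data rows; other csv versions were not checked. `next if row.is_a?(Array)` is only a block-form spelling of the `unless` guard.

```ruby
require 'csv'

input = "Date and Time,Some other headers\n" \
        "2024-06-01T13:59:59.9500000+10,othervalue"
# Unary plus gives an unfrozen String that CSV.filter can write into.
output = +''

CSV.filter(input, output, headers: true, out_write_headers: true) do |row|
  # With out_write_headers: true the header row reaches the block as a
  # plain Array; skip it and only edit the CSV::Row data rows.
  next if row.is_a?(Array)

  row['Date and Time'] = "#{row['Date and Time']}+HUH"
end

puts output
```

Checking `row.is_a?(CSV::Row)` instead would work equally well if you prefer to name the type you do want to edit.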

jeropaul commented 2 months ago

Okay, I've found a unit test that describes this behaviour.

The recipe really should be updated: https://ruby.github.io/csv/doc/csv/recipes/filtering_rdoc.html#label-Recipe-3A+Filter+String+to+String+with+Headers

kou commented 2 months ago

Could you open a PR for it?