ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License
177 stars 113 forks source link

Inconsistent behaviour between `CSV::Table.new` and `CSV.parse` #292

Closed Zopolis4 closed 9 months ago

Zopolis4 commented 9 months ago

Tables created by CSV::Table.new and CSV.parse behave differently when it comes to accessing values, to the extent that converting a table created by ::new gains additional functionality (when accessing values) by being converted into a string and then parsed by CSV.parse.

Additionally, a number of comparison operators return results that seem inconsistent with reality, at least when compared visually.

Testcase (of sorts) attached below:

require 'csv'

# Array of strings making up the equivalent of a 3-row CSV file
row0 = [ 'header0', 'header 1', 'Header 2' ]
row1 = [ 'r1value0', 'r1value2', 'r1value3' ]
row2 = [ 'r2value0', 'r2value2', 'r2value3' ]

# Array of rows made from the previous array of strings
rows = [
  CSV::Row.new(row0, [], header_row = true),
  CSV::Row.new([], row1),
  CSV::Row.new([], row2)
]

# Heredoc for that same CSV file
tablestring = <<-EOT
header0,header 1,Header 2
r1value0,r1value2,r1value3
r2value0,r2value2,r3value3
EOT

# Table made using CSV::Table.new on the array of rows
table1 = CSV::Table.new(rows, headers: row0)

# Table made using CSV.parse on the CSV heredoc
table2 = CSV.parse(tablestring, headers: true)

# Table made using the previous table converted toputs "table1.to_csv & tablestring: #{table1.to_csv == tablestring}" a CSV string (i.e. what should be equivalent to the CSV heredoc)
table3 = CSV.parse(table1.to_csv, headers: true)

puts "Print out the .to_csv form of each table"
puts "tablestring: \n#{tablestring}"
puts "table1.to_csv: \n#{table1.to_csv}"
puts "table2.to_csv: \n#{table2.to_csv}"
puts "table3.to_csv: \n#{table3.to_csv}"

puts "\n\nTest whether the .to_csv form of the tables is the same as the CSV heredoc (it should be)"
puts "table1.to_csv == tablestring: #{table1.to_csv == tablestring}"
puts "table2.to_csv == tablestring: #{table2.to_csv == tablestring}"
puts "table3.to_csv == tablestring: #{table3.to_csv == tablestring}"

puts "\n\nTest whether the headers of the tables are the same (they should be)"
puts "table1.headers & table2.headers: #{table1.headers == table2.headers}"
puts "table1.headers & table3.headers: #{table1.headers == table3.headers}"
puts "table2.headers & table3.headers: #{table2.headers == table3.headers}"

puts "\n\nAttempt to access the first column of each table"
puts "table1['header0']: #{table1['header0']}"
puts "table2['header0']: #{table2['header0']}"
puts "table3['header0']: #{table3['header0']}"

puts "\n\nAttempt to access the first row of each table"
puts "table1[0]: #{table1[0]}"
puts "table2[0]: #{table2[0]}"
puts "table3[0]: #{table3[0]}"

puts "\n\nTest whether each of the tables are the same"
puts "table1 == table2: #{table1 == table2}"
puts "table1 == table3: #{table1 == table3}"
puts "table2 == table3: #{table2 == table3}"

puts "\n\nPrint out each table"
puts "table1: \n#{table1}"
puts "table2: \n#{table2}"
puts "table3: \n#{table3}"

I am not sure how much of this behaviour is a bug as opposed to incorrect documentation or counterintuitive behaviour. However, the inability to access columns in tables created by CSV::Table.new is certainly very annoying, and while the workaround of converting those tables to csv strings then parsing them does work for that specific issue, it hardly seems an ideal solution.

$ ruby --version
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
$ gem list --local csv 
csv (3.2.8, 3.2.7, default: 3.2.2)
kou commented 9 months ago

Is it just happened by a typo?

--- /tmp/a.rb.orig  2023-11-25 07:39:48.942802182 +0900
+++ /tmp/a.rb   2023-11-25 07:39:59.030796716 +0900
@@ -16,7 +16,7 @@
 tablestring = <<-EOT
 header0,header 1,Header 2
 r1value0,r1value2,r1value3
-r2value0,r2value2,r3value3
+r2value0,r2value2,r2value3
 EOT
Zopolis4 commented 9 months ago

Ah. Yes, that certainly cuts down on a few of the issues. Some still remain though, I've attatched a testcase with the remaining ones.

require 'csv'

# Array of strings making up the equivalent of a 3-row CSV file
row0 = [ 'header0', 'header 1', 'Header 2' ]
row1 = [ 'r1value0', 'r1value2', 'r1value3' ]
row2 = [ 'r2value0', 'r2value2', 'r2value3' ]

# Array of rows made from the previous array of strings
rows = [
  CSV::Row.new(row0, [], header_row = true),
  CSV::Row.new([], row1),
  CSV::Row.new([], row2)
]

# Heredoc for that same CSV file
tablestring = <<-EOT
header0,header 1,Header 2
r1value0,r1value2,r1value3
r2value0,r2value2,r2value3
EOT

# Table made using CSV::Table.new on the array of rows
table1 = CSV::Table.new(rows, headers: row0)

# Table made using CSV.parse on the CSV heredoc
table2 = CSV.parse(tablestring, headers: true)

# Table made using the previous table converted toputs "table1.to_csv & tablestring: #{table1.to_csv == tablestring}" a CSV string (i.e. what should be equivalent to the CSV heredoc)
table3 = CSV.parse(table1.to_csv, headers: true)

puts "Test whether the headers of the tables are the same (they should be)"
puts "table1.headers & table2.headers: #{table1.headers == table2.headers}"
puts "table1.headers & table3.headers: #{table1.headers == table3.headers}"
puts "table2.headers & table3.headers: #{table2.headers == table3.headers}"

puts "\n\nAttempt to access the first column of each table"
puts "table1['header0']: #{table1['header0']}"
puts "table2['header0']: #{table2['header0']}"
puts "table3['header0']: #{table3['header0']}"

puts "\n\nAttempt to access the first row of each table"
puts "table1[0]: #{table1[0]}"
puts "table2[0]: #{table2[0]}"
puts "table3[0]: #{table3[0]}"

puts "\n\nTest whether each of the tables are the same"
puts "table1 == table2: #{table1 == table2}"
puts "table1 == table3: #{table1 == table3}"
puts "table2 == table3: #{table2 == table3}"

puts "\n\nPrint out each table"
puts "table1: \n#{table1}"
puts "table2: \n#{table2}"
puts "table3: \n#{table3}"
kou commented 9 months ago

Could you try the following?

rows = [
  CSV::Row.new(row0, row1),
  CSV::Row.new(row0, row2)
]
Zopolis4 commented 9 months ago

That gives me this result: image

kou commented 9 months ago

Here is my result:

$ cat /tmp/a.rb
require 'csv'

# Array of strings making up the equivalent of a 3-row CSV file
row0 = [ 'header0', 'header 1', 'Header 2' ]
row1 = [ 'r1value0', 'r1value2', 'r1value3' ]
row2 = [ 'r2value0', 'r2value2', 'r2value3' ]

# Array of rows made from the previous array of strings
rows = [
  CSV::Row.new(row0, row1),
  CSV::Row.new(row0, row2)
]

# Heredoc for that same CSV file
tablestring = <<-EOT
header0,header 1,Header 2
r1value0,r1value2,r1value3
r2value0,r2value2,r2value3
EOT

# Table made using CSV::Table.new on the array of rows
table1 = CSV::Table.new(rows, headers: row0)

# Table made using CSV.parse on the CSV heredoc
table2 = CSV.parse(tablestring, headers: true)

# Table made using the previous table converted toputs "table1.to_csv & tablestring: #{table1.to_csv == tablestring}" a CSV string (i.e. what should be equivalent to the CSV heredoc)
table3 = CSV.parse(table1.to_csv, headers: true)

puts "Test whether the headers of the tables are the same (they should be)"
puts "table1.headers & table2.headers: #{table1.headers == table2.headers}"
puts "table1.headers & table3.headers: #{table1.headers == table3.headers}"
puts "table2.headers & table3.headers: #{table2.headers == table3.headers}"

puts "\n\nAttempt to access the first column of each table"
puts "table1['header0']: #{table1['header0']}"
puts "table2['header0']: #{table2['header0']}"
puts "table3['header0']: #{table3['header0']}"

puts "\n\nAttempt to access the first row of each table"
puts "table1[0]: #{table1[0]}"
puts "table2[0]: #{table2[0]}"
puts "table3[0]: #{table3[0]}"

puts "\n\nTest whether each of the tables are the same"
puts "table1 == table2: #{table1 == table2}"
puts "table1 == table3: #{table1 == table3}"
puts "table2 == table3: #{table2 == table3}"

puts "\n\nPrint out each table"
puts "table1: \n#{table1}"
puts "table2: \n#{table2}"
puts "table3: \n#{table3}"
$ ruby -v /tmp/a.rb
ruby 3.3.0dev (2023-09-22T02:25:53Z master fb7a2ddb4b) [x86_64-linux]
Test whether the headers of the tables are the same (they should be)
table1.headers & table2.headers: true
table1.headers & table3.headers: true
table2.headers & table3.headers: true

Attempt to access the first column of each table
table1['header0']: ["r1value0", "r2value0"]
table2['header0']: ["r1value0", "r2value0"]
table3['header0']: ["r1value0", "r2value0"]

Attempt to access the first row of each table
table1[0]: r1value0,r1value2,r1value3
table2[0]: r1value0,r1value2,r1value3
table3[0]: r1value0,r1value2,r1value3

Test whether each of the tables are the same
table1 == table2: true
table1 == table3: true
table2 == table3: true

Print out each table
table1: 
header0,header 1,Header 2
r1value0,r1value2,r1value3
r2value0,r2value2,r2value3
table2: 
header0,header 1,Header 2
r1value0,r1value2,r1value3
r2value0,r2value2,r2value3
table3: 
header0,header 1,Header 2
r1value0,r1value2,r1value3
r2value0,r2value2,r2value3
Zopolis4 commented 9 months ago

Ah-- I had made a typo when using your suggested changes: fixing the typo gives me the same results as you.

kou commented 9 months ago

OK. Is it expected result? Do we still have any problem?

Zopolis4 commented 9 months ago

That leaves us here:

This doesn't work:

rows = [
  CSV::Row.new(row0, [], header_row = true),
  CSV::Row.new([], row1),
  CSV::Row.new([], row2)
]

This does:

rows = [
  CSV::Row.new(row0, row1),
  CSV::Row.new(row0, row2)
]

If we run the broken one through CSV.parse it does work.

To me it looks like the issue is that CSV.parse accepts things that CSV::Table.new will not work properly with.

I'm not sure whether that is intended behaviour or an issue, though.

Zopolis4 commented 9 months ago

Feel free to close this issue if this is intended behaviour.

kou commented 9 months ago

CSV.parse(table1.to_csv, headers: true) doesn't have any problem. Because it receives CSV text not CSV::Table.

If we want to avoid the wrong CSV::Table.new call, we can add a validation for the case. But I think that we don't need it for now.