roo-rb / roo

Roo provides an interface to spreadsheets of several sorts.
MIT License

What's the deal with headers? #518

Open resistorsoftware opened 4 years ago

resistorsoftware commented 4 years ago

Issue

The header is UPC, row 2 is 1234, and row 3 is 4567.

So I use a simple loop to get the data... and work with it.

sheet.parse(id: /UPC|SKU/i, clean: true) do |data|
  upc_count[data[:id]] += 1
end

I end up with UPC as a counted key in the hash. How do you set up the headers without counting them in the data? headers: false just breaks the loop, because it can no longer find the header.

Hampei commented 3 years ago

This will work if you don't mind a full array being created in the middle:

sheet.parse(id: /UPC|SKU/i, clean: true).each do |data|
  upc_count[data[:id]] += 1
end

This, sadly, has the same issue (the header row is still counted):

sheet.each(id: /UPC|SKU/i, clean: true).each do |data|
  upc_count[data[:id]] += 1
end
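One way to avoid both the header count and the intermediate array might be to skip the first row lazily. The sketch below does not call Roo: `FakeSheet` is a hypothetical stand-in, written on the assumption (as reported above) that `each` yields the header row first, as a hash whose value echoes the header text.

```ruby
# Hypothetical stand-in for a Roo sheet; not part of Roo.
# Assumption: like sheet.each(id: /UPC|SKU/i) in this thread, it yields
# the header row first, followed by the data rows, each as a hash.
class FakeSheet
  ROWS = [
    { id: "UPC" },   # header row
    { id: "1234" },
    { id: "4567" },
    { id: "1234" }
  ].freeze

  def each(&block)
    return to_enum(:each) unless block

    ROWS.each(&block)
  end
end

sheet = FakeSheet.new
upc_count = Hash.new(0)

# lazy.drop(1) skips the header row without building a full array of rows.
sheet.each.lazy.drop(1).each do |data|
  upc_count[data[:id]] += 1
end
```

With a real sheet, the same chain would hang off sheet.each(id: /UPC|SKU/i, clean: true), provided `each` returns an enumerator when called without a block; that is worth checking against your Roo version.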

synth commented 1 year ago

I would also like to know what the deal with headers is. We recently upgraded from 2.6 to 2.10 and are seeing some unexpected behavior.

  1. The upgrade from 2.6 -> 2.10 changes the behavior of #parse from including headers to not including headers.
  2. If you specify .parse(headers: true), each row comes back as a hash, not an array. This behavior is the same in both versions, but it is still unexpected: the README says #parse returns an array of rows, not an array of hashes, and the return type differs depending on whether you specify headers.

UPDATE: OK, I've traced it to this commit: https://github.com/roo-rb/roo/commit/dc94a6bc749db5a9287b52c7cd3d2bb544239b4b which drops the headers unless you explicitly ask to include them.

However, this combines with https://github.com/roo-rb/roo/blob/3fecab545b943962213cdf5f0cbe2cdb142940b7/lib/roo/base.rb#L282-L301, which for some reason changes the data type of what is returned (from array to hash) based on whether options are present...

So, if you want to retain headers, you are forced to have rows returned as hashes.
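The two observations in this thread (a block passed to #parse still sees the header row, while the array returned by #parse does not contain it) would be consistent with the header being dropped only after every row has been processed. The following is a toy model of that reading, not Roo's actual code; `toy_parse` is a hypothetical name and the rows are made-up sample data.

```ruby
# Toy model of the behavior described in this thread; NOT Roo's actual code.
# toy_parse is a hypothetical name used only for illustration.
def toy_parse(rows, headers: false)
  # The block (if any) runs for every row, header row included...
  results = rows.map { |row| block_given? ? yield(row) : row }
  # ...and the header row is dropped only afterwards, unless headers: true.
  headers ? results : results.drop(1)
end

rows = [{ id: "UPC" }, { id: "1234" }, { id: "4567" }]

# Record which ids the block is called with, and what parse hands back.
seen_in_block = []
returned = toy_parse(rows) do |row|
  seen_in_block << row[:id]
  row
end

with_headers = toy_parse(rows, headers: true)
```

Under this model, `seen_in_block` includes "UPC" even though `returned` does not, which would explain why counting inside the block picks up the header while iterating over the returned array does not.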

My initial workaround was to map, like so:

 Roo::Spreadsheet.open(file_path).parse(headers: true).map{ |row| row.values }

However, this may have performance implications, and Roo supports passing a block, so you can transform each Hash row to an array inline:

 Roo::Spreadsheet.open(file_path).parse(headers: true) { |row| row.values }
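To make the row.values conversion concrete, here is a small sketch on plain Ruby data. The rows are made-up samples standing in for what .parse(headers: true) returns, assuming the first hash is the header row mapping each header to itself; nothing here calls Roo.

```ruby
# Sample rows standing in for parse(headers: true) output (made-up data).
# Assumption: the first hash is the header row, mapping each header to itself.
rows = [
  { "UPC" => "UPC",  "Name" => "Name" },
  { "UPC" => "1234", "Name" => "Widget" },
  { "UPC" => "4567", "Name" => "Gadget" }
]

# Hash#values returns values in insertion order, so column order is preserved.
as_arrays = rows.map(&:values)

# Split the header row from the data rows once everything is an array.
header, *data = as_arrays
```

Because Ruby hashes preserve insertion order, the arrays keep the same column order as the hashes they came from.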