zdavatz / spreadsheet

The Ruby Spreadsheet by ywesee GmbH
http://spreadsheet.ch
GNU General Public License v3.0

Incorrect detection of dimensions of a file #170

Closed jastkand closed 7 years ago

jastkand commented 8 years ago

A user uploaded a file that took far too long to process. I wondered what was wrong with the file and, after some debugging, found that the Spreadsheet::Excel::Worksheet#recalculate_dimensions method returns incorrect values for it.

It returns [0, 64883, 0, 4] but it should be [0, 2, 0, 3] instead.
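
For reference, the gem documents the dimensions array as [first used row, first unused row, first used column, first unused column], so the second value is what makes iteration run over roughly 64k rows. Below is a minimal sketch of how such a tuple could be derived from in-memory row data. It assumes rows are plain arrays with nil for empty cells; the helper name `used_dimensions` is hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of the spreadsheet gem.
# rows: Array of rows; each row is an Array of cell values (nil = empty cell),
# or nil for a row that does not exist at all.
def used_dimensions(rows)
  # Indices of rows that contain at least one non-nil cell
  used_rows = rows.each_index.select do |i|
    rows[i] && rows[i].any? { |cell| !cell.nil? }
  end
  return [0, 0, 0, 0] if used_rows.empty?

  # First and last used column index of every used row
  first_cols = used_rows.map { |i| rows[i].index { |cell| !cell.nil? } }
  last_cols  = used_rows.map { |i| rows[i].rindex { |cell| !cell.nil? } }

  # "First unused" is one past the last used index
  [used_rows.first, used_rows.last + 1, first_cols.min, last_cols.max + 1]
end
```

Trailing empty rows do not affect the result here, which is the behaviour I expected from the file above.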

Here is the file (I've updated it a little to remove sensitive data): https://dl.dropboxusercontent.com/u/15694871/file-with-wrong-dimensions.xls

Here is the code that I used to reproduce the issue:

require 'spreadsheet'

workbook = Spreadsheet.open('spec/fixtures/documents/file-with-wrong-dimensions.xls')
worksheet = workbook.worksheets.first

p worksheet.send(:recalculate_dimensions)

Is there any way to get the correct dimensions of the file?

P.S. As a workaround, I added code that stops processing the file once 10 consecutive empty rows have been seen:

require 'spreadsheet'

workbook = Spreadsheet.open('spec/fixtures/documents/file-with-wrong-dimensions.xls')
worksheet = workbook.worksheets.first

# Number of consecutive empty rows seen so far
empty_rows_count = 0

rows = worksheet.each_with_object([]) do |row, result|
  if row.empty?
    empty_rows_count += 1
    # Stop once 10 empty rows in a row have been seen
    break result if empty_rows_count >= 10
  else
    # A non-empty row ends the current run of empty rows
    empty_rows_count = 0
    result << row
  end
end

p rows.count
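
The same early-exit idea can be packaged as a small helper that accepts any enumerable of rows, which makes it testable without an .xls fixture. This is only a sketch: it assumes each row responds to `empty?`, and the name `rows_until_gap` and the `max_gap` parameter are hypothetical.

```ruby
# Hypothetical helper, not part of the spreadsheet gem.
# Collects non-empty rows and stops after max_gap consecutive empty rows.
def rows_until_gap(rows, max_gap: 10)
  gap = 0       # length of the current run of empty rows
  result = []
  rows.each do |row|
    if row.empty?
      gap += 1
      break if gap >= max_gap
    else
      gap = 0   # a non-empty row ends the run
      result << row
    end
  end
  result
end
```

With the gem, `rows_until_gap(worksheet)` should behave like the snippet above, since it relies on the same `each` iteration. Note that any data located after a gap of `max_gap` empty rows is silently dropped.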

zdavatz commented 8 years ago

Thank you, this is interesting. Please let me know if you have a patch for this.