zdavatz / spreadsheet

The Ruby Spreadsheet by ywesee GmbH
http://spreadsheet.ch
GNU General Public License v3.0
1.13k stars 240 forks source link

Complex Excel Structure -> Cells location jumbled upon saving #259

Closed sergio-rivas closed 3 years ago

sergio-rivas commented 3 years ago

When parsing a complex structure excel file (i.e. lots of merges, etc), then modifying a few specific cells, the output file changes the cell locations of many unrelated cells around the sheet (both row/column position are being shifted for other cells).

Is there anything I should provide to help you be able to replicate the issue?

zdavatz commented 3 years ago

what software created the original file?

sergio-rivas commented 3 years ago

The file was created with Windows version of Microsoft Excel, it's a .xls file too. I think it might have to do with the fact that the file has many use of merged cells.

I tried two simple tests to create an easy replication:
1) open the file and use spreadsheet to write to new file -> no problem 2) open the file and use spreadsheet to edit only one cell -> issue replicated

Would I be able to privately send the XLS file so you can confirm the issue? I don't mind helping out by sponsoring the bugfix too. Let me know how much.

zdavatz commented 3 years ago

Please try to open and save the file with LibreOffice and then try to edit it again using spreadsheet. Does that work?

sergio-rivas commented 3 years ago

After saving w/ LibreOffice, now spreadsheet gem cannot read the excel.
Error:

Spreadsheet::Errors::UnknownCodepage: Unknown Codepage 0xac8d
from .../ruby/gems/2.6.0/gems/spreadsheet-1.2.7/lib/spreadsheet/excel/reader.rb:75:in `block in encoding'
zdavatz commented 3 years ago

Did you save the file as UTF-8? Which version of LibreOffice and which OS are you using?

sergio-rivas commented 3 years ago

LibreOffice 7.1.0.3, Mac OSX.

There is no option to choose UTF-8 when saving as xls. The resulting file can be opened in LibreOffice, but is not readable by spreadsheet gem. Can I email you the original file and you try to open/save with your libre office version, then read it in spreadsheet gem? This way we can eliminate the possibility of environmental factors?

Screen Shot 2021-03-02 at 1 23 08 AM
zdavatz commented 3 years ago

Please share the Ruby script with which you want to open the file. Also look at this commit here: https://github.com/zdavatz/spreadsheet/blob/master/History.md#111--03012016 where we had a similar error. Maybe you can send a Pull Request. This is the commit: e59b3374eefcf1fa8c7e75e31a3b36365f2078a8

sergio-rivas commented 3 years ago

I tried digging into the code myself, but was unable to get anywhere. Is there a way to know exactly what encoding LibreOffice is using when saving?

I tried setting the unknown codepage to MACROMAN, but would get invalid byte sequence errors, I tried UTF-8, and didn't get invalid byte sequence errors, but in BIFF8, it kept encountering a lot of errors due to nil values here and there.

zdavatz commented 3 years ago

For me the question is which language/character uses that "strange" characterset in your XLS?

sergio-rivas commented 3 years ago

It's all alphanumeric characters. Language is set to English (US), I changed all items to font "Times New Roman"

I even tried running a macros to enforce all alphanumeric characters only.

REM  *****  BASIC  *****
Function AlphaNumericOnly(strSource As String) As String
    Dim i As Integer
    Dim strResult As String

    For i = 1 To Len(strSource)
        Select Case Asc(Mid(strSource, i, 1))
            Case 32 To 32, 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
                strResult = strResult & Mid(strSource, i, 1)
        End Select
    Next
    AlphaNumericOnly = strResult
End Function

Function CleanSheet(sheetName As String) As Boolean
    Dim sh As Object, z As Object, c As Object
    Dim qCells As Object, enuCells As Object
    sh = ThisComponent.Sheets.getByName(sheetName)
    z = sh.getCellRangeByName("A1:U300")
    qCells = z.queryContentCells(-1)
    enuCells = qCells.Cells.createEnumeration
    Do While enuCells.hasMoreElements
      c = enuCells.nextElement
      If Left(c.Formula, 1) = "=" Then 

      Else
        c.string = AlphaNumericOnly(c.String)
      End If  
    Loop
    CleanSheet = true
End Function

Sub CleanAll()
    CleanSheet("GRID")
    CleanSheet("RESULTS")
    CleanSheet("PICTURES")
End Sub
zdavatz commented 3 years ago

Which Windows Version and which MS Excel Version created the orginial file? Can these people try to create the same file using LibreOffice on Windows?

sergio-rivas commented 3 years ago

After some messing around with different settings, the following combination fixed my issue:

  1. Uncheck "Protect Sheet"
  2. Uncheck "Protect Spreadsheet Structure"
Screen Shot 2021-03-13 at 1 43 52 PM
zdavatz commented 3 years ago

Ok, good to know, thank you for reporting!