qax-os / excelize

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets
https://xuri.me/excelize
BSD 3-Clause "New" or "Revised" License

Getting "expected element <workbook> in name space http://schemas.openxmlformats.org/spreadsheetml/2006/main but have no name space" FileOpen and SaveAs afterwards #1886

Closed kavu closed 2 months ago

kavu commented 2 months ago

Description

Hi! We have a problem reading a file with Excelize after making a copy of it with Excelize: we get the error "expected element <workbook> in name space http://schemas.openxmlformats.org/spreadsheetml/2006/main but have no name space". A detailed description follows.

Steps to reproduce the issue:

Step zero:

We have a file generated by the .NET library ClosedXML (https://github.com/ClosedXML/ClosedXML). Here is an example file (in our actual case it is much bigger, but that doesn't matter here): original2.xlsx. This file opens fine with Numbers.app, version 11.1 (7031.0.102), on macOS 14.2.1 (23C71).

Step one:

We have a small debug app to pinpoint the bug:

package main

import (
    "log"

    "github.com/xuri/excelize/v2"
)

func main() {
    f, openFileErr := excelize.OpenFile("original2.xlsx")
    if openFileErr != nil {
        log.Fatalf("OpenFile initial: %v", openFileErr)
    }

    if saveAsErr := f.SaveAs("corrupted.xlsx"); saveAsErr != nil {
        log.Fatalf("SaveAs: %v", saveAsErr)
    }

    if closeErr := f.Close(); closeErr != nil {
        log.Fatalf("Close: %v", closeErr)
    }

    _, openFileAgainErr := excelize.OpenFile("corrupted.xlsx")
    if openFileAgainErr != nil {
        log.Fatalf("OpenFile again: %v", openFileAgainErr)
    }
}

If we run it we'll get an error:

2024/04/26 00:00:01 OpenFile again: expected element <workbook> in name space http://schemas.openxmlformats.org/spreadsheetml/2006/main but have no name space
exit status 1

Numbers.app also fails to open it.

Describe the results you received:

expected element <workbook> in name space http://schemas.openxmlformats.org/spreadsheetml/2006/main but have no name space

Describe the results you expected:

File opened without errors.

Output of go version:

go version go1.22.0 darwin/arm64

Excelize version or commit ID:

v2.8.2-0.20240425162310-055349d8a62e

Investigation:

Interestingly, xl/workbook.xml in original2.xlsx declares its namespaces in an unusual way (only the very beginning is shown, reformatted for readability):

<?xml version="1.0" encoding="utf-8"?>
<x:workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
  xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:workbookPr codeName="ThisWorkbook" />
  <x:bookViews>
    <x:workbookView firstSheet="0" activeTab="0" />
  </x:bookViews>

So the required namespace http://schemas.openxmlformats.org/spreadsheetml/2006/main is declared, but it is bound to the prefix x, and every element in the file uses that prefix.

After saving, the XML becomes the following (again, only the very beginning, reformatted for readability):

<?xml version="1.0" encoding="UTF-8"?>
<workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
  xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <workbookPr codeName="ThisWorkbook"></workbookPr>
  <bookViews>
    <workbookView></workbookView>
  </bookViews>

Notice that xmlns:x is still declared, but none of the elements carry the x prefix anymore, and there is no sign of a default (unprefixed) xmlns. That was my first thought, so I decided to look at the part where Excelize saves xl/workbook.xml.
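The same error message can be reproduced with Go's standard encoding/xml package alone, without Excelize. The sketch below is an illustration under assumptions: the workbook struct is a hypothetical, stripped-down stand-in for the library's real type, and the three documents are minimal versions of the XML shown above. It decodes a prefixed root, an unprefixed root with only xmlns:x declared, and an unprefixed root with a default xmlns.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

const mainNS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

// workbook is a hypothetical stand-in for the library's workbook type.
// The XMLName tag makes the decoder require the root element to be
// <workbook> in the SpreadsheetML main namespace.
type workbook struct {
	XMLName xml.Name `xml:"http://schemas.openxmlformats.org/spreadsheetml/2006/main workbook"`
}

// parse decodes doc into a workbook and returns the decoding error, if any.
func parse(doc string) error {
	var wb workbook
	return xml.Unmarshal([]byte(doc), &wb)
}

func main() {
	// Root element uses the x prefix, as in the original file.
	prefixed := `<x:workbook xmlns:x="` + mainNS + `"></x:workbook>`
	// Prefix declared but unused, no default namespace, as in the saved file.
	saved := `<workbook xmlns:x="` + mainNS + `"></workbook>`
	// Unprefixed root with a default namespace declaration.
	withDefault := `<workbook xmlns="` + mainNS + `"></workbook>`

	fmt.Println("prefixed root:    ", parse(prefixed))
	fmt.Println("saved (broken):   ", parse(saved))
	fmt.Println("default namespace:", parse(withDefault))
}
```

Only the second document should fail to decode, because its root element is not in any namespace: declaring xmlns:x does nothing unless an element actually uses the x prefix.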

And I found this part:

https://github.com/qax-os/excelize/blob/055349d8a62e6b4e66bcf3854c8a9086e912c409/workbook.go#L227-L229

which led me to

https://github.com/qax-os/excelize/blob/055349d8a62e6b4e66bcf3854c8a9086e912c409/lib.go#L685-L692

In this case replaceNameSpaceBytes replaced the usual xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" (which comes from the XMLName field of the workbook structure) with the original namespace declarations, BUT the original file relies on the x prefix on all elements, and the saved elements no longer have it. As far as I understand, that is what causes the error.

I tried removing the part that looks up targetXmlns:

    if attrs, ok := f.xmlAttr.Load(path); ok {
        targetXmlns = []byte(genXMLNamespace(attrs.([]xml.Attr)))
    }

After that, the file opened perfectly fine. Yes, xl/workbook.xml will look awful, since it will carry every possible xmlns from the constant, but it works. I am still not sure this is a proper fix, though.

Possible fix:

We probably need to always add xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" unless it is already there. But that's just an idea; maybe you have a better solution.

kavu commented 2 months ago

@xuri could you take a look at this one? It is really disrupting our system right now. Thank you in advance!

xuri commented 2 months ago

Thanks for your issue. I have fixed this; please upgrade to the code on the master branch. The patch will be released in the next version. Note that this library can't guarantee compatibility with workbooks generated by third-party libraries and tools.