nissl-lab / npoi

a .NET library that can read/write Office formats without Microsoft Office installed. No COM+, no interop.
Apache License 2.0
5.73k stars 1.43k forks source link

SXSSFWorkbook generated xlsx file cannot be import by PowerBI #1443

Open msmith-work opened 2 days ago

msmith-work commented 2 days ago

NPOI Version

Current

File Type

Reproduce Steps

Using streaming SXSSFWorkbook, generate a simple spreadsheet output

Issue Description

The generated xlsx file contains a stray UTF8 BOM (ZWNBSP) character in xl\worksheets\sheet1.xml between <sheetData><row r="1">.

The file can be opened without issue in Excel, however importing this file into PowerBI results in the error message "'Text' is an invalid XmlNodeType, Line 1, position x"

On investigation, it appears the temp file created for streaming is done so using a UTF8 encoding (SheetDataWriter.cs constructor), this then forces the temp file to contain the UTF8 BOM character, which then appears to be inserted into the final spreadsheet just after the opening <sheetData> tag

tonyqus commented 2 days ago

Can you upload the generated xlsx file? And How do you import this xlsx into PowerBI? Please let me know the reproduction steps.

msmith-work commented 1 day ago

Here is some basic example code and the generated spreadsheet (it affects all spreadsheets)

public Stream BuildExampleReport()
{
    using var workbook = new SXSSFWorkbook();
    var worksheet = workbook.CreateSheet("Sheet 1");
    for (var rowNum = 0; rowNum < 3; rowNum++)
    {
        var row = worksheet.CreateRow(rowNum);
        for (var colNum = 0; colNum < 3; colNum++)
        {
            var cell = row.CreateCell(colNum);
            cell.SetCellValue("Example");
        }
    }
    var memory = new MemoryStream();
    workbook.Write(memory, true);
    memory.Position = 0;
    return memory;
}

example.xlsx

The stray UTF8 BOM character is within the xl\worksheets\sheet1.xml file just after the opening <sheetData> tag.

As for importing into PowerBI, this is a process that our customers do with the generated spreadsheets and is not something we do ourselves; it has now been reported by multiple customers, each customer with their own use cases and processes.

We were previously using the non-streaming XSSF functions to generate the files, and these were generated without issue, it is only since switching to the streaming SXSSF functions that the issue has started.