salvois / LargeXlsx

A .net library to write large Excel files in XLSX format with low memory consumption using streamed write.
Other
204 stars 34 forks source link

Help Repaired Records: Cell information #49

Closed Richard-Woessner closed 5 months ago

Richard-Woessner commented 5 months ago

Hello, and thank you for this library, it works incredibly well but I am having an issue and don't know how to debug it

I am using excel for a large dataset roughly 1.2 million records.

The issue I'm having is that when I go to open the excel file, I get an error. Once I click ok, it loads up the Excel file fine, to my knowledge.

<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <logFileName>error084880_01.xml</logFileName>
    <summary>Errors were detected in file 'XXX/05 May Assets by Source.xlsx'</summary>
    <repairedRecords>
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
    </repairedRecords>
</recoveryLog>

I split the files into multiple by grouping some of the data together and pinpointed the issue to roughly 172000 records, which is still too many to notice anything out of the ordinary.

This is the method that I'm converting CSV to Excel

    public void ConvertToExcel(string? path, bool zip = false)
    {
        _logger.Log($"Converting {path} to Excel");

        string newPath = RemoveFileExtension(path) + ".xlsx";

        int valuesLength = 0;

        var compressionLevel = zip ? CompressionLevel.Level2 : CompressionLevel.Default;

        using var stream = new FileStream(newPath, FileMode.Create, FileAccess.Write);
        using (
            var xlsxWriter = new XlsxWriter(
                stream,
                useZip64: zip,
                requireCellReferences: false,
                compressionLevel: compressionLevel
            )
        )
        {
            using (var reader = new StreamReader(path))
            {
                int lineCount = 0;

                xlsxWriter.BeginWorksheet("Sheet1");
                while (!reader.EndOfStream)
                {
                    lineCount++;
                    if (lineCount % 20000 == 0)
                    {
                        _logger.Log($"Processed {lineCount} to excel");
                    }

                    string line = reader.ReadLine() ?? string.Empty;

                    if (string.IsNullOrEmpty(line))
                    {
                        continue;
                    }

                    string[] values = line.Split(',');

                    if (valuesLength == 0)
                    {
                        valuesLength = values.Length;
                    }

                    xlsxWriter.BeginRow();

                    foreach (string value in values)
                    {
                        if (double.TryParse(value, out double numericValue))
                        {
                            xlsxWriter.Write(numericValue);
                        }
                        else if (DateTime.TryParse(value, out DateTime dateValue))
                        {
                            xlsxWriter.Write(dateValue);
                        }
                        else if (bool.TryParse(value, out bool boolValue))
                        {
                            xlsxWriter.Write(boolValue);
                        }
                        else
                        {
                            xlsxWriter.Write(value.ToString());
                        }
                    }
                }
                xlsxWriter.SetAutoFilter(1, 1, lineCount - 1, valuesLength - 1);
            }
        }
    }

I would appreciate any help that could point me in the right direction. Thank you

Richard-Woessner commented 5 months ago

I fixed it. The issue was typical data typing, a few cells in a column were strings while most were numbers.

I just had to write to the Excel file as the proper datatype and it was fixed