salvois / LargeXlsx

A .net library to write large Excel files in XLSX format with low memory consumption using streamed write.
Other
204 stars 34 forks source link

Generated *.xlsx not always deemed as proper excel files. #17

Closed natalie-o-perret closed 9 months ago

natalie-o-perret commented 1 year ago

Hi 🙋‍♀️,

I'm using your awesome LargeXlsx library (version 1.6.3) with .NET 7.0.101 to generate SAP journal entries files that are then uploaded to SAP (accordingly to their their doc).

Expand / Collapse the C# implementation `AccrualsGenerator.cs`: ```csharp using LargeXlsx; using SharpCompress.Compressors.Deflate; namespace ExcelMeh; public static class AccrualsGenerator { public static void Generate(string path) { using var stream = new FileStream(path, FileMode.Create, FileAccess.Write); using var xlsxWriter = new XlsxWriter(stream, CompressionLevel.BestCompression, true); xlsxWriter.SetDefaultStyle(Constants.Styles.Default) .BeginWorksheet(Constants.Values.WorksheetName, columns: Constants.ColumnsMin) .BeginRow() .Write(Constants.Values.Title, Constants.Styles.Title) .BeginRows(Constants.Values.Comments, Constants.Styles.Bold) .BeginRow() .Write(Constants.Values.BatchId, Constants.Styles.DarkYellowBg) .Write(Constants.Styles.DarkYellowBg) .SkipRows(2) .BeginRow(style: Constants.Styles.DarkBlueBoldBgLeft) .Write(1, Constants.Styles.DarkBlueBoldBg) .WriteSharedString(Constants.Values.Header, Constants.Styles.DarkBlueBoldBg) .BeginRow() .SkipColumns(1).WriteSharedStrings(Constants.Values.HeaderColumNamesMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1).WriteSharedStrings(Constants.Values.HeaderColumDescriptionsMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1) .Write("FR01") .Write("AD") .Write(new DateTime(2023, 01, 01), Constants.Styles.ShortDate) .Write(new DateTime(2023, 01, 01), Constants.Styles.ShortDate) .Write("MERC11-1111444-55500") .Write("EUR") .SkipRows(1) .BeginRow() .SkipColumns(1) .WriteSharedString(Constants.Values.LineItems, Constants.Styles.BlueBg) .BeginRow() .SkipColumns(4) .WriteSharedString(Constants.Values.TransactionCurrency, Constants.Styles.LightBlueBg, 2) .BeginRow() .SkipColumns(1) .WriteSharedStrings(Constants.Values.LineItemColumnNamesMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1) .WriteSharedStrings(Constants.Values.LineItemColumnDescriptionsMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1) .Write("FR01") .Write("63333333") .Write("FNP-MERC-4242424-23232-898989/0101-E3 Meow") .Write(35.09) .SkipColumns(1) .Write("V0") .Write("FR01010002_0013") .Write("7226") .BeginRow() .SkipColumns(1) .Write("FR01") .Write("11100") .Write("FNP-MERC-4242424-23232-898989/0101-E3 Meow") .Write(7.02) .SkipColumns(1) .Write("V0") .Write("FR01010002_0013") .Write("7226") .BeginRow() .SkipColumns(1) .Write("FR01") .Write("240205") .Write("FNP-MERC-4242424-23232-898989/0101-E3 Meow") .SkipColumns(1) .Write(42.11) .Write("V0") .Write("FR01010002_0013") .Write("7226") .BeginRow(); } } ``` `Constants.cs`: ```csharp using System.Drawing; using LargeXlsx; namespace ExcelMeh; public static class Constants { public static class Colors { public static readonly Color DarkYellow = Color.FromArgb(255, 192, 0); public static readonly Color Blue = Color.FromArgb(180, 198, 231); public static readonly Color DarkBlue = Color.FromArgb(142, 169, 219); public static readonly Color LightBlue = Color.FromArgb(237, 241, 249); public static readonly Color ForestGreen = Color.FromArgb(169, 208, 142); } public static class Fills { public static readonly XlsxFill DarkYellow = new(Colors.DarkYellow); public static readonly XlsxFill DarkBlue = new(Colors.DarkBlue); public static readonly XlsxFill Blue = new(Colors.Blue); public static readonly XlsxFill LightBlue = new(Colors.LightBlue); public static readonly XlsxFill ForestGreen = new(Colors.ForestGreen); } public static class Alignments { public static readonly XlsxAlignment Left = new(XlsxAlignment.Horizontal.Left); public static readonly XlsxAlignment Center = new(XlsxAlignment.Horizontal.Center); public static readonly XlsxAlignment Right = new(XlsxAlignment.Horizontal.Right); } public static class Fonts { public const string Name = "Calibri"; public const double Size = 10; public const double TitleSize = 13; public const double BoldSize = 11; public static readonly XlsxFont Default = new(Name, Size, Color.Empty); public static readonly XlsxFont Title = Default.WithSize(TitleSize); public static readonly XlsxFont Bold = Default.WithSize(BoldSize).WithBold(); } public static class NumberFormats { public const string ShortDatePattern = "DD/MM/YYYY"; public static readonly XlsxNumberFormat ShortDate = new(ShortDatePattern); } public static class Styles { public static readonly XlsxStyle Default = XlsxStyle.Default.With(Fonts.Default); public static readonly XlsxStyle Title = XlsxStyle.Default.With(Fonts.Title); public static readonly XlsxStyle Bold = XlsxStyle.Default.With(Fonts.Bold); public static readonly XlsxStyle DarkYellowBg = XlsxStyle.Default.With(Fills.DarkYellow); public static readonly XlsxStyle DarkBlueBoldBg = Bold.With(Fills.DarkBlue).With(Alignments.Right); public static readonly XlsxStyle DarkBlueBoldBgLeft = Bold.With(Fills.DarkBlue).With(Alignments.Left); public static readonly XlsxStyle BlueBg = Default.With(Fills.Blue).With(Alignments.Left); public static readonly XlsxStyle LightBlueBg = Default.With(Fills.LightBlue).With(Alignments.Center); public static readonly XlsxStyle LightGreenBg = Default.With(Fills.ForestGreen).With(Alignments.Center); public static readonly XlsxStyle ShortDate = Default.With(NumberFormats.ShortDate); } private const double InToPxRatio = 13; public static readonly IReadOnlyCollection ColumnsMin = new[] { XlsxColumn.Formatted(0.54 * InToPxRatio), XlsxColumn.Formatted(1.39 * InToPxRatio), XlsxColumn.Formatted(1.54 * InToPxRatio), XlsxColumn.Formatted(3.53 * InToPxRatio), XlsxColumn.Formatted(1.12 * InToPxRatio), XlsxColumn.Formatted(1.80 * InToPxRatio), XlsxColumn.Formatted(1.80 * InToPxRatio), XlsxColumn.Formatted(1.42 * InToPxRatio), XlsxColumn.Formatted(1.93 * InToPxRatio) }; public static readonly IReadOnlyCollection Columns = new[] { XlsxColumn.Formatted(0.54 * InToPxRatio), XlsxColumn.Formatted(1.39 * InToPxRatio), XlsxColumn.Formatted(1.85 * InToPxRatio), XlsxColumn.Formatted(3.86 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio), XlsxColumn.Formatted(1.39 * InToPxRatio), XlsxColumn.Formatted(2.47 * InToPxRatio), XlsxColumn.Formatted(2.93 * InToPxRatio), XlsxColumn.Formatted(1.31 * InToPxRatio), XlsxColumn.Formatted(1.70 * InToPxRatio), XlsxColumn.Formatted(2.01 * InToPxRatio), XlsxColumn.Formatted(2.39 * InToPxRatio), XlsxColumn.Formatted(2.01 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio), XlsxColumn.Formatted(2.47 * InToPxRatio), XlsxColumn.Formatted(1.16 * InToPxRatio), XlsxColumn.Formatted(1.78 * InToPxRatio), XlsxColumn.Formatted(1.78 * InToPxRatio), XlsxColumn.Formatted(1.54 * InToPxRatio), XlsxColumn.Formatted(2.85 * InToPxRatio), XlsxColumn.Formatted(1.08 * InToPxRatio), XlsxColumn.Formatted(1.54 * InToPxRatio), XlsxColumn.Formatted(1.47 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio), XlsxColumn.Formatted(1.93 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio) }; public static class Values { public const string WorksheetName = "General Journal Entry"; public const string Title = "Upload General Journal Entry"; public const string BatchId = "Batch ID"; public const string Header = "Header"; public const string LineItems = "Line Items"; public const string TransactionCurrency = "Transaction Currency"; public static readonly IReadOnlyCollection Comments = new[] { "// To add field columns to the template, please add technical names.", "// For a complete list of field columns and their technical names, choose ? in the right upper corner of the app screen and then view the Browse entry of web assistance.", "// If you want to add more Profitability Segment (CO-PA) fields, make sure that you add the 'PROF_' prefix in the technical names of the CO-PA fields." }; public static readonly IReadOnlyCollection HeaderColumNamesMin = new[] { "BUKRS", "BLART", "BLDAT", "BUDAT", "BKTXT", "WAERS" }; public static readonly IReadOnlyCollection HeaderColumDescriptionsMin = new[] { "*Company Code (4)", "*Journal Entry Type (2)", "*Journal Entry Date", "*Posting Date", "Document Header Text (25)", "*Transaction Currency (5)" }; public static readonly IReadOnlyCollection LineItemColumnNamesMin = new[] { "BUKRS", "HKONT", "SGTXT", "WRSOL", "WRHAB", "MWSKZ", "PS_POSID", "YY1_SUPPLIER_ACCRUAL_COB" }; public static readonly IReadOnlyCollection LineItemColumnDescriptionsMin = new[] { "Company Code (4)", "G/L Account (10)", "Item Text (50)", "Debit", "Credit", "Tax Code (2)", "WBS Element (24)", "Supplier Accrual (10)" }; } } ``` `XlsxWriterExtensions`: ```csharp using LargeXlsx; namespace ExcelMeh; public static class XlsxWriterExtensions { public static XlsxWriter WriteSharedStrings(this XlsxWriter source, IEnumerable values, XlsxStyle style) { foreach (var value in values) { source.WriteSharedString(value, style); } return source; } public static XlsxWriter BeginRows(this XlsxWriter source, IEnumerable values, XlsxStyle style) { foreach (var value in values) { source.BeginRow().Write(value, style); } return source; } } ``` `Process.cs`: ```csharp using LargeXlsx; namespace ExcelMeh; public static class XlsxWriterExtensions { public static XlsxWriter WriteSharedStrings(this XlsxWriter source, IEnumerable values, XlsxStyle style) { foreach (var value in values) { source.WriteSharedString(value, style); } return source; } ``` `Program.cs`: ```csharp using ExcelMeh; const string path = @"C:\Users\natalie-perret\Desktop\test.xlsx"; AccrualsGenerator.Generate(path); DefaultApp.StartDefault(path); ```

If I'm trying to use the file as-is after they have been generated to push them to SAP, it doesn't work:

image

But if I open them with Libre Office or MS Office and save them, I can then push to SAP:

image

image

File Examples:

Expand / Collapse to see the substantial changes found between the two files / archives (since `*.xslx` are `*.zip` containing `*.xml`'s in disguise) `test - original\xl\styles.xml`: ```xml ``` --- `test - original\xl\worksheets\sheet1.xml`: ```xml Upload General Journal Entry // To add field columns to the template, please add technical names. // For a complete list of field columns and their technical names, choose ? in the right upper corner of the app screen and then view the Browse entry of web assistance. // If you want to add more Profitability Segment (CO-PA) fields, make sure that you add the 'PROF_' prefix in the technical names of the CO-PA fields. Batch ID 1 0 1 2 3 4 5 6 7 8 9 10 11 12 FR01 AD 44927 44927 MERC11-1111444-55500 EUR 13 14 1 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 FR01 63333333 FNP-MERC-4242424-23232-898989/0101-E3 Meow 35.09 V0 FR01010002_0013 7226 FR01 11100 FNP-MERC-4242424-23232-898989/0101-E3 Meow 7.02 V0 FR01010002_0013 7226 FR01 240205 FNP-MERC-4242424-23232-898989/0101-E3 Meow 42.11 V0 FR01010002_0013 7226 ``` `test - saved with editor\xl\styles.xml`: ```xml ``` `test - saved with editor\xl\worksheets\sheet1.xml`: ```xml ```

I was wondering if you had any idea about the difference provided when Libre Office has saved the original file to copy 🤔. I mean there are changes, but it's still "kinda" the same thing but different enough that SAP can accept the file 🤔

Would like to avoid resaving the file via Libre / MS Office to be able to push those files to SAP.

Any idea / hint?

salvois commented 1 year ago

Hi @natalie-o-perret , when writing the XML files composing the XLSX package I have managed to produce the smallest set of files and attributes that made Excel and LibreOffice happy. I may have missed something that the standard mandates for a proper XLSX file, so maybe SAP is really looking for something that is missing or different. Please allow me to compare your versions and try to guess it, but I need your help to make some checks because I have no SAP at disposal. Thanks, Salvo

natalie-o-perret commented 1 year ago

Hi @salvois,

Hi @natalie-o-perret , when writing the XML files composing the XLSX package I have managed to produce the smallest set of files and attributes that made Excel and LibreOffice happy. I may have missed something that the standard mandates for a proper XLSX file, so maybe SAP is really looking for something that is missing or different. Please allow me to compare your versions and try to guess it, but I need your help to make some checks because I have no SAP at disposal. Thanks, Salvo

So glad to hear your feedback! Sure, lemme know if you have some ideas of what might help SAP to see the files as proper Excel files.

Both files are available in my OP:

I also have access to a test/dev instance if you need me to try out some new types of generated files.

salvois commented 1 year ago

Hi @natalie-o-perret , I have made some changes to the XML files generated by LargeXlsx to make them closer to the ones generated by Excel. Are you able to checkout the master branch and include LargeXlsx as a source project in your project to try it? Thanks, Salvo

salvois commented 1 year ago

Hi @natalie-o-perret , I have just released version 1.7.0 which includes the changes I mentioned in my last comment. Pleas let me know whether it helps with your problem. Thanks, Salvo

natalie-o-perret commented 1 year ago

Hi @natalie-o-perret , I have just released version 1.7.0 which includes the changes I mentioned in my last comment. Pleas let me know whether it helps with your problem. Thanks, Salvo

Hi @salvois, sorry to only get back to you """now""".

Long story short, no, it still doesn't address the issue.

Elucubrations I've tried to post multiple times my findings, but every time my (very) long comment with pictures and code snippets is crashing (well, my browser is crashing but it's unrelated to GitHub, other tabs do, though) 😕. And obviously, I didn't think about saving a draft the only two times it happened 😕, kinda frustrating. I'm gonna use this comment as ~~"a live WIP draft"~~ --I'm finally done answering ~~, as I got a wee too tired to waste hours gathering everything just to see it gone to trash. # Note about shared strings and weird ""unescaped"" characters (or deemed as such by LibreOffice / Excel) One of the numerous things I've found is about the un/escaped characters. E.g., When using the `LargeXslx\xl\worksheets\sheet1.xml` ```xml [...] Upload General Journal Entry // To add field columns to the template, please add technical names. // For a complete list of field columns and their technical names, choose ? in the right upper corner of the app screen and then view the Browse entry of web assistance. // If you want to add more Profitability Segment (CO-PA) fields, make sure that you add the 'PROF_' prefix in the technical names of the CO-PA fields. [...] ``` The thing is that these strings are used only once but once the LargeXlsx generated file is saved with LibreOffice, these instances are converted to shared strings and referenced as such. While doing so in the `.\xl\sharedStrings.xml`, I've noticed that **certain** escaped characters (or deemed as such), such as: `'` are being replaced with ""html equivalent codes"", e.g., `&apos`: `LargeXslx then saved LibreOffice\xl\sharedStrings.xml` ```xml [...] // To add field columns to the template, please add technical names. // For a complete list of field columns and their technical names, choose ? in the right upper corner of the app screen and then view the Browse entry of web assistance. // If you want to add more Profitability Segment (CO-PA) fields, make sure that you add the 'PROF_' prefix in the technical names of the CO-PA fields. [...] ``` That being said, fixing this issue, alone, isn't enough to be able to post the file to SAP successfully. Also putting the `'` back in the `LargeXslx then saved LibreOffice\xl\sharedStrings.xml` and posting `LargeXslx then saved LibreOffice.xlsx` to SAP, still work. So I guess there might for shared strings cause `//` (which I'm enforcing this [i.e., putting those strings as shared strings to repro the LibreOffice transformation] in my sample generation script now, so I'm tossing aside that particular issue for the time being). I'm also suspecting this has been done because the strings above are starting with `//` (SAP template for posting journal entries files 🤦‍♀️). # File Generation - Discrepancies | File | LargeXlsx gen is SAP-compliant? | Comment | |--------------------------------|:-------------------------------:|----------------------------| | `.\_rels\.rels` | ✅ | | | `.\xl\_rels\workbook.xml.rels` | ✅ | | | `.\xl\worksheets\sheet1.xml` | ✅ | by tweaking the generation (i.e., forcing using shared strings explicitly | | `.\xl\sharedStrings.xml` | ✅ | by tweaking the generation (i.e., forcing using shared strings explicitly | | `.\xl\styles.xml` | ❌ | date format `DD/MM/YYYY` is botched up if passed all uppercase, managed to make it work by passing `dd/MM/yyyy` and having `DD/MM/YYYY`, otherwise SAP found the file non-compliant 🤦‍♀️ | | `.\xl\workbook.xml` | ✅ | | | `.\[Content_Types].xml` | ✅ | | ## Read-only attributes marking files in LargeXlsx files generated in `*.xlsx` (zip) archive ## Prettified or not? ## `.\xl\styles.xml` `LargeXlsx\xl\styles.xml`: ```xml ``` `LargeXslx then saved LibreOffice\xl\styles.xml`: ```xml ```

EDIT

So, after banging my head against the wall countless times at night and tweaking the generations to leverage systematically shared strings whenever there are string (SAP-journal-entries-excel-file compliance is bonkers), I think I've finally found the issue. Also had to tweak the way the styles are passed.

The new generation (in plain C# tweaked for SAP-compliance):

Expand / Collapse the C# implementation `AccrualsGenerator.cs`: ```csharp using LargeXlsx; using SharpCompress.Compressors.Deflate; namespace ExcelMeh; public static class AccrualsGenerator { public static void Generate(string path) { using var stream = new FileStream(path, FileMode.Create, FileAccess.Write); using var xlsxWriter = new XlsxWriter(stream, CompressionLevel.BestCompression, true); xlsxWriter.SetDefaultStyle(Constants.Styles.Default) .BeginWorksheet(Constants.Values.WorksheetName, columns: Constants.ColumnsMin) .BeginRow() .WriteSharedString(Constants.Values.Title, Constants.Styles.Title) .BeginRows(Constants.Values.Comments, Constants.Styles.Bold) .BeginRow() .WriteSharedString(Constants.Values.BatchId, Constants.Styles.DarkYellowBg) .Write(Constants.Styles.DarkYellowBg) .SkipRows(2) .BeginRow(style: Constants.Styles.DarkBlueBoldBgLeft) .Write(1, Constants.Styles.DarkBlueBoldBg) .WriteSharedString(Constants.Values.Header, Constants.Styles.DarkBlueBoldBg) .BeginRow() .SkipColumns(1).WriteSharedStrings(Constants.Values.HeaderColumNamesMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1).WriteSharedStrings(Constants.Values.HeaderColumDescriptionsMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1) .WriteSharedString("FR01") .WriteSharedString("AD") .Write(new DateTime(2023, 01, 01), Constants.Styles.ShortDate) .Write(new DateTime(2023, 01, 01), Constants.Styles.ShortDate) .WriteSharedString("MERC11-1111444-55500") .WriteSharedString("EUR") .SkipRows(1) .BeginRow() .SkipColumns(1) .WriteSharedString(Constants.Values.LineItems, Constants.Styles.BlueBg) .BeginRow() .SkipColumns(4) .WriteSharedString(Constants.Values.TransactionCurrency, Constants.Styles.LightBlueBg, 2) .BeginRow() .SkipColumns(1) .WriteSharedStrings(Constants.Values.LineItemColumnNamesMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1) .WriteSharedStrings(Constants.Values.LineItemColumnDescriptionsMin, Constants.Styles.LightBlueBg) .BeginRow() .SkipColumns(1) .WriteSharedString("FR01") .WriteSharedString("63333333") .WriteSharedString("FNP-MERC-4242424-23232-898989/0101-E3 Meow") .Write(35.09) .SkipColumns(1) .WriteSharedString("V0") .WriteSharedString("FR01010002_0013") .WriteSharedString("7226") .BeginRow() .SkipColumns(1) .WriteSharedString("FR01") .WriteSharedString("11100") .WriteSharedString("FNP-MERC-4242424-23232-898989/0101-E3 Meow") .Write(7.02) .SkipColumns(1) .WriteSharedString("V0") .WriteSharedString("FR01010002_0013") .WriteSharedString("7226") .BeginRow() .SkipColumns(1) .WriteSharedString("FR01") .WriteSharedString("240205") .WriteSharedString("FNP-MERC-4242424-23232-898989/0101-E3 Meow") .SkipColumns(1) .Write(42.11) .WriteSharedString("V0") .WriteSharedString("FR01010002_0013") .WriteSharedString("7226") .BeginRow(); } } ``` `Constants.cs`: ```csharp using System.Drawing; using LargeXlsx; namespace ExcelMeh; public static class Constants { public static class Colors { public static readonly Color DarkYellow = Color.FromArgb(255, 192, 0); public static readonly Color Blue = Color.FromArgb(180, 198, 231); public static readonly Color DarkBlue = Color.FromArgb(142, 169, 219); public static readonly Color LightBlue = Color.FromArgb(237, 241, 249); public static readonly Color ForestGreen = Color.FromArgb(169, 208, 142); } public static class Fills { public static readonly XlsxFill DarkYellow = new(Colors.DarkYellow); public static readonly XlsxFill DarkBlue = new(Colors.DarkBlue); public static readonly XlsxFill Blue = new(Colors.Blue); public static readonly XlsxFill LightBlue = new(Colors.LightBlue); public static readonly XlsxFill ForestGreen = new(Colors.ForestGreen); } public static class Alignments { public static readonly XlsxAlignment Left = new(XlsxAlignment.Horizontal.Left); public static readonly XlsxAlignment Center = new(XlsxAlignment.Horizontal.Center); public static readonly XlsxAlignment Right = new(XlsxAlignment.Horizontal.Right); } public static class Fonts { public const string Name = "Calibri"; public const double Size = 10; public const double TitleSize = 13; public const double BoldSize = 11; public static readonly XlsxFont Default = new(Name, Size, Color.Empty); public static readonly XlsxFont Title = Default.WithSize(TitleSize); public static readonly XlsxFont Bold = Default.WithSize(BoldSize).WithBold(); } public static class NumberFormats { public const string GeneralPattern = "General"; public static readonly XlsxNumberFormat General = new(GeneralPattern); public const string ShortDatePattern = @"dd/MM/yyyy"; public static readonly XlsxNumberFormat ShortDate = new(ShortDatePattern); } public static class Styles { public static readonly XlsxStyle Default = XlsxStyle.Default.With(Fonts.Default).With(NumberFormats.General); public static readonly XlsxStyle Title = XlsxStyle.Default.With(Fonts.Title); public static readonly XlsxStyle Bold = XlsxStyle.Default.With(Fonts.Bold); public static readonly XlsxStyle DarkYellowBg = XlsxStyle.Default.With(Fills.DarkYellow); public static readonly XlsxStyle DarkBlueBoldBg = Bold.With(Fills.DarkBlue).With(Alignments.Right); public static readonly XlsxStyle DarkBlueBoldBgLeft = Bold.With(Fills.DarkBlue).With(Alignments.Left); public static readonly XlsxStyle BlueBg = Default.With(Fills.Blue).With(Alignments.Left); public static readonly XlsxStyle LightBlueBg = Default.With(Fills.LightBlue).With(Alignments.Center); public static readonly XlsxStyle LightGreenBg = Default.With(Fills.ForestGreen).With(Alignments.Center); public static readonly XlsxStyle ShortDate = Default.With(NumberFormats.ShortDate); } private const double InToPxRatio = 13; public static readonly IReadOnlyCollection ColumnsMin = new[] { XlsxColumn.Formatted(0.54 * InToPxRatio), XlsxColumn.Formatted(1.39 * InToPxRatio), XlsxColumn.Formatted(1.54 * InToPxRatio), XlsxColumn.Formatted(3.53 * InToPxRatio), XlsxColumn.Formatted(1.12 * InToPxRatio), XlsxColumn.Formatted(1.80 * InToPxRatio), XlsxColumn.Formatted(1.80 * InToPxRatio), XlsxColumn.Formatted(1.42 * InToPxRatio), XlsxColumn.Formatted(1.93 * InToPxRatio) }; public static readonly IReadOnlyCollection Columns = new[] { XlsxColumn.Formatted(0.54 * InToPxRatio), XlsxColumn.Formatted(1.39 * InToPxRatio), XlsxColumn.Formatted(1.85 * InToPxRatio), XlsxColumn.Formatted(3.86 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio), XlsxColumn.Formatted(1.39 * InToPxRatio), XlsxColumn.Formatted(2.47 * InToPxRatio), XlsxColumn.Formatted(2.93 * InToPxRatio), XlsxColumn.Formatted(1.31 * InToPxRatio), XlsxColumn.Formatted(1.70 * InToPxRatio), XlsxColumn.Formatted(2.01 * InToPxRatio), XlsxColumn.Formatted(2.39 * InToPxRatio), XlsxColumn.Formatted(2.01 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio), XlsxColumn.Formatted(2.47 * InToPxRatio), XlsxColumn.Formatted(1.16 * InToPxRatio), XlsxColumn.Formatted(1.78 * InToPxRatio), XlsxColumn.Formatted(1.78 * InToPxRatio), XlsxColumn.Formatted(1.54 * InToPxRatio), XlsxColumn.Formatted(2.85 * InToPxRatio), XlsxColumn.Formatted(1.08 * InToPxRatio), XlsxColumn.Formatted(1.54 * InToPxRatio), XlsxColumn.Formatted(1.47 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio), XlsxColumn.Formatted(1.93 * InToPxRatio), XlsxColumn.Formatted(1.62 * InToPxRatio) }; public static class Values { public const string WorksheetName = "General Journal Entry"; public const string Title = "Upload General Journal Entry"; public const string BatchId = "Batch ID"; public const string Header = "Header"; public const string LineItems = "Line Items"; public const string TransactionCurrency = "Transaction Currency"; public static readonly IReadOnlyCollection Comments = new[] { "// To add field columns to the template, please add technical names.", "// For a complete list of field columns and their technical names, choose ? in the right upper corner of the app screen and then view the Browse entry of web assistance.", "// If you want to add more Profitability Segment (CO-PA) fields, make sure that you add the 'PROF_' prefix in the technical names of the CO-PA fields." }; public static readonly IReadOnlyCollection HeaderColumNamesMin = new[] { "BUKRS", "BLART", "BLDAT", "BUDAT", "BKTXT", "WAERS" }; public static readonly IReadOnlyCollection HeaderColumDescriptionsMin = new[] { "*Company Code (4)", "*Journal Entry Type (2)", "*Journal Entry Date", "*Posting Date", "Document Header Text (25)", "*Transaction Currency (5)" }; public static readonly IReadOnlyCollection LineItemColumnNamesMin = new[] { "BUKRS", "HKONT", "SGTXT", "WRSOL", "WRHAB", "MWSKZ", "PS_POSID", "YY1_SUPPLIER_ACCRUAL_COB" }; public static readonly IReadOnlyCollection LineItemColumnDescriptionsMin = new[] { "Company Code (4)", "G/L Account (10)", "Item Text (50)", "Debit", "Credit", "Tax Code (2)", "WBS Element (24)", "Supplier Accrual (10)" }; } } ``` `XlsxWriterExtensions`: ```csharp using LargeXlsx; namespace ExcelMeh; public static class XlsxWriterExtensions { public static XlsxWriter WriteSharedStrings(this XlsxWriter source, IEnumerable values, XlsxStyle style) { foreach (var value in values) { source.WriteSharedString(value, style); } return source; } public static XlsxWriter BeginRows(this XlsxWriter source, IEnumerable values, XlsxStyle style, bool sharedStrings = true) { if (sharedStrings) { foreach (var value in values) { source.BeginRow().WriteSharedString(value, style); } } else { foreach (var value in values) { source.BeginRow().Write(value, style); } } return source; } } ``` `Process.cs`: ```csharp using LargeXlsx; namespace ExcelMeh; public static class XlsxWriterExtensions { public static XlsxWriter WriteSharedStrings(this XlsxWriter source, IEnumerable values, XlsxStyle style) { foreach (var value in values) { source.WriteSharedString(value, style); } return source; } ``` `Program.cs`: ```csharp using ExcelMeh; const string path = @"C:\Users\natalie-perret\Desktop\test.xlsx"; AccrualsGenerator.Generate(path); DefaultApp.StartDefault(path); ```

But there is still an issue, entry / file attributes in the *.xlsx (*.zip) files

With the new implementation, we can get a file like this: test.xlsx

If you open it with say, 7-Zip, you will see: image image image image image

As you can see, the entries / files are marked in the archive with the attributes: -r--------.

If I extract the files of the archive and stuffing them back in the archive (or a new one): test - but sap-compliant.xlsx

You will then see: image image image image image

the entries are now marked with the attribute N.

Which kind hints me that there is something off with the zip dependency used by LargeXlsx, i.e. SharpCompress and the library doesn't come with a lot of options. And it doesn't seem, to my knowledge (correct me if I'm wrong) that we can't set attributes to the archive entries / files.

An alternative that might work would be to lever ICSharpCode.SharpZipLib that has among other things, a ZipEntry type that carries more properties as you can see in the doc: http://icsharpcode.github.io/SharpZipLib/api/ICSharpCode.SharpZipLib.Zip.ZipEntry.html

salvois commented 1 year ago

Hi @natalie-o-perret , you did a really impressive investigation job, thanks! Please allow me to try to summarize them to make sure I'm understanding your findings correctly. You have found a way to make SAP happy about files created by LargeXlsx by just tweaking the way you use the library, that is by enforcing use of shared strings and some specific styles. Moreover you have found that the zip file (the actual XLSX file) produced by LargeXlsx is not compatible with SAP and you need to unzip and zip it again with a third party zip compressor. Is this summary correct? Do you need both of those? I remember having similar issues with very large files on LibreOffice due to the second (the zip one) issue. I see that the author of SharpCompress himself is considering SharpZipLib for his zip backend, so maybe writing a version of LargeXlsx using it is worth a try. Thanks, Salvo

natalie-o-perret commented 1 year ago

Hi @natalie-o-perret , you did a really impressive investigation job, thanks! Please allow me to try to summarize them to make sure I'm understanding your findings correctly. You have found a way to make SAP happy about files created by LargeXlsx by just tweaking the way you use the library, that is by enforcing use of shared strings and some specific styles. Moreover you have found that the zip file (the actual XLSX file) produced by LargeXlsx is not compatible with SAP and you need to unzip and zip it again with a third party zip compressor. Is this summary correct? Do you need both of those? I remember having similar issues with very large files on LibreOffice due to the second (the zip one) issue. I see that the author of SharpCompress himself is considering SharpZipLib for his zip backend, so maybe writing a version of LargeXlsx using it is worth a try. Thanks, Salvo

Hey @salvois, yea that took me quite a (very) long while to figure most of the stuff out. And yea you got the whole thing right ~~

  • I'd say that the style-format-shared-string things might not be needed. Cause I'm not too sure to understand why LibreOffice or Excel tend to always / systematically put stuff into shared strings and, since LargeXlsx is a rather low-level library, it makes sense to keep things separated. I assume, you know maybe if there was no difference between write string and write shared string, in terms of API design, then it would make sense to let the library decides what has to be put into shared and what hasn't, but this isn't the case. So this is not needed, which takes to me the point below.
  • The zip part would really help though, imho. Also like you said SharpZipLib is considered for the zip backend, and you also have experienced a similar issue in the past, so it would be great to get to the very bottom of this.
salvois commented 1 year ago

Hi @natalie-o-perret , I have just pushed a version of LargeXlsx which replaces SharpCompress with ICSharpCode.SharpZipLib to pack the XLSX file. Could you try that version from sources to check whether that helps with your problem? You find it in the feature/sharpziplib branch. As far as I'm concerned, I've noticed that SharpZipLib presents the same problem that SharpCompress used to in a previous version, that is in ZIP64 mode they save zip headers in a way that LibreOffice considers invalid, thus I could not adopt SharpZipLib lightly, but that could be a start. Thanks, Salvo

natalie-o-perret commented 1 year ago

Hi @natalie-o-perret , I have just pushed a version of LargeXlsx which replaces SharpCompress with ICSharpCode.SharpZipLib to pack the XLSX file. Could you try that version from sources to check whether that helps with your problem? You find it in the feature/sharpziplib branch. As far as I'm concerned, I've noticed that SharpZipLib presents the same problem that SharpCompress used to in a previous version, that is in ZIP64 mode they save zip headers in a way that LibreOffice considers invalid, thus I could not adopt SharpZipLib lightly, but that could be a start. Thanks, Salvo

Hey 🙋‍♀️ @salvois, thanks a ton again!🙇‍♀️ I'm gonna try to work something out with your feature/sharpziplib branch, keep you posted 🪖

salvois commented 1 year ago

Hi @natalie-o-perret , a quick note, just in case you didn't notice, that I merged the last changes to the feature/sharpziplib branches.

natalie-o-perret commented 1 year ago

Hi @natalie-o-perret , a quick note, just in case you didn't notice, that I merged the last changes to the feature/sharpziplib branches.

Thanks, I'm not quite yet there, I've played a bit before you've merged master to feature/sharpziplib, but haven't managed to make it worked / SAP-upload-compliant so far.

Basically what I've been doing so far,

Creating an extension method like below:

using ICSharpCode.SharpZipLib.Zip;

namespace LargeXlsx
{
    internal static class ZipOutputStreamExtensions
    {
        public static void PutNextMsDosEntry(this ZipOutputStream source, string name, int fileAttributes, HostSystemID hostSystemId = HostSystemID.Msdos)
        {
            var zipEntry = new ZipEntry(name)
            {
                HostSystem = (int)hostSystemId,
                ExternalFileAttributes = fileAttributes
            };
            source.PutNextEntry(zipEntry);
        }
    }
}

And then used it in res.

  • SharedStringTable.cs with zipOutputStream.PutNextMsDosEntry("xl/sharedStrings.xml", 0);
  • Stylesheet.cs with zipOutputStream.PutNextMsDosEntry("xl/styles.xml", 0);
  • Worksheet.cs with zipOutputStream.PutNextMsDosEntry($"xl/worksheets/sheet{id}.xml", 32);
  • XlsxWriter.cs with
    • _zipOutputStream.PutNextMsDosEntry("[Content_Types].xml", 32);
    • _zipOutputStream.PutNextMsDosEntry("_rels/.rels", 0);
    • _zipOutputStream.PutNextMsDosEntry("xl/workbook.xml", 0);
    • _zipOutputStream.PutNextMsDosEntry("xl/_rels/workbook.xml.rels", 0);
salvois commented 1 year ago

Hi @natalie-o-perret , I'm closing this issue because, for the time being, I'm not able to provide further assistance based on current information. Please feel free to reopen in case you have further details to share and possibly delve into. Thanks, Salvo

natalie-o-perret commented 1 year ago

Hi @salvois,

I'm re-opening this issue, as I was delving once again into these shenanigans lately, turns out the issue (for SAP) is in the construction of the sheet1.xml

Expand / Collapse the `sheet1.xml` that LargeXlsx has generated so far ```xml 0 1 2 3 4 1 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 44988 44988 20 21 22 23 6 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 18 39 40 35.09 41 42 43 18 44 40 7.02 41 42 43 18 45 40 42.11 41 42 43 ```
Expand / Collapse the `sheet1.xml` SAP can accept (after thousands of attempts at editing the one generated above by LargeXlsx) ```xml 0 1 2 3 4 1 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 44988 44988 20 21 22 23 6 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 18 39 40 35.09 41 42 43 18 44 40 7.02 41 42 43 18 45 40 42.11 41 42 43 ```

The differences are essentially two-folded, it boils down to the same thing proper marking of coordinates for both rows and cells.

They both should have the the proper xml attribute for coordinates, e.g.,

  • <row r="19">
  • <c r="B19" s="7" t="s">)

[EDIT] Ended up doing something similar to https://github.com/salvois/LargeXlsx/pull/31

systematically.

salvois commented 1 year ago

Hello @natalie-o-perret , I'm glad you have found your issue. This puzzles me, though, because your original issue report was on release 1.6.3, which did produce row and cell references, while release 1.7.1, which included @MarkPflug optimization to remove unnecessary row and cell references, was released only on 2023-02-26, that is after your initial tests. Would you mind double checking whether 1.7.0 does indeed work for you? Thanks, Salvo

MarkPflug commented 1 year ago

For what it's worth, in ECMA-376 the "r" attribute is defined as use="optional" on both the CT_Row and CT_Cell types. Probably not worth much when things aren't working for you though.

natalie-o-perret commented 1 year ago

Hello @natalie-o-perret , I'm glad you have found your issue. This puzzles me, though, because your original issue report was on release 1.6.3, which did produce row and cell references, while release 1.7.1, which included @MarkPflug optimization to remove unnecessary row and cell references, was released only on 2023-02-26, that is after your initial tests. Would you mind double checking whether 1.7.0 does indeed work for you? Thanks, Salvo

@salvois I think there was some mixed up at the time on my end, I mean when you spend your time comparing and adjusting what bits (e.g., SAP doesn't support inline strings in worksheet xml's, they have to be part of the SharedStrings.xml) makes SAP rejects and doesn't, it's a lot of shots in the dark, countless *.xlxs unzipped and dissected, just because the support can't address the issue nor the business willing to change how they operate 🤷‍♀️. What I can guarantee is that now it works by using the same trick that has been done in https://github.com/salvois/LargeXlsx/pull/31


For what it's worth, in ECMA-376 the "r" attribute is defined as use="optional" on both the CT_Row and CT_Cell types. Probably not worth much when things aren't working for you though.

@MarkPflug Well, my not-so-wild guess is that SAP probably just use some libraries that aren't properly compliant, it's a Saas and closed source anyway.

salvois commented 1 year ago

Hello @natalie-o-perret , I have finally had a chance to merge #31 , so the current master lets you force LargeXlsx write row and cell references even when redundant. That would be included in the next release. Please feel free to give it a try at your convenience. Thanks, Salvo

natalie-o-perret commented 1 year ago

@salvois hey there 🙋‍♀️ , will check this out. Thanks for the hard work 🙇‍♀️

salvois commented 11 months ago

Hi @natalie-o-perret , I have just released release 1.9.0 which allows to include row and cell references even when redundant. Please feel free to check whether that solves your issue. Thanks, Salvo

salvois commented 9 months ago

Hi @natalie-o-perret , I'm closing this issue. Should the problem persist, please reopen it. Thanks, Salvo