Closed codethug closed 2 months ago
Hey, thanks for the feedback. I think the change to the logic sounds fine, my only concern is that having one or two lists of string to include/exclude in the constructor might make them a little complex.
I've looked at a sample macro-enabled Word document and the macro file is named the same (vbaProject.bin) but just sits in a different location. Perhaps we could use this to define a simpler constructor:
protected OfficeOpenXml(string identifiableEntry, bool macroEnabled, string mediaType, string extension)
which then allows for something like:
public Excel() : base("xl/workbook.xml", macroEnabled: false, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "xlsx")
public MacroEnabledExcel() : base("xl/workbook.xml", macroEnabled: true, "vnd.ms-excel.sheet.macroEnabled.12", "xlsm")
The macroEnabled
flag would then either include/exclude vbaProject.bin (at any path) within the IsMatch
function as you described.
It's a less flexible approach but makes the constructor easier to read. What do you think?
Regarding the list of mime types, the definitive list is here: https://www.iana.org/assignments/media-types/media-types.xhtml
There are a few different registrations for Excel, this appears to be the correct one for xlsm files: https://www.iana.org/assignments/media-types/application/vnd.ms-excel.sheet.macroEnabled.12
@neilharvey Your suggestion feels pretty clean to me, especially since the base class isn't really being exposed (so consumers aren't really going to know or care about the new Boolean argument in the base constructor).
I put together an implementation to support macros in Powerpoint, Excel and Word. Those were the only 3 in IANA's list of MIME types that mentioned macros. How does this look?
Looks good, thanks for the PR! I'll publish a new release with your changes.
Thanks for merging the PR.
Do you need anything else from me to publish a release on NuGet?
Or when should I expect to see this update in a NuGet release?
Apologies, just been busy the last couple of days. You don't need to do anything else, I'll try to get it released this evening.
I've published a version 5.1 to Nuget, should be listed shortly. Give me a shout if there are any issues.
When I send an XLSX file and and XLSM file to be analyzed, they are both identified as XLSX, because they both are a zip file with a file at xl/workbook.xml.
The difference between them is that the XLSM file (excel file with Macros) has a file at xl/vbaProject.bin, but XLSX files do not have a file there.
I'm thinking we can update the rules to be something like this:
XLSX file:
XLSM file:
To accomplish this, some new OpenOfficeXml constructors would need to be added to allow for more inputs, and the IsMatch function would need to updated to use the new inputs.
What do you think? I'd be glad to put together a PR if you're interested.
More notes Similar rules would apply to Word docs with macros, etc. https://codingislove.com/list-mime-types-2016/ indicates that XLSM files have a different content-type. I'm not sure if there is an authoritative source for that kind of information.