ironfede / openmcdf

Microsoft Compound File .net component - pure C# - netstandard 2.0
Mozilla Public License 2.0

all streams in MSI files have chinese file names #11

Closed springy76 closed 6 years ago

springy76 commented 6 years ago

Is this a known problem? Did I do something wrong?

It looks like ASCII is being interpreted as 16-bit Unicode.

ironfede commented 6 years ago

The MS-CFB specification states that stream names are UTF-16 strings. I think the MSI format encodes some kind of information in stream names so that they can be used by an API or by the Microsoft Installer framework.

springy76 commented 6 years ago

As you already closed this ticket: may I ask what that means for me? Can I already do something to get it right? Maybe I overlooked a constructor overload or something?

ironfede commented 6 years ago

Springy76, what do you mean by "right"? How do you expect MSI streams to be named? Please give some more detail so that I can understand your issue.

springy76 commented 6 years ago

For example take this MSI: https://stylecop.codeplex.com/releases/view/629688

Using this code in LINQPad:

new OpenMcdf.CompoundFile(@"D:\Downloads\StyleCop-4.7.55.0.msi", OpenMcdf.CFSUpdateMode.ReadOnly, OpenMcdf.CFSConfiguration.Default)
    .RootStorage
    .VisitEntries(item => item.Name.Dump(), true);

I get the following result which definitely is NOT valid:

䡀䆒䑲
䡀䌏䈯
䡀㲞䈝䗻
䡀䈖䌧䠤
䡀䌋䄱䜵
䡀䌍䏤䊲
䡀䕎䒵䠵
䡀㬿䏲䐸䖱
䡀㽿䅤䈯䠶
䡀䈏䗤䕸䠨
䡀䈛䌪䗶䜵
䡀䋌䆨㫮䛲
䡀䒌䗱䒵䠯
䡀䓞䕪䇤䠨
䡀䕙䓲䕨䜷
䆒䑲䀾䛬䆒䑲
䌠㼻䗨䓸䆾䅤
䡀䆊䌷䑲䈝䗻
䡀䈛㵪䆲䗤䕲
䡀䈛䒰䈹䌏䈯
䡀䈝䗻䗜䏼䠨
䡀䌍䈵䗦䕲䠼
䡀䒌䇱䗬䒬䠱
䡀䒌䓰䑲䑨䠷
䡀䓊㼳䄨䆵䠫
䌋䄱䜵㷾䗨䛏㪌
䡀㼿䕷䑬㭪䗤䠤
䡀㼿䕷䑬㹪䒲䠯
䡀㿿䏤䇬䗤䒬䠱
䡀䄛䌧㫲䗸䒷䠱
䡀䒌䗱䒵㮯䈹䗱
䡀䖖㯬䏬㱨䖤䠫
䡀䘌䗶䐲䆊䌷䑲
䡀䙎䑨㶷䓤䌳䊱
䌋䄱䜵䀾䛬㲞䌠䆻䠤
䡀䄕䑸䋦䒌䇱䗬䒬䠱
䡀䇊䌰㾱㼒䔨䈸䆱䠨
䡀䈗㯷㷻䗤䙬㲨䄰䈪
䡀䒌䗱䒵㬯䑲䌧䌷䑲
䌋䄱䜵䀾䛬㲞㫿䓰㷿䚨
䌋䄱䜵䀾䛬㲞㫿䓰㾿䠳
䡀䈏䗤䕸㬨䐲䒳䈱䗱䠶
䡀䑒䗶䏤㾯㼒䔨䈸䆱䠨
䌋䄱䜵䀾䛬㲞㲿䒦㲿䉱䠲
䡀䇊䌰㮱䈻䘦䈷䈜䘴䑨䈦
䡀䇊䗹䛎䆨䗸㼨䔨䈸䆱䠨
䌋䄱䜵䀾䛬㲞㫿䓰㫿䑤䈱䠵
䌋䄱䜵䀾䛬㲞㫿䓰㭿䄬䒯䠪
䌋䄱䜵䀾䛬㲞㲿䒦㮿䆻䄯䠰
䡀䑒䗶䏤㮯䈻䘦䈷䈜䘴䑨䈦
䌋䄱䜵䀾䛬䘌䗶䐲䆊䌷䑲䇾䏯
SummaryInformation

The last entry "SummaryInformation" starts with an unprintable character.

I hope this helps.

BTW: The streams themselves seem to be OK: if they are images, they are decodable.

springy76 commented 6 years ago

Addendum: This is strange: 7-Zip reports only 33 entries (about half), and none of them contains the name "summary".

ironfede commented 6 years ago

Take a look at this blog post on the MSI file format: http://robmensching.com/blog/posts/2004/2/10/inside-the-msi-file-format-again/ The author explains that stream names are stored in a compressed, undocumented format. The first character of the SummaryInformation stream name is an unprintable character by design (see a previous issue on this point).

springy76 commented 6 years ago

Ok - but how do I recognize such files after having opened them? I only see a version property, which is always 3. Maybe you could expose Header.MinorVersion and Header.ClsId (byte[16]) as a GUID in CompoundFile, if they contain valuable information?

ironfede commented 6 years ago

Ok, here we're going experimental... :-) The minor and major version cannot be given another meaning except through a specification upgrade by Microsoft or another implementor of the Compound File Storage format. You could recognize an MSI file from its content by reading the SummaryInformation stream and decoding the OLE properties contained there.

OpenMcdf doesn't expose this feature by default because it is in an early phase of development.

Anyway, you can test it by adding #define OLE_PROPERTY to the MainForm source file of StructuredStorageExplorer and rebuilding.

If you open an msi file after this operation and select the SummaryInformation stream you should see something similar to the following:

image

Notice that it includes the information that the file is an MSI installer.

Please feel free to experiment, but take into account that this is an EXPERIMENTAL feature. Best Regards, Federico
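
Editor's note: a possible alternative to parsing SummaryInformation is to look at the CLSID stored on the root storage entry, since Windows Installer files conventionally carry a well-known class ID there. The sketch below only classifies a Guid that the caller has already obtained. The class and method names are hypothetical, and it is an assumption (not confirmed in this thread) that OpenMcdf makes the value available, for example through a CLSID property on RootStorage.

```csharp
using System;

// Hypothetical helper, not part of OpenMcdf: maps a root-storage CLSID
// to the kind of Windows Installer file it conventionally identifies.
public static class InstallerClsid
{
    public static readonly Guid Package   = new Guid("000C1084-0000-0000-C000-000000000046");
    public static readonly Guid Transform = new Guid("000C1082-0000-0000-C000-000000000046");
    public static readonly Guid Patch     = new Guid("000C1086-0000-0000-C000-000000000046");

    public static string Classify(Guid rootClsid)
    {
        if (rootClsid == Package) return "MSI package";
        if (rootClsid == Transform) return "MSI transform";
        if (rootClsid == Patch) return "MSI patch";
        // Any other value (including Guid.Empty) is not a recognized installer CLSID.
        return "unknown";
    }
}
```

A file whose root CLSID is not one of these values may still be a valid compound file of another kind, so "unknown" should not be treated as an error.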

bazarniy commented 4 years ago

This looks like the correct way to parse the names: https://stackoverflow.com/questions/9734978/view-msi-strings-in-binary https://github.com/SheetJS/js-cfb/issues/3#issuecomment-502395608
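
Editor's note: the decoding scheme described in those links can be sketched in C# roughly as follows. This is a minimal sketch, assuming the commonly described layout: a UTF-16 code unit in the range 0x3800 to 0x47FF packs two characters of a 64-symbol alphabet (low 6 bits first), a code unit in 0x4800 to 0x483F holds a single character, and everything else, including the U+4840 prefix (䡀) seen on many of the names above, is passed through unchanged. The class name MsiStreamName is hypothetical and not part of OpenMcdf.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of OpenMcdf.
public static class MsiStreamName
{
    // Assumed 64-symbol alphabet: digits, upper case, lower case, '.', '_'.
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._";

    public static string Decode(string encoded)
    {
        if (encoded == null) throw new ArgumentNullException(nameof(encoded));

        var sb = new StringBuilder(encoded.Length * 2);
        foreach (char ch in encoded)
        {
            if (ch >= 0x3800 && ch < 0x4800)
            {
                // Two packed characters: low 6 bits first, then the next 6 bits.
                int packed = ch - 0x3800;
                sb.Append(Alphabet[packed & 0x3F]);
                sb.Append(Alphabet[(packed >> 6) & 0x3F]);
            }
            else if (ch >= 0x4800 && ch < 0x4840)
            {
                // One character stored on its own.
                sb.Append(Alphabet[ch - 0x4800]);
            }
            else
            {
                // Left as-is: the U+4840 prefix, control characters, plain text.
                sb.Append(ch);
            }
        }
        return sb.ToString();
    }
}
```

With the LINQPad snippet from earlier in the thread, the visitor could call MsiStreamName.Decode(item.Name).Dump() instead of item.Name.Dump(). Names such as SummaryInformation fall outside the encoded ranges and come back unchanged.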