ironfede / openmcdf

Microsoft Compound File .net component - pure C# - netstandard 2.0
Mozilla Public License 2.0

Unrecognized OLEProperties Streams #134

Open farfilli opened 3 weeks ago

farfilli commented 3 weeks ago

Apart from SummaryInformation and DocumentSummaryInformation, there are more OLEProperties streams; they all seem to have the "|" character at the beginning of their names.

I opened one of those with the usual command and I'm able to read the OLEProperties; however, if I change them and save, they are no longer accessible.

Here is an example:

Imports System.IO
Imports System.Linq
Imports OpenMcdf
Imports OpenMcdf.Extensions
Imports OpenMcdf.Extensions.OLEProperties

' Open the compound file for in-place update
Dim cfg As CFSConfiguration = CFSConfiguration.SectorRecycle Or CFSConfiguration.EraseFreeSectors
Dim fs = New FileStream("C:\Users\stazione41\Documents\cil2.par", FileMode.Open, FileAccess.ReadWrite)

Using cf As CompoundFile = New CompoundFile(fs, CFSUpdateMode.Update, cfg)

    ' Read the app-specific property set stream
    Dim dsiStream3 As CFStream = cf.RootStorage.GetStream("Rfunnyd1AvtdbfkuIaamtae3Ie")
    Dim co3 As OLEPropertiesContainer = dsiStream3.AsOLEPropertiesContainer

    ' Look up the properties by display name
    Dim DocumentNumber = co3.Properties.First(Function(Prop) Prop.PropertyName = "Document Number")
    Dim Revision = co3.Properties.First(Function(Prop) Prop.PropertyName = "Revision")
    Dim ProjectName = co3.Properties.First(Function(Prop) Prop.PropertyName = "Project Name")

    ' Change the values
    DocumentNumber.Value = "ABC"
    Revision.Value = "123"
    ProjectName.Value = "EDEF"

    ' Write the property set back to its stream, then commit the file
    co3.Save(dsiStream3)

    cf.Commit()

End Using

The first time you run the code everything seems OK, but if you run it a second time you get an error when reading the OLEProperties. An example document with those streams is attached.

This is how the OLEProperties streams are shown in another application: (image)

cil2.zip

Numpsy commented 3 weeks ago

I think there is some code that assumes one of SummaryInformation / DocumentSummaryInformation on write (e.g. https://github.com/ironfede/openmcdf/blob/a61b6b9a836865781c2588b9b6bf711f30137e17/sources/OpenMcdf.Extensions/OLEProperties/OLEPropertiesContainer.cs#L215), so I'm not sure how well writing other types of streams will work as it stands.

farfilli commented 3 weeks ago

Structured Storage Viewer can read and successfully write those Property Sets, therefore it is possible. I would edit the code myself, but I do not know C# syntax well enough.

What I would do is save those Property Sets the same way as SummaryInformation. I have seen there is a specific ContainerType enum value called "AppSpecific"; perhaps those sets can be bound to it?

Numpsy commented 3 weeks ago

It might need to retain the FMTID0 property of the input stream when reading it, rather than just mapping it into one of those ContainerType values, otherwise it won't be able to convert AppSpecific back into a value to write to the file on save.

Other than that it might depend on if the stream in question has any special rules for the contents - If it uses all the same rules as SummaryInformation then it might just work, otherwise it might need more work.

Numpsy commented 3 weeks ago

> It might need to retain the FMTID0 property of the input stream when reading it, rather than just mapping it into one of those ContainerType values, otherwise it won't be able to convert AppSpecific back into a value to write to the file on save.

e.g. store the format id from https://github.com/ironfede/openmcdf/blob/a61b6b9a836865781c2588b9b6bf711f30137e17/sources/OpenMcdf.Extensions/OLEProperties/OLEPropertiesContainer.cs#L90 in the property context or something like that.
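
A minimal sketch of that idea, not the OpenMcdf implementation: the format id read from the stream is stored as-is, the container kind is only derived from it, and the stored id is what gets written back. `PropertySetHeader`, `ContainerKind`, and `FormatIdToWrite` are hypothetical names made up for this illustration; the two standard FMTIDs are the ones defined in MS-OLEPS, and the app-specific one is a Solid Edge FMTID listed in this thread.

```csharp
using System;

// Hypothetical stand-ins for illustration; these are not the OpenMcdf types.
enum ContainerKind { SummaryInfo, DocumentSummaryInfo, AppSpecific }

sealed class PropertySetHeader
{
    // FMTIDs of the two standard property sets (from the MS-OLEPS specification).
    public static readonly Guid SummaryInformation =
        new Guid("F29F85E0-4FF9-1068-AB91-08002B27B3D9");
    public static readonly Guid DocSummaryInformation =
        new Guid("D5CDD502-2E9C-101B-9397-08002B2CF9AE");

    public Guid FormatId { get; }       // kept exactly as read from the stream
    public ContainerKind Kind { get; }  // derived from FormatId, for convenience only

    public PropertySetHeader(Guid formatIdFromStream)
    {
        FormatId = formatIdFromStream;
        Kind = formatIdFromStream == SummaryInformation ? ContainerKind.SummaryInfo
             : formatIdFromStream == DocSummaryInformation ? ContainerKind.DocumentSummaryInfo
             : ContainerKind.AppSpecific;
    }

    // On save, return the stored id instead of deriving one from Kind,
    // so an AppSpecific set keeps its original FMTID.
    public Guid FormatIdToWrite() => FormatId;
}

static class Program
{
    static void Main()
    {
        // Solid Edge extended summary information FMTID, as listed in this thread.
        var appSpecific = new Guid("CC024FA2-6EB5-11CE-8AA2-08003601E988");
        var header = new PropertySetHeader(appSpecific);

        Console.WriteLine(header.Kind);                             // AppSpecific
        Console.WriteLine(header.FormatIdToWrite() == appSpecific); // True
    }
}
```

The point of the design is that `Kind` is lossy (many FMTIDs map to `AppSpecific`), so it cannot be the only thing kept between read and write.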

farfilli commented 3 weeks ago

AFAIK those property sets follow the same rules as other properties, depending on the type of data they store. They are just app-specific property sets; in the attachment you can find a document that explains their usage. Solid Edge files use those property sets to store material, project, and extended document properties.

SE_Understanding_and_Troubleshooting_Corrupt_Files_for_Potential_Repair_v2.pdf

What about creating specific ContainerTypes for those GUIDs?

SE Extended Summary Information {CC024FA2-6EB5-11CE-8AA2-08003601E988}
SE Material Information {CC024FCA-6EB5-11CE-8AA2-08003601E988}
SE Project Information {F0D60B1-A0D8-11CE-8AA2-08003601E988}

Numpsy commented 3 weeks ago

> It might need to retain the FMTID0 property of the input stream when reading it, rather than just mapping it into one of those ContainerType values, otherwise it won't be able to convert AppSpecific back into a value to write to the file on save.
>
> Other than that it might depend on if the stream in question has any special rules for the contents - If it uses all the same rules as SummaryInformation then it might just work, otherwise it might need more work.

Debugging through it, looks like there is another issue as well - it only writes a DictionaryProperty when writing a UserDefinedProperty section, and it looks like that 'Rfunnyd1AvtdbfkuIaamtae3Ie' section contains a dictionary property which gets dropped when the section is overwritten.

farfilli commented 3 weeks ago

Probably you know this page already; can it help?

farfilli commented 3 weeks ago

With my limited knowledge, I was able to start debugging and testing; at least I have learned something today 👍

farfilli commented 3 weeks ago

@Numpsy I was able to successfully write that Project Information by manually passing the correct GUID in the Save method.

Due to my limited knowledge of C# syntax, I simply commented out the definition and forced it like this:

    //FMTID0 = this.ContainerType == ContainerType.SummaryInfo ? new Guid(WellKnownFMTID.FMTID_SummaryInformation) : new Guid(WellKnownFMTID.FMTID_DocSummaryInformation),
    FMTID0 = new Guid(WellKnownFMTID.FMTID_ProjectInformation),

Also note that the correct ProjectInformation GUID is SE Project Information {F0D6D0B1-A0D8-11CE-8AA2-08003601E988}; there was a missing D in my previous GUID list.
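
As a side note, a slip like that missing digit can be caught before the value reaches the file by parsing the string strictly. The sketch below only uses the standard `Guid.TryParse` from .NET and the two GUID strings from this thread; it does not touch OpenMcdf.

```csharp
using System;

static class Program
{
    static void Main()
    {
        // First group has only 7 hex digits, so this is not a valid GUID.
        string withTypo = "{F0D60B1-A0D8-11CE-8AA2-08003601E988}";
        // Corrected value with the missing D restored.
        string corrected = "{F0D6D0B1-A0D8-11CE-8AA2-08003601E988}";

        Console.WriteLine(Guid.TryParse(withTypo, out _));         // False
        Console.WriteLine(Guid.TryParse(corrected, out Guid id));  // True
        Console.WriteLine(id.ToString("B").ToUpperInvariant());    // {F0D6D0B1-A0D8-11CE-8AA2-08003601E988}
    }
}
```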

farfilli commented 3 weeks ago

I noticed that in this way the Properties lose their name, is that the dictionary you were mentioning?

Numpsy commented 3 weeks ago

Something like #135 might work to retain the original FMTID0 value of AppSpecific streams on write?

> I noticed that in this way the Properties lose their name, is that the dictionary you were mentioning?

That's the one - the main property storage stores properties with numerical identifiers and the dictionary entry contains the display names for those properties.

Doing something like https://github.com/Numpsy/openmcdf/commit/0c1d3530fd034179b68bfa8c1ae5c67a3e1e2dda might work to write those values back on save, needs a review of what the exact logic should be though.
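
To illustrate why the names disappear, here is a small self-contained sketch. It is not OpenMcdf code: `PropertyNames.Label`, the identifiers, and the hex fallback label are assumptions made up for the example. The values are keyed by numeric property identifier, and a separate dictionary supplies the display names (in MS-OLEPS the dictionary itself is stored under property identifier 0). Once the dictionary is gone, only the numbers are left.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of OpenMcdf.
static class PropertyNames
{
    // Returns the display name for an identifier, or a hex label if the
    // dictionary has no entry for it (the fallback format is an assumption).
    public static string Label(uint id, IReadOnlyDictionary<uint, string> names) =>
        names.TryGetValue(id, out var name) ? name : $"0x{id:X8}";
}

static class Program
{
    static void Main()
    {
        // Property values keyed by numeric identifier (ids chosen for the example).
        var values = new Dictionary<uint, object> { [2] = "ABC", [3] = "123" };

        // The dictionary property: identifier -> display name.
        var names = new Dictionary<uint, string>
        {
            [2] = "Document Number",
            [3] = "Revision",
        };

        Console.WriteLine($"{PropertyNames.Label(2, names)} = {values[2]}"); // Document Number = ABC

        // Simulate a save that drops the dictionary property.
        names.Clear();

        Console.WriteLine($"{PropertyNames.Label(2, names)} = {values[2]}"); // 0x00000002 = ABC
    }
}
```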

farfilli commented 3 weeks ago

@Numpsy It seems to work great! I will test on more AppSpecific PropertySets. I enjoyed reading your code and learning to debug; maybe one day I will be able to contribute myself. Thanks!

farfilli commented 3 weeks ago

Testing with other PropertySets that contain VT_R8 values, the write method throws an exception: "Unable to cast object of type 'System.Int32' to type 'System.Double'".

The stream in my file is "\u0005K4teagxwOttdbfkuIaamtae3Ie".

Writing VT_LPWSTR on that set works fine, so your changes (#135 and @Numpsy 0c1d353) are working great.

EDIT: I was passing the number as integer instead of double, declaring it as double before passing it works well

Numpsy commented 3 weeks ago

I've done a test update in #136 as I realised that VT_R8 properties are allowed in the UserDefined properties set, but the unit test for them didn't include one.

> EDIT: I was passing the number as integer instead of double, declaring it as double before passing it works well

There is perhaps some scope here to do more up front validation on write for that sort of thing.
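
As a rough sketch of what such validation could look like (an illustration only, not the library's behaviour): a boxed `Int32` cannot be unboxed directly to `double`, which is the failure reported above, but it can be converted explicitly before the write. `VtR8.Normalize` is a hypothetical helper name, and the set of accepted numeric types is an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of OpenMcdf.
static class VtR8
{
    // Returns the value as a double if it is one of the accepted numeric types;
    // otherwise throws with a message naming the offending type.
    // Note: long and decimal may lose precision when converted to double.
    public static double Normalize(object value)
    {
        if (value is double d)
            return d;

        if (value is float || value is int || value is long || value is short || value is decimal)
            return Convert.ToDouble(value, CultureInfo.InvariantCulture);

        string typeName = value == null ? "null" : value.GetType().FullName;
        throw new ArgumentException(
            "A VT_R8 property needs a numeric value, got " + typeName + ".", nameof(value));
    }
}

static class Program
{
    static void Main()
    {
        object boxedInt = 123; // what a caller passes when the literal is an integer

        try
        {
            double direct = (double)boxedInt; // unboxing to a different type throws
            Console.WriteLine(direct);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("direct cast failed");
        }

        Console.WriteLine(VtR8.Normalize(boxedInt) == 123.0); // True
        Console.WriteLine(VtR8.Normalize(1.5) == 1.5);        // True
    }
}
```

Whether to convert silently like this or to reject mismatched types with a clear error is a design choice for the maintainers; either way the caller gets feedback at the point where the value is set rather than deep inside the write path.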