ironfede / openmcdf

Microsoft Compound File .net component - pure C# - netstandard 2.0
Mozilla Public License 2.0

Unrecognized property type #32

Closed OutOfThisPlanet closed 5 years ago

OutOfThisPlanet commented 5 years ago

Hi Ironfede,

We have been using your library in a project at work where I am reading the "Creation Date" property from old-format Office files that contain it (doc, xls, and ppt, for example; pub and vsd files don't seem to have the property).

We found that it works really well, but some files throw an "Unrecognized property type" exception. We can't figure out why.

When we check the files with "SSView", we can see the date exists. Similarly, the date exists within the Windows File Explorer properties.

Our test code is simple:

using System;
using OpenMcdf;
using OpenMcdf.Extensions;
using OpenMcdf.Extensions.OLEProperties;
using OpenMcdf.Extensions.OLEProperties.Interfaces;

namespace ConsoleApp4
{
    class Program
    {
        static void Main(string[] args)
        {
            string file = @"c:\temp\_Test2.doc";

            CompoundFile cf;

            try
            {
                cf = new CompoundFile(file);
            }
            catch (CFFileFormatException cfe)
            {
                Console.WriteLine(file + " isn't a valid OLE Storage file. " + cfe.Message);
                Console.ReadKey();
                return;
            }

            int numDirectories = cf.GetNumDirectories();

            for (int i = 0; i < numDirectories; i++)
            {
                Console.WriteLine(cf.GetNameDirEntry(i));
            }

            CFStream stream = cf.RootStorage.GetStream("\u0005SummaryInformation");

            PropertySetStream ps = CFStreamExtension.AsOLEProperties(stream);

            int count = 0;

            foreach (PropertyIdentifierAndOffset propId in ps.PropertySet0.PropertyIdentifierAndOffsets)
            {
                Console.WriteLine(count + ": " + propId.PropertyIdentifier.GetDescription());
                count++;
            }

            count = 0;

            foreach(ITypedPropertyValue prop in ps.PropertySet0.Properties)
            {
                Console.WriteLine(count + ": " + prop.GetType() + ": " + prop.PropertyValue);
                count++;
            }
            Console.ReadKey();
        }
    }
}

The issue is thrown on the following line:

PropertySetStream ps = CFStreamExtension.AsOLEProperties(stream);

We can't figure out whether the issue is with the file, our code, or the library...

In one case, a file that was not working suddenly started working after we changed some document properties (we removed the author). Weird. Doing the same on another file had no effect.

Attempting to find the original age of a document will help us with our document retention automation in SharePoint. This isn't our SharePoint code, it's just a console app to test pulling back the values from the stream. With GDPR being a thing nowadays, handling all these old files is suddenly also a thing too.

Can you help please?

Thanks for your excellent work!

ironfede commented 5 years ago

I've added some enhancements to the OLE Properties handling that should hopefully allow parsing of the DocumentSummaryInfo and SummaryInfo sets. Please consider OLE properties to be still in a beta stage: not all property types are supported, and the feature needs thorough unit testing before it can be considered production-ready. Nevertheless, it's a useful extension, so please let me know if there are other issues, ideally attaching an example file to analyze. Best Regards, Federico

ironfede commented 5 years ago

Extension NuGet package 2.2.1.3 published.

nemecben commented 5 years ago

I just wanted to say thank you for sharing your work with us all.

My usage is a little indexing program at my work that monitors our Solid Edge (CAD) files and stores file links in an SQL table for fast searching. Your code allows the indexing service to search model data files for links without needing to have the CAD software installed on the server. I'm excited to try out this improvement, as these files have the document properties you mentioned and it sounds like this would help with indexing those too.

Ben Nemec

(In reply to Federico Blaseotto's comment of Wed, Nov 21, 2018, 16:46, which appears in full above.)

OutOfThisPlanet commented 5 years ago

Thank you so much Federico, I'll go and get the new package now and test :)

Much appreciated!

OutOfThisPlanet commented 5 years ago

Hi Federico,

Unfortunately, this did not fix our issue.

I have attached a .ppt file (in a zip) that contains a date which OpenMCDF seemingly cannot read.

Some properties seem to have changed in this new version:

"PropertyValue" is now "Value", for example.

For some reason, I am now getting this error message.

System.MissingMethodException: 'Method not found: 'UInt32 OpenMcdf.Extensions.OLEProperties.PropertyIdentifierAndOffset.get_PropertyIdentifier()'.'

I'll "Repair" my Visual Studio.

_Test.zip

ironfede commented 5 years ago

Thank you for this test case. I've found the issue (a missing clipboard data property type) and it will be fixed as soon as possible. Please take into account that the API is not stable yet, so expect some required changes due to refactoring.

OutOfThisPlanet commented 5 years ago

Hi Federico,

Thanks for looking at it again. I appreciate your efforts very much.

ironfede commented 5 years ago

Please take a look at the current codebase (no NuGet package yet) to see whether this partial commit fixes the reported issue. The API is being refactored (OLE properties sub-project at https://github.com/ironfede/openmcdf/projects/1#card-15194412 ), so some client code changes will probably be required.

Numpsy commented 5 years ago

Hi, not sure whether to put this here or in a new issue, but I get the 'unrecognized property type' exception when trying to read a SummaryInformation stream from a Word document which contains standard properties (e.g. Author) whose type is VT_LPWSTR rather than VT_LPSTR.

I can attach a sample file if that would be useful?

Thanks.
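
For readers unfamiliar with the difference between the two string types mentioned above, here is a minimal sketch of how they are laid out according to the MS-OLEPS specification: a VT_LPSTR value (type 0x001E) carries a size in bytes followed by code-page characters, while a VT_LPWSTR value (type 0x001F) carries a length in UTF-16 characters followed by UTF-16LE text. The class OleStringSketch and the method ReadString are hypothetical names for illustration only and are not part of OpenMcdf; the sketch also assumes a little-endian host and that the property set's code page is not UTF-16.

```csharp
using System;
using System.Text;

// Hypothetical illustration, NOT OpenMcdf API.
static class OleStringSketch
{
    public const ushort VT_LPSTR = 0x001E;   // CodePageString
    public const ushort VT_LPWSTR = 0x001F;  // UnicodeString

    // 'data' starts at a TypedPropertyValue: 2-byte type, 2-byte padding,
    // 4-byte count, then the characters (including the null terminator).
    public static string ReadString(byte[] data, Encoding codePage)
    {
        // BitConverter uses host byte order; the format is little-endian,
        // so this assumes a little-endian machine.
        ushort type = BitConverter.ToUInt16(data, 0);
        int count = (int)BitConverter.ToUInt32(data, 4);

        switch (type)
        {
            case VT_LPSTR:
                // count is a size in BYTES, terminator included.
                return codePage.GetString(data, 8, count).TrimEnd('\0');
            case VT_LPWSTR:
                // count is a length in UTF-16 CHARACTERS, terminator included.
                return Encoding.Unicode.GetString(data, 8, count * 2).TrimEnd('\0');
            default:
                throw new NotSupportedException(
                    "Unrecognized property type 0x" + type.ToString("X4"));
        }
    }
}
```

A reader that only knows the first case would hit the default branch for a file whose Author is stored as VT_LPWSTR, which is consistent with the exception described above.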

ironfede commented 5 years ago

Hi @Numpsy, please attach the file; I'll use those samples in unit tests, if that's OK with you. I'm progressively adding property types and trying to cover the whole MS-OLEPS specification. The milestone is 2.3.0.0 for the OpenMcdf extensions.

Numpsy commented 5 years ago

This file has Author and Keywords properties of type LPWSTR.

wstr_presets.zip

ironfede commented 5 years ago

LPWSTR support added. Work in progress...

ironfede commented 5 years ago

Please take into account that the OLEProperties container still does not support write methods (they throw NotImplementedException to avoid issues).

ironfede commented 5 years ago

@Numpsy, please let me know if the current code base closes this issue. Thank you!

Numpsy commented 5 years ago

Hi,

I gave it a quick try and I can get the property values now (no exception any more), but it looks like there might be spurious null characters on the ends of the strings?

extra_null

Looks good otherwise, though.

ironfede commented 5 years ago

Thanks @Numpsy. Yes, OLE strings have null termination AND a size field, so for the moment I think it's better if the client application applies a post-filter to handle them in its preferred way. I will introduce some kind of configuration parameter to specify how to handle null characters.
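
As a minimal sketch of the client-side post-filter suggested above, assuming the property value has already been obtained as a .NET string: the helper below simply strips any trailing null characters. The names OlePropertyStringFilter and TrimTrailingNulls are hypothetical and not part of OpenMcdf.

```csharp
using System;

// Hypothetical client-side helper, NOT OpenMcdf API.
static class OlePropertyStringFilter
{
    // Removes the trailing '\0' terminator(s) that OLE property strings
    // carry in addition to their size field. Null input is passed through.
    public static string TrimTrailingNulls(string value)
    {
        return value == null ? null : value.TrimEnd('\0');
    }
}
```

Only trailing nulls are removed, so any other content of the string is left untouched; a client that prefers to cut at the first null instead would need a different filter.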

OutOfThisPlanet commented 5 years ago

Seems like this is a thread hijack to me!

Can I ask why this bug report has been closed?

ironfede commented 5 years ago

Sorry, but I thought that the last report meant "OK". If you think it's not, I will reopen it.

(In reply to nullldata's comment of Fri, Dec 7, 2018, 15:39, which appears in full above.)

OutOfThisPlanet commented 5 years ago

We are currently trying to compile from source, however it's not yet compiling. Looking into it. Previously, we used Nuget to add the extension.

From our perspective, we don't have a working solution currently.

Will update when we successfully compile and test.

Sorry for slow reply, I've been away.

OutOfThisPlanet commented 5 years ago

WooooHoooo! It works! :)

Fede, you are a hero! :)

Please let us know if this gets made into a nuget package, as I fear that using our compiled DLLs may not be updateable.

ironfede commented 5 years ago

@nullldata, I'm going to close the issue. Please let me know if that's OK. The NuGet package will be released when the OLE properties read/write project is closed as "Production ready", so I think it will take some time to reach the 2.3.0.0 milestone... stay tuned ;-) Thank you for your reports and for your patience.

OutOfThisPlanet commented 5 years ago

@ironfede Sure, no problem.

Thanks again! :)