ironfede / openmcdf

Microsoft Compound File .net component - pure C# - netstandard 2.0
Mozilla Public License 2.0

OpenMCDF Extensions NuGet package #1

Closed Emyr42 closed 5 years ago

Emyr42 commented 7 years ago

OpenMCDF is far less useful without the Extensions. No NuGet package for openmcdf.extensions is a barrier to adoption.

ironfede commented 7 years ago

Emyr42, you're right, but in my opinion the current development status of the extensions is not ready for package distribution. A NuGet package for the extensions will surely be released once code stability and testing reach a good level.

silkfire commented 7 years ago

Isn't OpenMcdf.Extensions included by default? At least I'm using it with only OpenMcdf downloaded from NuGet.

nemecben commented 6 years ago

Hello, I see three packages on NuGet for OpenMCDF.

I was using the v2.1.3.34730 with good results but I too wanted the document properties functionality of the Extensions. So I tried the -2.NetStandard but it will not install, I get the error that the package has no references to the current .net framework. I was on 4.6 then tried framework 4.0 and 3.0, all had same results. Not sure what I'm doing wrong there. So I tried the OpenMcdf-2 and it installed, but the AsOLEProperties is not available, only the AsIOStream. I downloaded the openmcdf-master codeset to see that there is work on parsing out the property sets, but I'm not smart enough to make use of it.

Federico, do you see a release in the near future that will parse out the properties in the property sets? If not, that's OK. I'll still use the rest of the library in a wrapper to break out the properties and functions I need. Or I'll try to, anyway.

Thank you for this Lib, it is great.

ironfede commented 6 years ago

Thank you, nemecben. At the time I'm writing this comment, the latest "official" packages (the ones maintained directly by me) are the following:

[image: screenshot listing the official NuGet packages]

I've started to implement OLE properties, but I've never reached a "release" level for this feature, so it is only conditionally built by setting a #define OLE_PROPERTY in the mainform.cs source code (you can check my comments at #11).

You can consider it a starting point for your own implementations or for pull requests to the OpenMcdf project. You can get a source code snapshot and set the conditional flag to play with OLE properties. It should be the only change required to activate the functionality.

At the moment I can't give you a definite release date for OLE properties. That doesn't mean the feature will not be implemented: I've noticed some interest in it, so I'll do my best to include it in an official release.

Thank you for your interest in OpenMcdf. Best regards,

Federico

nemecben commented 6 years ago

Federico, thank you for the quick response. I'm doing as you suggested by defining OLE_PROPERTY, and of course my usage is throwing exceptions right away. It looks like my first step is to add a few more data type classes based on information from https://msdn.microsoft.com/en-us/library/dd942033.aspx and https://msdn.microsoft.com/en-us/library/dd942532.aspx (similar but slightly differing information in two places).

I'm more than happy to contribute; my only concern is that I'll do something wrong. My programming level is elementary at best. But I can follow what you have going with the property type classes in PropertyFactory.cs and implement more based on the Microsoft documentation. I know I need VT_UI4 and VT_CLSID (0x0048) to start with. I have never done a pull request or contributed before, so I might need to beg your patience. Is there a handy tool that you can use to see differences in the code? I would be more comfortable if you could quickly proofread my changes before they are committed.

Ben

nemecben commented 6 years ago

I have added the following type classes and added the constructor call from the switch statement. The write function is not tested. The read function is performing as expected on the format of files I have.

I also added the VT_DATE constructor call to the switch statement. Your library is working well for acquiring the data I need, specifically a GUID in one of three extended property streams (property stream names always start with 0x05).

I'm out for the weekend now, but if I can work on this on Monday, I will try to figure out a clean way to expose the non-documented and/or application-specific property streams so that whatever project is using this DLL can get the data. There is also a thing called JSite Stores and how they map the linked or embedded documents. Getting the linked documents is next. I have it done by brute force, but I've learned a lot about parsing binary streams while working in your code and would like to use that to make a clean implementation.

Have a great weekend,

Ben

nemecben commented 6 years ago

Hello, I've had a couple of days this week to do programming and research. Some of the time was spent on the compound file side of things. Honestly, I'm just barely treading water in the sea of information. There is so much material on Microsoft's OLE site that I cannot work my way through it all.
I am making use of the Dictionary property identifier 0x00000000, one of the four special cases that currently exist. It looks likely that more special cases will come about in the future. Note that the standard PIDs range from 0x00000002 to 0x7FFFFFFF; only two above that range are defined so far, the Locale PID 0x80000000 and the Behavior property 0x80000003. OpenMCDF has implemented the CodePage property ID 0x00000001.
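The PID ranges listed above can be sketched as a small classifier. This is only an illustration of the numbering described in the comment: `PidKind` and `PidClassifier` are hypothetical names that do not exist in OpenMCDF, and the `Unknown` bucket is an assumption for values above 0x7FFFFFFF that are not one of the two named special cases.

```csharp
using System;

// Hypothetical helper types for illustration; they are not part of OpenMCDF.
public enum PidKind { Dictionary, CodePage, Locale, Behavior, Normal, Unknown }

public static class PidClassifier
{
    // The four special-case property identifiers mentioned in the thread.
    public const uint DictionaryPid = 0x00000000;
    public const uint CodePagePid   = 0x00000001;
    public const uint LocalePid     = 0x80000000;
    public const uint BehaviorPid   = 0x80000003;

    // Upper bound of the ordinary property identifier range.
    public const uint LastNormalPid = 0x7FFFFFFF;

    public static PidKind Classify(uint pid)
    {
        switch (pid)
        {
            case DictionaryPid: return PidKind.Dictionary;
            case CodePagePid:   return PidKind.CodePage;
            case LocalePid:     return PidKind.Locale;
            case BehaviorPid:   return PidKind.Behavior;
        }

        // 0 and 1 were handled above, so anything up to 0x7FFFFFFF is an ordinary property.
        return pid <= LastNormalPid ? PidKind.Normal : PidKind.Unknown;
    }
}
```

A reader loop could call `PidClassifier.Classify(pid)` on each identifier/offset pair and skip typed-value parsing for anything that is not `Normal`.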

If I follow the code correctly, there is no check for Dictionaries when reading the IDs and offsets. Then, when we get to reading the properties, the Dictionary offset is passed along and the read sees the Dictionary count as a VT_ type, which is not helpful. The Dictionary isn't really a property, but metadata about the property set (names to match up to some of the property identifiers). The best way I can understand it is as a .NET Dictionary container type. For the special-case property sets ("\005SummaryInformation", "\005DocumentSummaryInformation", "\005GlobalInfo", etc.) there is no Dictionary, because their property names are already specified.

Now, where I get bogged down is modeling this in OO. To get a complete list (or some container) of all the property sets in a file, we need to allow the compound document to define its other property sets. Because an enum type must be declared at compile time, there is no way to model the unknown PIDs that may come along. So we would have a class that models the behavior of the property ID and offset, with a Dictionary that maps IDs to property names based on the Dictionary in the property stream, unless it's a special-case property set, which has a specified Dictionary and is not allowed to contain a Dictionary property per the Microsoft spec, ref . How do we define those dictionaries at compile time so that they are only applied (loaded into the dictionary) to the correct property sets at run time?
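A minimal sketch of reading the Dictionary as a map rather than as a typed value might look like the following. It assumes the layout as I read it in the Microsoft property set documentation: a 4-byte entry count, then for each entry a 4-byte property identifier, a 4-byte length and the name, where the length counts characters and the name is padded to a 4-byte boundary when the code page is Unicode (1200), and counts bytes otherwise. `DictionaryPropertyReader` is a hypothetical name, not an OpenMCDF class, and the layout should be checked against the specification before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical reader for illustration; not part of OpenMCDF.
public static class DictionaryPropertyReader
{
    private const int CodePageUnicode = 1200;

    // Reads the Dictionary starting at the reader's current position
    // (assumed to be the offset recorded for PID 0x00000000).
    public static Dictionary<uint, string> Read(BinaryReader reader, int codePage)
    {
        uint count = reader.ReadUInt32();
        var names = new Dictionary<uint, string>();

        for (uint i = 0; i < count; i++)
        {
            uint pid = reader.ReadUInt32();
            int length = reader.ReadInt32();
            string name;

            if (codePage == CodePageUnicode)
            {
                // Assumption: length is in UTF-16 characters, name padded to a multiple of 4 bytes.
                byte[] raw = reader.ReadBytes(length * 2);
                name = Encoding.Unicode.GetString(raw);
                int padding = (4 - (raw.Length % 4)) % 4;
                reader.ReadBytes(padding);
            }
            else
            {
                // Assumption: length is in bytes, no padding. Legacy code pages may need
                // CodePagesEncodingProvider to be registered on .NET Core.
                byte[] raw = reader.ReadBytes(length);
                name = Encoding.GetEncoding(codePage).GetString(raw);
            }

            // Names are stored with a trailing null terminator.
            names[pid] = name.TrimEnd('\0');
        }

        return names;
    }
}
```

The result is a plain `Dictionary<uint, string>`, which matches the observation that the Dictionary is metadata about the property set and not a property value.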

ironfede commented 6 years ago

I wouldn't use an enumeration in a future implementation (yes, I know... I've already done it :-( ), exactly because of the problems you have highlighted. I think I'm going to change this to an abstract class containing the "standard" PIDs (the CodePage property, for example) and specialize it for the specific loaded dictionary with a Factory pattern. This is only a suggestion anyway; I don't know if it could expose more problems than the ones it should resolve...

nemecben commented 6 years ago

I may have a backwards approach to this, in that I'm trying to read files that have unknown sets. I'm trying to get and edit property and link data from a CAD file in a silent, lightweight service. I need to keep in mind that this library may also be used by some application to save its own compound file. In the latter case the property structures will be determined by the developer and "hard coded" into their application to define how the file is written.

I ran out of work day on Friday, but I was trying out the idea of moving the FMTID property from the PropertySetStream class to the PropertySet class. I noticed that the FMTID is read and set but never used. This is what we will compare to known published FMTIDs to determine whether the property set is one we know or a new one that hopefully has a Dictionary. Also, we would have a PropertyIdentifier base class with a Dictionary<uint,String>. We would implement a class SummaryInfoPID : PropertyIdentifier that loads its dictionary from constants in the constructor. There would also be a DocumentSummaryInfoPID : PropertyIdentifier class that would be similar, but with different constants loaded into the Dictionary property. I'm hoping that polymorphism would work and let other developers define property set IDs that inherit from our base class PropertyIdentifier. But I have a very limited understanding of polymorphism. Oh, and I'm thinking the base class PropertyIdentifier would handle the case of loading the dictionary from the stream.

That is my task for Monday. I'll set that up and see how your Structured Storage Explorer test app (I find similarities to MiTeC's Structured Storage Viewer) handles unknown property sets.
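The class layout described here, combined with the factory idea suggested earlier in the thread, could be sketched roughly as follows. The class names `PropertyIdentifier`, `SummaryInfoPID` and `DocumentSummaryInfoPID` come from the comment; `PropertyIdentifierFactory`, `GetName` and the constructor shapes are hypothetical, and only a few of the well-known PID names are filled in. It is a sketch of the design under those assumptions, not code from OpenMCDF.

```csharp
using System;
using System.Collections.Generic;

// Base class: holds the PID-to-name map, optionally loaded from a stream Dictionary.
public class PropertyIdentifier
{
    protected readonly Dictionary<uint, string> Names = new Dictionary<uint, string>();

    public PropertyIdentifier() { }

    // Used for unknown property sets that carry their own Dictionary property.
    public PropertyIdentifier(IDictionary<uint, string> streamDictionary)
    {
        foreach (var pair in streamDictionary)
            Names[pair.Key] = pair.Value;
    }

    // Falls back to the hexadecimal PID when no name is known.
    public string GetName(uint pid)
    {
        return Names.TryGetValue(pid, out var name) ? name : "0x" + pid.ToString("X8");
    }
}

// Well-known set: names come from constants, never from the stream.
public sealed class SummaryInfoPID : PropertyIdentifier
{
    public SummaryInfoPID()
    {
        Names[0x02] = "PIDSI_TITLE";
        Names[0x03] = "PIDSI_SUBJECT";
        Names[0x04] = "PIDSI_AUTHOR";
    }
}

public sealed class DocumentSummaryInfoPID : PropertyIdentifier
{
    public DocumentSummaryInfoPID()
    {
        Names[0x02] = "PIDDSI_CATEGORY";
        Names[0x03] = "PIDDSI_PRESFORMAT";
    }
}

// Hypothetical factory: picks the identifier table by comparing the FMTID.
public static class PropertyIdentifierFactory
{
    public static readonly Guid SummaryInformationFmtid =
        new Guid("F29F85E0-4FF9-1068-AB91-08002B27B3D9");
    public static readonly Guid DocSummaryInformationFmtid =
        new Guid("D5CDD502-2E9C-101B-9397-08002B2CF9AE");

    public static PropertyIdentifier Create(Guid fmtid, IDictionary<uint, string> streamDictionary)
    {
        if (fmtid == SummaryInformationFmtid) return new SummaryInfoPID();
        if (fmtid == DocSummaryInformationFmtid) return new DocumentSummaryInfoPID();
        return new PropertyIdentifier(streamDictionary ?? new Dictionary<uint, string>());
    }
}
```

With this shape, the compile-time tables are only instantiated when the FMTID matches, which is one possible answer to the question of how to apply the specified dictionaries to the correct property sets at run time. Other developers could add their own subclass and extend the factory for application-specific FMTIDs.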

SaajithaKareem commented 6 years ago

I have replaced the OLE storage file with OpenMCDF in the Compound Server file, but there is no implementation of the readblob record and readnext record operations in OpenMCDF. Can anyone suggest an option for that? Compound Server cannot read the compound files with OpenMCDF.

ironfede commented 5 years ago

Published a NuGet package for the extensions. Please open new issues for requests or advice concerning the current release. Many thanks to all contributors, Federico