openpreserve / odf-validator

Open source Open Document Format (ODF) validation
http://odf.openpreservation.org/
BSD 3-Clause "New" or "Revised" License

ODF_6: Embedded Objects #94

Closed carlwilson closed 7 months ago

carlwilson commented 8 months ago

If there are any embedded files then they MUST be one of the acceptable formats.

The list of permitted file formats is:

This means that each embedded file must either already be one of the above types, be converted to one of them, or be removed. Bek 128 gives rules for how this information should be conveyed in an AVID collection.

These rules are valid even if there are multiple layers of embedding.

An ODS validator must be able to report the following information:

dewhattens commented 8 months ago

When extracting embedded objects from the zip file, they can take one of two forms -- a file-based Object or a directory-based Object.

The directory-based objects occur most frequently when an OpenOffice/LibreOffice document is embedded within another OpenOffice/LibreOffice document. The directory-based object's sub-directory contains the same types of files you would find in a standard document, such as content.xml, settings.xml, and styles.xml. Unfortunately, you can't zip up the directory-based object and give it an odp, odt, or ods extension, because it is missing its own manifest.xml file. It's possible to take the manifest.xml file from the parent document, modify it, and combine it with the files in the directory-based object, but this can get difficult depending on the complexity of the parent file and the embedded object.
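As a rough illustration of the manifest approach described above, here is a minimal Python sketch. The function name `extract_subdocument` and the `prefix` argument are made up for this example and are not part of odf-validator. It assumes the parent manifest lists the object directory itself (for example `Object 1/`) with the sub-document's media type, and it ignores the harder cases: encrypted entries, `ObjectReplacements` images, and links back into the parent.

```python
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
FULL_PATH = f"{{{MANIFEST_NS}}}full-path"
MEDIA_TYPE = f"{{{MANIFEST_NS}}}media-type"


def extract_subdocument(parent, prefix, out):
    """Copy the directory-based object under `prefix` into a new package.

    `parent` and `out` may be paths or file-like objects; `prefix` must
    end with "/" (e.g. "Object 1/"). Hypothetical helper, sketch only.
    """
    ET.register_namespace("manifest", MANIFEST_NS)
    with zipfile.ZipFile(parent) as src:
        root = ET.fromstring(src.read("META-INF/manifest.xml"))
        mimetype = None
        for entry in list(root):
            path = entry.get(FULL_PATH, "")
            if not path.startswith(prefix):
                # Entry belongs to the parent document only: drop it.
                root.remove(entry)
            elif path == prefix:
                # The object directory becomes the root of the new package.
                mimetype = entry.get(MEDIA_TYPE)
                entry.set(FULL_PATH, "/")
            else:
                entry.set(FULL_PATH, path[len(prefix):])
        if not mimetype:
            raise ValueError(f"no media type found for {prefix!r}")

        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            # The mimetype file goes first and is stored uncompressed.
            dst.writestr("mimetype", mimetype,
                         compress_type=zipfile.ZIP_STORED)
            dst.writestr(
                "META-INF/manifest.xml",
                ET.tostring(root, encoding="utf-8", xml_declaration=True),
            )
            for name in src.namelist():
                if name.startswith(prefix) and not name.endswith("/"):
                    dst.writestr(name[len(prefix):], src.read(name))
```

Whether the result opens cleanly in an office suite will depend on the parent file, which is the difficulty noted above.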

The file-based objects occur most frequently when the embedded object was inserted by Microsoft Office and/or is a file type that cannot be expressed using the OpenDocument Format. In these cases, the native file is wrapped in a Microsoft OLE stream.
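To make the two forms concrete, here is a minimal Python sketch of how a tool might tell them apart when walking the zip. It is a heuristic, not the validator's actual logic: `classify_embedded_objects` is a hypothetical name, a sub-directory is treated as a directory-based object if it contains its own content.xml, and a file is treated as a file-based object if it starts with the OLE compound file signature.

```python
import zipfile

# Signature at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_embedded_objects(package):
    """Return {"directory": [...], "file": [...]} for an ODF package.

    `package` may be a path or a file-like object. Heuristic sketch only.
    """
    found = {"directory": set(), "file": set()}
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # explicit directory entry, carries no data
            parent, _, leaf = name.rpartition("/")
            if parent and leaf == "content.xml":
                # A nested content.xml marks an embedded sub-document,
                # at any depth of embedding.
                found["directory"].add(parent + "/")
                continue
            with zf.open(name) as member:
                if member.read(8) == OLE_MAGIC:
                    found["file"].add(name)
    return {kind: sorted(names) for kind, names in found.items()}
```

Because the check looks at every entry in the package, nested objects such as `Object 1/Object 2/` are reported too. Files that are embedded without an OLE wrapper would need an additional check.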

carlwilson commented 7 months ago

Closed by #135