kienerj / pycdxml

Tools to automatically convert and process cdx and cdxml files in Python
GNU General Public License v3.0
35 stars 5 forks

Add support for compressed embedded image files #4

Closed kienerj closed 2 years ago

kienerj commented 2 years ago

ChemDraw can store embedded images either as plain binary values (hex strings) or as gzip-compressed, base64-encoded strings:

kCDXProp_Compressed_MacPICT, // 0x0A66 A Macintosh PICT data object. (GZIP compressed, BASE64 encoded)

kCDXProp_Compressed_WindowsMetafile, // 0x0A67 A Microsoft Windows Metafile object. (GZIP compressed, BASE64 encoded)

kCDXProp_Compressed_OLEObject, // 0x0A68 An OLE object. (GZIP compressed, BASE64 encoded)

kCDXProp_Compressed_EnhancedMetafile, // 0x0A69 A Microsoft Windows Enhanced Metafile object. (GZIP compressed, BASE64 encoded)

These are not documented on the old specification page but can be found in a more recent header file posted somewhere in the old CambridgeSoft forums.

In cdxml, the property is part of an 'embeddedobject' element:

image

The base64 strings appear to use a fixed line width of 72 characters, with a single empty line after each line, including the last one.
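A minimal sketch of that layout in Python, using only the standard library. The function names are hypothetical and not part of pycdxml, and the choice of `\n` as the line terminator is an assumption, since the observed files may use a different newline convention.

```python
import base64
import gzip


def encode_cdxml_image(raw: bytes, width: int = 72) -> str:
    """Gzip-compress raw image bytes and return base64 text in the observed layout."""
    b64 = base64.b64encode(gzip.compress(raw)).decode("ascii")
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    # Each line is followed by one empty line, including the last line.
    # Assumption: "\n" is the line terminator.
    return "".join(line + "\n\n" for line in lines)


def decode_cdxml_image(text: str) -> bytes:
    """Reverse of encode_cdxml_image: strip whitespace, base64-decode, gunzip."""
    compact = "".join(text.split())  # removes newlines and blank lines
    return gzip.decompress(base64.b64decode(compact))
```

Decoding strips all whitespace first, so it should also tolerate other line widths or newline styles.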

The 'embeddedobject' then has an additional attribute containing the size of the original document:

image

The size is an additional property for each of the compressed formats:

image

All of the compressed image formats need to be implemented. This can likely be done with a single new type.

As for how this is represented in cdx, some testing is still needed, especially to determine whether the same base64 string format is used.

kienerj commented 2 years ago

Trial and error showed that in cdx the data is just the raw gzipped bytes. The base64 encoding is (not surprisingly) only used in cdxml. Conversion between the two formats must therefore encode to or decode from base64.
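A rough sketch of that conversion step, assuming the cdx property value is exactly the gzip byte stream and the cdxml value is its base64 text in 72-character lines, each followed by an empty line. All function names here are hypothetical and do not reflect the actual pycdxml implementation. The `uncompressed_size` helper assumes the size attribute refers to the length of the data before compression.

```python
import base64
import gzip


def cdx_to_cdxml_value(cdx_payload: bytes, width: int = 72) -> str:
    """Base64-encode the gzip bytes from cdx; no recompression is needed."""
    b64 = base64.b64encode(cdx_payload).decode("ascii")
    # Assumption: 72-character lines, each followed by an empty line.
    return "".join(b64[i:i + width] + "\n\n" for i in range(0, len(b64), width))


def cdxml_to_cdx_value(cdxml_text: str) -> bytes:
    """Strip whitespace and base64-decode to recover the gzip bytes for cdx."""
    return base64.b64decode("".join(cdxml_text.split()))


def uncompressed_size(cdx_payload: bytes) -> int:
    """Length of the data after gunzipping (assumed meaning of the size attribute)."""
    return len(gzip.decompress(cdx_payload))
```

Because the gzip stream is passed through unchanged, a cdx to cdxml to cdx round trip should reproduce the original bytes exactly.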

kienerj commented 2 years ago

Fixed by 925dfc98abc09deff1478621fba9b432e23276a6