slhck / compress-pptx

Compress a PPTX file, converting all PNG/TIFF images to lossy JPEGs

Support for EMF #5

Closed: ondrejhavlicek closed this issue 1 year ago

ondrejhavlicek commented 2 years ago

Would it be possible to add support for the EMF format, please? These are often the largest files within the PPTX, and when converted to JPEG they take up roughly a tenth of the space.

slhck commented 2 years ago

Good idea. Would you be willing to provide a PR for this? Should be a simple fix.

ondrejhavlicek commented 2 years ago

I am afraid not, am quite a newbie :-/

slhck commented 2 years ago

Can you provide a sample file with an EMF image embedded?

slhck commented 2 years ago

I added a version that simply looks for .emf files and converts them, but I am not sure if this is enough, as I don't have a test file. Please test and report back!

slhck commented 2 years ago

Any feedback on the new release?

ondrejhavlicek commented 2 years ago

Sorry for the late response. I've updated to the new version and tried it on a presentation containing EMF images, but I get:

(base) ohavlicek@MARVIN:~/.local/bin$ compress-pptx /mnt/c/Temp/file.pptx -s 50k --skip-transparent-images
Extracting file ...
Traceback (most recent call last):
  File "/home/ohavlicek/.local/bin/compress-pptx", line 11, in <module>
    sys.exit(main())
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/__main__.py", line 73, in main
    raise e
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/__main__.py", line 68, in main
    force=cli_args.force,
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/compress_pptx.py", line 108, in run
    self._find_images()
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/compress_pptx.py", line 149, in _find_images
    if self.skip_transparent_images and _has_transparency(file, self.verbose):
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/compress_pptx.py", line 36, in _has_transparency
    stdout, _ = run_command(cmd, verbose=verbose)
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/util.py", line 66, in run_command
    "error running command {}: ".format(" ".join(cmd)) + stderr.decode("utf-8")
RuntimeError: error running command identify -format %[opaque] /tmp/tmpb8f1ugl8/ppt/media/image83.emf: identify-im6.q16: no decode delegate for this image format `EMF' @ error/constitute.c/ReadImage/504.

Without the skip-transparent-images:

(base) ohavlicek@MARVIN:~/.local/bin$ compress-pptx /mnt/c/Temp/file.pptx -s 50k
Extracting file ...
Compressing 6 file(s) ...
 67%|████████████████████████████████████████████████████████▋                            | 4/6 [00:00<00:00, 38.14it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3.7/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/lib/python3.7/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/compress_pptx.py", line 31, in _compress_image
    run_command(cmd, verbose=args["verbose"])
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/util.py", line 66, in run_command
    "error running command {}: ".format(" ".join(cmd)) + stderr.decode("utf-8")
TypeError: sequence item 10: expected str instance, PosixPath found
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ohavlicek/.local/bin/compress-pptx", line 11, in <module>
    sys.exit(main())
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/__main__.py", line 73, in main
    raise e
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/__main__.py", line 68, in main
    force=cli_args.force,
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/compress_pptx.py", line 111, in run
    self._compress_images()
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/compress_pptx/compress_pptx.py", line 182, in _compress_images
    process_map(_compress_image, self.image_list)
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/home/ohavlicek/.local/lib/python3.7/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.7/concurrent/futures/process.py", line 483, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
TypeError: sequence item 10: expected str instance, PosixPath found

Not sure if it is due to my WSL2.
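
A note on the second traceback: the TypeError is raised on the line in run_command that builds the error message, where " ".join(cmd) is applied to a list containing a PosixPath. str.join only accepts strings, so the TypeError presumably hides the real error from the underlying command. The following is a minimal sketch of the failure and one possible fix, assuming cmd is a list that mixes str and Path objects; the helper name format_command and the example command are hypothetical and not part of compress-pptx.

```python
from pathlib import PurePosixPath


def format_command(cmd):
    """Join command parts for display, converting path objects to str first.

    Hypothetical helper, not taken from compress-pptx.
    """
    return " ".join(str(part) for part in cmd)


# Illustrative command list with a path object in it (assumed shape).
cmd = ["convert", PurePosixPath("/tmp/example/ppt/media/image83.emf"), "out.jpg"]

try:
    " ".join(cmd)  # str.join rejects non-str items
except TypeError as exc:
    print("plain join failed:", exc)

print(format_command(cmd))
```

Converting each element with str() before joining keeps the original error output of the failed command visible instead of replacing it with a TypeError.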

ondrejhavlicek commented 2 years ago

If I set the size parameter higher than the size of the largest EMF file (which is 60k), it works, both with and without skipping transparent images.

(base) ohavlicek@MARVIN:~/.local/bin$ compress-pptx /mnt/c/Temp/file.pptx -s 100k
Extracting file ...
Compressing 4 file(s) ...
100%|█████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 34.40it/s]
Output written to: /mnt/c/Temp/file.pptx

But the produced PPTX file is not valid, although PowerPoint luckily manages to repair it. Does the tool perhaps just replace the images inside the archive without changing their names (suffixes) in the references in the XML?
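
One way to check this suspicion is to look at the relationship files inside the archive. A PPTX is a ZIP file, and each slide's images are referenced from ppt/slides/_rels/*.rels through Target attributes. The sketch below lists media targets that no longer exist in the archive. It is only an illustration: the function name dangling_media_targets is made up, it uses a simple regular expression rather than a full XML parser, and it only inspects slide relationships (layouts and masters have their own .rels files). The demo archive is synthetic.

```python
import io
import re
import zipfile


def dangling_media_targets(pptx):
    """Return (rels_file, target) pairs whose ../media/ target is missing.

    `pptx` may be a path or a file-like object; hypothetical helper.
    """
    missing = []
    with zipfile.ZipFile(pptx) as zf:
        names = set(zf.namelist())
        for name in names:
            if not (name.startswith("ppt/slides/_rels/") and name.endswith(".rels")):
                continue
            xml = zf.read(name).decode("utf-8")
            for target in re.findall(r'Target="\.\./media/([^"]+)"', xml):
                if "ppt/media/" + target not in names:
                    missing.append((name, target))
    return sorted(missing)


# Synthetic demo: the slide still points at image1.emf, but only image1.jpg exists.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "ppt/slides/_rels/slide1.xml.rels",
        '<Relationships><Relationship Id="rId2" Target="../media/image1.emf"/></Relationships>',
    )
    zf.writestr("ppt/media/image1.jpg", b"placeholder")
buf.seek(0)

result = dangling_media_targets(buf)
print(result)
```

If such dangling targets show up, the references would need to be rewritten to the new file names, and [Content_Types].xml would also need an entry for the new extension.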

slhck commented 2 years ago

Please send me a file to reproduce the problem! Thanks.

slhck commented 2 years ago

PS: It seems that ImageMagick can only handle EMF on Windows. So it probably makes sense not to support it at all, since I am not sure how to selectively handle it on Windows alone.
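
One possible approach to the selective handling, sketched under assumptions: rather than checking the operating system, ask the installed ImageMagick which formats it can read by running identify -list format, and include .emf files only when EMF is listed as readable. The function names below are hypothetical, and the parser assumes the usual column layout of that listing (format, module, mode, description), which may differ between ImageMagick versions.

```python
import shutil
import subprocess


def format_is_readable(listing, fmt):
    """Return True if `fmt` appears in the listing with a readable mode.

    Assumes columns: Format, Module, Mode (e.g. 'rw-', 'r--'), Description.
    """
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].rstrip("*").upper() == fmt.upper():
            return parts[2].startswith("r")
    return False


def emf_supported():
    """Return True if the local `identify` reports a decoder for EMF."""
    if shutil.which("identify") is None:
        return False
    proc = subprocess.run(
        ["identify", "-list", "format"], capture_output=True, text=True
    )
    return proc.returncode == 0 and format_is_readable(proc.stdout, "EMF")


# Illustrative listing excerpt (made up for this example, not real output).
sample = (
    "   Format  Module    Mode  Description\n"
    "      EMF  EMF       r--   Windows Enhanced Meta File\n"
    "      PNG* PNG       rw-   Portable Network Graphics\n"
)
print(format_is_readable(sample, "EMF"))
```

With a check like this, EMF files could simply be skipped on systems where the decoder is missing, instead of failing with the "no decode delegate" error.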