transpect / docx2tex

Converts Microsoft Word docx to LaTeX
BSD 2-Clause "Simplified" License

Handling of embedded .emf files #16

Open zopyx opened 8 years ago

zopyx commented 8 years ago

We have DOCX files in which the authors often embed PowerPoint files. This case is not handled properly:

! LaTeX Error: Unknown graphics extension: .emf.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.429 ...16t125157.docx.tmp/word/media/image1.emf}

? x

Ideally, .emf files would be converted to proper SVGs or PNGs. If this is not possible, they should be removed and not carried forward into the LaTeX output. Perhaps a removed image could be replaced with a placeholder or a warning message.
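A minimal sketch of the placeholder idea, assuming the generated LaTeX references images through \includegraphics (as the error log above suggests) and is post-processed as plain text after conversion. The function name and placeholder wording are hypothetical and not part of docx2tex:

```python
import re
from typing import List, Tuple

# Matches \includegraphics with optional [options] and a path ending in .emf or .wmf
EMF_INCLUDE = re.compile(
    r"\\includegraphics(?:\[[^\]]*\])?\{([^}]*\.(?:emf|wmf))\}",
    re.IGNORECASE,
)


def replace_emf_includes(tex: str) -> Tuple[str, List[str]]:
    """Replace EMF/WMF graphics with a visible placeholder box.

    Returns the rewritten LaTeX and the list of removed image paths,
    so they can also be reported as warnings.
    """
    removed: List[str] = []

    def placeholder(match: re.Match) -> str:
        path = match.group(1)
        removed.append(path)
        name = path.rsplit("/", 1)[-1]
        # \detokenize keeps characters such as "_" in the file name from breaking LaTeX
        return r"\fbox{Image omitted (unsupported format): \detokenize{" + name + "}}"

    # A function replacement is inserted literally, so backslashes need no escaping
    return EMF_INCLUDE.sub(placeholder, tex), removed
```

PNG and other supported includes are left untouched; only the unsupported vector formats are swapped out.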

gimsieke commented 8 years ago

We don’t handle that yet. I already asked @mkraetke to add an HTML report output for docx2tex that contains the messages that emerge from docx2hub. These types of embeddings should be reported there (and removed or replaced with a dummy, as you suggested).
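A sketch of what such a report could check, assuming only that a .docx is a ZIP package with images under word/media/ and embedded OLE objects (such as PowerPoint files) under word/embeddings/. scan_docx is a hypothetical helper, not part of docx2hub or docx2tex:

```python
import io
import zipfile
from typing import Dict, List

# Image formats that the LaTeX graphicx package cannot include
UNSUPPORTED_IMAGES = (".emf", ".wmf")


def scan_docx(source) -> Dict[str, List[str]]:
    """List parts of a .docx package that the LaTeX output cannot use as-is.

    source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as pkg:
        names = pkg.namelist()
    return {
        # Vector images stored in the package
        "unsupported_images": sorted(
            n for n in names
            if n.startswith("word/media/") and n.lower().endswith(UNSUPPORTED_IMAGES)
        ),
        # Embedded OLE objects, e.g. PowerPoint presentations
        "embedded_objects": sorted(n for n in names if n.startswith("word/embeddings/")),
    }


if __name__ == "__main__":
    # Build a tiny in-memory package to show the report format
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", "<w:document/>")
        z.writestr("word/media/image1.emf", b"")
        z.writestr("word/embeddings/oleObject1.bin", b"")
    print(scan_docx(buf))
```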

mkraetke commented 8 years ago

I would consider this an enhancement, since docx2tex is not intended to convert images. I would suggest that image processing be done outside of docx2tex. An XProc wrapper for ImageMagick or libwmf would be a poor solution, since these tools cannot handle clipping, borders, etc. properly. With that in mind, I would add HTML reports accompanied by Schematron rules in a later release.

zopyx commented 8 years ago

I think C-REX uses Inkscape for EMF-to-SVG conversion, with PNG as a fallback, and it does a reasonably good job.
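C-REX's actual settings are not known here; the following is only a minimal sketch of the SVG-first, PNG-fallback approach, assuming Inkscape 1.x command-line syntax. The function names and the 300 dpi value are illustrative assumptions:

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def inkscape_command(src: Path, fmt: str, inkscape: str = "inkscape") -> List[str]:
    """Build an Inkscape 1.x command that converts src to fmt ("svg", "png", "pdf")."""
    out = src.with_suffix("." + fmt)
    # Inkscape 1.x picks the output type from the --export-filename extension
    cmd = [inkscape, str(src), "--export-filename=" + str(out)]
    if fmt == "png":
        cmd.append("--export-dpi=300")  # assumed resolution for the raster fallback
    return cmd


def convert_emf(src: Path, inkscape: str = "inkscape") -> Optional[Path]:
    """Try SVG first, fall back to PNG; return the new file, or None if both fail."""
    for fmt in ("svg", "png"):
        out = src.with_suffix("." + fmt)
        try:
            result = subprocess.run(
                inkscape_command(src, fmt, inkscape), capture_output=True
            )
        except FileNotFoundError:
            return None  # no Inkscape executable under that name
        if result.returncode == 0 and out.exists():
            return out
    return None
```

Note that pdfLaTeX cannot include SVG directly either, so for LaTeX output passing "pdf" as the format is the more direct vector route.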

pbpf commented 4 years ago

Convert the images using Visio:

connor: kevin2059@163.com

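import os
import sys

import win32com.client  # pywin32; needs Windows with Visio installed

# Folder containing the .emf files, passed as the first command-line argument
folder = os.path.abspath(sys.argv[1])

visio = win32com.client.Dispatch("Visio.InvisibleApp")
try:
    for oldfilename in os.listdir(folder):
        if not oldfilename.lower().endswith(".emf"):
            continue
        f = os.path.join(folder, oldfilename)
        doc = visio.Documents.Open(f)
        # Set the border size according to the content
        visio.ActivePage.ResizeToFitContents()
        # Export to PDF (1 = PDF) as <name>.emf.pdf; removes the black border
        doc.ExportAsFixedFormat(1, '{}.pdf'.format(f), 0, 0, 0, 0, False, False, False, False)
        # Mark the document as saved so closing it does not prompt
        doc.Saved = True
        doc.Close()
finally:
    # Always shut down the invisible Visio instance, even after an error
    visio.Quit()
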
gimsieke commented 4 years ago

@pbpf In the environment that transpect typically runs in, there’s no Visio (or any other Microsoft software) installed.

gamboz commented 4 years ago

We have been using unoconv (on Debian systems) for some time, with good results on .emf and .wmf files. For example:

unoconv -f pdf -o x.pdf x.emf

However, IMHO, this is not something that docx2tex should be concerned with.
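A small batch wrapper around the unoconv command quoted above, run outside docx2tex as a pre-processing step. It assumes unoconv is on the PATH; the helper name and script name are illustrative:

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List

VECTOR_EXTS = {".emf", ".wmf"}


def unoconv_commands(files: Iterable[Path]) -> List[List[str]]:
    """Build one 'unoconv -f pdf -o <out>.pdf <in>' command per EMF/WMF file."""
    cmds = []
    for src in sorted(files):
        if src.suffix.lower() in VECTOR_EXTS:
            out = src.with_suffix(".pdf")  # image1.emf -> image1.pdf
            cmds.append(["unoconv", "-f", "pdf", "-o", str(out), str(src)])
    return cmds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python convert_vector.py <media-folder>
    for cmd in unoconv_commands(Path(sys.argv[1]).iterdir()):
        subprocess.run(cmd, check=True)  # stop on the first failed conversion
```

If a folder contains both image1.emf and image1.wmf, both map to image1.pdf, so the second conversion would overwrite the first.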

mkraetke commented 4 years ago

Thank you for this suggestion. I had rather frustrating results with ImageMagick. However, I'm looking for a Java-based EMF converter that I can wrap as an XProc extension step, so that docx2tex does not rely on pre-installed software (besides Java, of course).

pbpf commented 4 years ago

Inkscape 1.0.1 works. Can we use it?

inkscape tmp.emf -o tmp.pdf

mkraetke commented 4 years ago

Thank you for the suggestion. Unfortunately, we don't know the install path, and it may vary between operating systems. I would rather stick to a Java library that can convert EMF properly. However, I'll have a look at how Inkscape handles EMF files.
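On the install-path concern, one possible approach is to avoid hard-coding a location: honor an explicit path if the user configures one, otherwise look the executable up on the PATH, and skip conversion when neither is found. This is a sketch only; the INKSCAPE_PATH environment variable name is a hypothetical convention:

```python
import os
import shutil
from typing import Optional


def find_inkscape(env_var: str = "INKSCAPE_PATH") -> Optional[str]:
    """Locate an Inkscape executable without hard-coding an install path.

    1. An explicit path from an environment variable (hypothetical name).
    2. Otherwise, whatever "inkscape" resolves to on the PATH.
    Returns None if neither is usable, so the caller can skip conversion.
    """
    explicit = os.environ.get(env_var)
    if explicit and os.path.isfile(explicit) and os.access(explicit, os.X_OK):
        return explicit
    return shutil.which("inkscape")
```

Returning None instead of failing lets a pipeline fall back to reporting or placeholders when Inkscape is not installed.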