bitscompagnie closed this issue 7 years ago
I'm not aware of a good way (in Python) to convert WMF/EMF images, so Mammoth doesn't have an officially supported way. However, there are some undocumented image converters that use LibreOffice and ImageMagick to do so, which can be used like so:
import mammoth

def compose(f, g):
    # Returns a function that applies g first, then f
    def composed(*args, **kwargs):
        return f(g(*args, **kwargs))
    return composed

fileobj = open("document.docx", "rb")
result = mammoth.convert_to_html(
    fileobj,
    convert_image=compose(
        mammoth.images.data_uri,
        mammoth.images.libreoffice_wmf_conversion(post_process=mammoth.images.imagemagick_trim),
    ),
)
If those work for you, I'd suggest copying the source for them, since they're not officially supported and may be changed or removed without warning.
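For anyone unsure about the order in which compose applies its arguments: the function passed second runs first, so in the snippet above the WMF/EMF conversion happens before the image is turned into a data URI. A minimal sketch with plain stand-in functions (add_one and double are made up for illustration and are not part of Mammoth):

```python
def compose(f, g):
    # Same helper as above: g is applied first, then f
    def composed(*args, **kwargs):
        return f(g(*args, **kwargs))
    return composed


def add_one(x):
    return x + 1


def double(x):
    return x * 2


# double runs first, then add_one: (5 * 2) + 1
double_then_add = compose(add_one, double)
result = double_then_add(5)
print(result)
```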
Do I have to install particular versions of LibreOffice and ImageMagick to get the above code to work? I can test it on Mac or Windows. Your help/suggestion is really appreciated.
Thanks.
I've no idea, it's just something I cobbled together that worked on my own Linux box.
I got it working by doing the following:
1. Installed LibreOffice 4.3.72.
2. Installed unoconv (universal office conversion utility), which works with an installed LibreOffice <= 4.3. On the Mac I did: brew install unoconv. Unoconv complained about not finding a suitable LibreOffice installation when I used the most current version of LibreOffice.
3. Then from the terminal I ran: unoconv --listener
Additionally to make it work with your code, I created a script as outlined here: Create a shell script at /usr/local/bin/soffice with the following content:
#!/bin/bash
# Need to do this because symlink won't work
# It complains about some .plist files
/Applications/LibreOffice.app/Contents/MacOS/soffice "$@"
Then make it executable:
sudo chmod +x /usr/local/bin/soffice
I did the above because I noticed that you were running libreoffice in headless mode in the /Library/Python/2.7/site-packages/mammoth/images.py file, under the libreoffice_wmf_conversion definition:
….
output_path = os.path.join(temporary_directory, "image.png")
subprocess.check_call([
    "libreoffice",
    "--headless",
    "--convert-to",
    "png",
    input_path,
    "--outdir",
    temporary_directory,
])
Once I completed the above steps it started working; without them it complains about a missing file. I'm posting this for any other user who might have the same question.
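A "missing file" error of this kind can come from the external program not being on the PATH. A small sketch for checking that up front, assuming Python 3.3+ for shutil.which (the thread itself uses Python 2.7). The name "libreoffice" is taken from the snippet quoted above; "convert" is only an assumption about what the ImageMagick step calls, and find_missing_tools is a hypothetical helper, not part of Mammoth:

```python
import shutil


def find_missing_tools(names):
    """Return the entries of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# "libreoffice" comes from the quoted images.py snippet;
# "convert" is an assumed name for the ImageMagick executable.
missing = find_missing_tools(["libreoffice", "convert"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All external tools were found")
```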
Thanks again.
How can I save the converted images to files instead of embedding them in the HTML? I tried to include the --output-dir option, but it did not work the way it does when running mammoth from the command line directly.
You need to define your own image converter that will save images to disk rather than using mammoth.images.data_uri. For instance, you can see how the CLI sets the convert_image argument:
https://github.com/mwilliamson/python-mammoth/blob/500a2aca545c47b9677bd85e55b9b24dc4ec9c7c/mammoth/cli.py#L25
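As a rough sketch of what such a converter could look like: the class below writes each image to a directory and returns the attributes for the generated img element. DiskImageWriter is a hypothetical name, and the sketch only assumes that the image object passed in has a content_type attribute and an open() method that returns a file-like object usable in a with block.

```python
import os
import shutil


class DiskImageWriter(object):
    """Hypothetical converter callback: saves each image into output_dir."""

    def __init__(self, output_dir):
        self._output_dir = output_dir
        self._image_number = 1

    def __call__(self, image):
        # "image/png" gives "png"; "image/x-emf" would give "x-emf"
        extension = image.content_type.partition("/")[2]
        filename = "{0}.{1}".format(self._image_number, extension)
        with open(os.path.join(self._output_dir, filename), "wb") as dest:
            with image.open() as source:
                shutil.copyfileobj(source, dest)
        self._image_number += 1
        # Attributes for the <img> element that refers to the saved file
        return {"src": filename}
```

It would then be wrapped before being passed as convert_image, for example mammoth.images.img_element(DiskImageWriter("some-directory")); older releases expose the same wrapper as mammoth.images.inline. Treat the exact wrapper name as something to check against the installed version.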
Here is my final working code with inspiration from issue #10:
import codecs
import mammoth

# sourcedir, style_map, outdir, compose and ImageWriter are defined elsewhere
for file in sourcedir:
    # Filter source documents to exclude temporary Word files
    if file.endswith('.docx') and not file.startswith('~$'):
        sourcedocx = open('sourcedocs/' + file, 'rb')
        result = mammoth.convert_to_html(
            # This works fine on Mac OS but not Windows, need to fix it
            sourcedocx, style_map=style_map, convert_image=compose(
                # To save images to a directory
                mammoth.images.inline(ImageWriter(outdir)),
                # Convert emf/wmf with LibreOffice, Unoconv and ImageMagick
                mammoth.images.libreoffice_wmf_conversion(post_process=mammoth.images.imagemagick_trim))
        )
        sourcedocx.close()
        html = result.value
        # Write the result for each file to a new file in the output directory
        with codecs.open('outpudir/' + file + '.html', 'w', 'utf-8') as f:
            # Write each file to the destination folder
            f.write(html)
print('Done writing html files with python-mammoth')
Hello,
How can we convert embedded .x-emf images to png or jpg? Is there any option/setting to output the embedded images to png or jpg instead of .x-emf?
Currently, when I convert docx files, I get some images in the output-dir in .x-emf format, and I need to convert them to png or jpg during the docx conversion process.
Thanks for your help.
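One possible approach, untested here, is to post-process the saved files with the same LibreOffice invocation quoted earlier in this thread. The sketch below assumes that a libreoffice executable is on the PATH and that it accepts files with these extensions; build_convert_command and convert_saved_images are hypothetical helpers, and the ".x-wmf" extension is an assumption by analogy with ".x-emf".

```python
import os
import subprocess


def build_convert_command(input_path, output_dir, target_format="png"):
    """Build the headless LibreOffice command line quoted earlier in the thread."""
    return [
        "libreoffice",
        "--headless",
        "--convert-to",
        target_format,
        input_path,
        "--outdir",
        output_dir,
    ]


def convert_saved_images(image_dir, extensions=(".x-emf", ".x-wmf")):
    """Run the conversion for every file in image_dir with a matching extension."""
    for name in sorted(os.listdir(image_dir)):
        if name.endswith(extensions):
            input_path = os.path.join(image_dir, name)
            subprocess.check_call(build_convert_command(input_path, image_dir))
```

Note that the generated HTML would still point at the original .x-emf file names, so the src attributes would need updating as well. Converting inside the convert_image callback, as in the code above, avoids that extra step.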