matt-erhart closed this issue 8 years ago.
Try setting the `convert_image` argument when calling `convert_to_html`.
For instance, the CLI passes the following as `convert_image`:
```python
import os
import shutil

import mammoth


class ImageWriter(object):
    def __init__(self, output_dir):
        self._output_dir = output_dir
        self._image_number = 1

    def __call__(self, element):
        # Use the MIME subtype as the file extension, e.g. "image/png" -> "png"
        extension = element.content_type.partition("/")[2]
        image_filename = "{0}.{1}".format(self._image_number, extension)
        with open(os.path.join(self._output_dir, image_filename), "wb") as image_dest:
            with element.open() as image_source:
                shutil.copyfileobj(image_source, image_dest)
        self._image_number += 1
        # The returned dict becomes the attributes of the generated <img> tag
        return {"src": image_filename}


# In the CLI, args.output_dir comes from the --output-dir option
convert_image = mammoth.images.inline(ImageWriter(args.output_dir))
```
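To see what `ImageWriter` does without a .docx at hand, here is a minimal sketch that calls it directly. `FakeImage` is a hypothetical stand-in invented for illustration: it only mimics the two things `ImageWriter` touches on a mammoth image element, `content_type` and `open()`. The class is repeated so the snippet runs on its own without mammoth installed.

```python
import io
import os
import shutil
import tempfile


class ImageWriter(object):
    """Same logic as the CLI's handler: writes each image to output_dir."""

    def __init__(self, output_dir):
        self._output_dir = output_dir
        self._image_number = 1

    def __call__(self, element):
        # "image/png" -> "png"
        extension = element.content_type.partition("/")[2]
        image_filename = "{0}.{1}".format(self._image_number, extension)
        with open(os.path.join(self._output_dir, image_filename), "wb") as image_dest:
            with element.open() as image_source:
                shutil.copyfileobj(image_source, image_dest)
        self._image_number += 1
        return {"src": image_filename}


class FakeImage(object):
    """Hypothetical stand-in for a mammoth image element (illustration only)."""

    def __init__(self, content_type, data):
        self.content_type = content_type
        self._data = data

    def open(self):
        # BytesIO supports the "with" protocol that ImageWriter relies on
        return io.BytesIO(self._data)


output_dir = tempfile.mkdtemp()
writer = ImageWriter(output_dir)
first = writer(FakeImage("image/png", b"\x89PNG fake bytes"))
second = writer(FakeImage("image/jpeg", b"fake jpeg bytes"))
print(first, second)  # {'src': '1.png'} {'src': '2.jpeg'}
print(sorted(os.listdir(output_dir)))  # ['1.png', '2.jpeg']
```

Each call writes one numbered file and returns only a relative `src`, so the HTML has to be saved in the same directory as the images for the links to resolve.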
Hmmm, where is ImageWriter and how should I incorporate it? Can I import it first? Is there a little code snippet that would demonstrate how to do this?
```python
import mammoth  # and maybe from mammoth import ...

convert_image = mammoth.images.inline(ImageWriter(outdir))
result = mammoth.convert_to_html(docx_file, convert_image=convert_image)
```
I've got it now. I just copy-pasted the class. For anyone else who might read this, I had to do the following to save the HTML without errors:
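```python
# html is result.value from convert_to_html
html2write = u''.join(html).encode('utf-8').strip()
# encode() returns bytes, so open the file in binary mode ("wb")
with open("output.html", "wb") as text_file:
    text_file.write(html2write)
```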
I also needed to add this to the HTML file:
```html
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
```
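mammoth's `result.value` is an HTML fragment without a `<head>`, so the charset declaration has to be added around it. A minimal sketch of one way to do that; `wrap_html_fragment` is a hypothetical helper, not part of mammoth:

```python
def wrap_html_fragment(fragment):
    """Wrap an HTML fragment, such as mammoth's result.value, in a minimal page.

    Hypothetical helper for illustration. The <meta> tag only tells the
    browser the page is UTF-8; the file must still be written as UTF-8.
    """
    return (
        u"<!DOCTYPE html>\n"
        u"<html>\n"
        u"<head>\n"
        u'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        u"</head>\n"
        u"<body>\n"
        u"{0}\n"
        u"</body>\n"
        u"</html>\n"
    ).format(fragment)


page = wrap_html_fragment(u"<p>Caf\u00e9</p>")
print(page)
```

The fragment is passed as a `format` argument rather than spliced into the template, so any braces inside the converted HTML are left alone.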
Glad you got it working. I think the writing can be simplified by opening the file with an encoding set, allowing it to write unicode strings directly:
```python
import codecs

with codecs.open("output.html", encoding="utf-8", mode="w") as text_file:
    text_file.write(html)
```
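On Python 3 the built-in `open` takes an `encoding` argument directly, and `io.open` is that same function (on Python 2.6+ it is the Python 3-style `open`), so the `codecs` import is not strictly needed. A minimal round-trip sketch; the temporary path and sample text are placeholders:

```python
import io
import os
import tempfile

# Placeholder fragment containing non-ASCII characters
html = u"<p>Na\u00efve caf\u00e9 \u00fcber</p>"

path = os.path.join(tempfile.mkdtemp(), "output.html")

# Text mode with an explicit encoding: unicode strings are encoded as UTF-8
with io.open(path, mode="w", encoding="utf-8") as text_file:
    text_file.write(html)

# Read the raw bytes back to confirm the file really is UTF-8
with io.open(path, mode="rb") as raw_file:
    data = raw_file.read()

print(data.decode("utf-8") == html)  # True
```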
The CLI can do it, but I don't see the option when calling the library directly. What do you recommend if there isn't an option?