metanorma / html2doc

Ruby gem that converts an HTML page/document into a Microsoft Word `.doc` file

LibreOffice support #43

Closed nmenag closed 4 years ago

nmenag commented 4 years ago

I am generating a document with the .doc extension, but when I open it with LibreOffice, it is shown as XML code.

The opened document:

MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_NextPart_74bc86ac.4f67.4e1b"

------=_NextPart_74bc86ac.4f67.4e1b
Content-Location: file:///C:/Doc/file1.doc.htm
Content-Type: text/html; charset="utf-8"

<?xml version="1.0"?>
<!DOCTYPE html>
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40" lang="en">
 <head><!--[if gte mso 9]>
<xml>
<w:WordDocument>
<w:View>Print</w:View>
<w:Zoom>100</w:Zoom>
<w:DoNotOptimizeForBrowser/>
</w:WordDocument>
</xml>
<![endif]-->
<meta http-equiv=Content-Type content="text/html; charset=utf-8"/>

Is there any solution?

ronaldtse commented 4 years ago

@nmenag Unfortunately, LibreOffice is unable to read Microsoft Word's MHT format, and that is not something we can handle at the moment. Thanks for your interest!
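
For anyone who wants to check which kind of `.doc` they have: the dump in the original report starts with `MIME-Version:` and a `multipart/related` content type, whereas a native binary Word `.doc` starts with the OLE2 compound-file signature. The following is only a rough sketch of that check. `doc_container` and `doc_container_of` are hypothetical helper names and are not part of html2doc.

```ruby
# OLE2 compound-file signature found at the start of binary Word .doc files.
OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b

# Hypothetical helper (not part of html2doc): classify the leading bytes
# of a file as a binary .doc, a MIME/MHT wrapper, or something else.
def doc_container(head)
  head = head.to_s.b
  return :ole2_binary if head.start_with?(OLE2_MAGIC)
  # The MHT wrapper shown in this issue begins with MIME headers.
  return :mime_html if head.match?(/\AMIME-Version:/i) && head.match?(/multipart\/related/i)

  :unknown
end

# Hypothetical convenience wrapper: inspect only the first bytes of a file.
def doc_container_of(path, bytes = 512)
  doc_container(File.binread(path, bytes))
end
```

Calling `doc_container_of("file1.doc")` on a file like the one in the report should return `:mime_html`, which would explain why LibreOffice displays the MIME source instead of a formatted document.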

ronaldtse commented 4 years ago

Ping @opoudjis if you'd like to reply further.

opoudjis commented 4 years ago

Sorry, I'm getting the same results, whichever OpenOffice *.doc filter I pick.

https://products.groupdocs.app/conversion/mht-to-doc will convert the output of Html2Doc into something in native .doc format, but the document is mangled: images are not imported, the table of contents fields do not render correctly, and list numbers disappear. You'll get the text out, but too much is lost.

https://convertonlinefree.com/OtherFormatEN.aspx, on the other hand, gives very accurate results, except that it switches the document to landscape.

https://bugs.documentfoundation.org/show_bug.cgi?id=77213&redirected_from=fdo and https://bugs.documentfoundation.org/show_bug.cgi?id=83601 confirm that LibreOffice does not support MHT, and that even unoconv, the command-line converter that comes with LibreOffice, treats MHT as plain text.

We have no current plans to convert MHT to DOC or DOCX fully.
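
If all you need is the text, one possible workaround is to pull the HTML part out of the MIME wrapper yourself. This is a minimal sketch, not a supported feature: `html_part` is a hypothetical helper, and it assumes the file looks like the dump in the report above, that is, a `multipart/related` document whose `text/html` part has no Content-Transfer-Encoding. Images live in other MIME parts and Word-specific markup is not interpreted, so expect losses similar to those described above.

```ruby
# Hypothetical helper (not part of html2doc): return the body of the first
# text/html part of a multipart MIME document, or nil if none is found.
# Assumes the part is not transfer-encoded, as in the dump in this issue.
def html_part(mht)
  boundary = mht[/boundary="?([^"\r\n;]+)"?/i, 1]
  return nil unless boundary

  # Each part is introduced by a line of the form "--<boundary>".
  mht.split(/^--#{Regexp.escape(boundary)}(?:--)?[ \t]*\r?\n?/).each do |part|
    # Headers and body are separated by the first blank line.
    headers, body = part.split(/\r?\n\r?\n/, 2)
    next unless body && headers =~ /^Content-Type:\s*text\/html/i

    return body
  end
  nil
end
```

Something like `File.write("file1.html", html_part(File.binread("file1.doc")))` would then save the recovered HTML for inspection. The file names here are placeholders.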

opoudjis commented 4 years ago

Sorry, I'm getting the same results, whichever OpenOffice *.doc filter I pick.

https://products.groupdocs.app/conversion/mht-to-doc will convert the output of Html2Doc into something in native .doc format, but the document is mangled: images are not imported, the table of contents fields do not render correctly, and list numbers disappear. You'll get the text out, but too much is lost.

https://convertonlinefree.com/OtherFormatEN.aspx, on the other hand, gives very accurate results, except that it switches the document to landscape.

https://bugs.documentfoundation.org/show_bug.cgi?id=77213&redirected_from=fdo reports success using unoconv, the command line file converter that comes with LibreOffice.

We have no current plans to convert MHT to DOC or DOCX fully.