surjit closed this issue 9 years ago.
The second invocation doesn't work because Mammoth only supports .docx files.
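A .docx file is a zip archive, while a legacy .doc file uses the OLE2 compound-file format, which starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. That is why handing mammoth a .doc ends in `zipfile.BadZipfile: File is not a zip file`. The following is a minimal Python 3 sketch for checking a file's contents (not its extension) before converting it. `detect_container` is a hypothetical helper, not part of mammoth, and a zip archive is not necessarily a Word document.

```python
import zipfile

# First 8 bytes of an OLE2 compound file, the container used by legacy .doc files.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def detect_container(path):
    """Return "zip" (possibly .docx), "ole2" (likely legacy .doc) or "unknown"."""
    if zipfile.is_zipfile(path):
        # .docx files are zip archives; mammoth opens them with zipfile.ZipFile.
        return "zip"
    with open(path, "rb") as f:
        header = f.read(len(OLE2_SIGNATURE))
    if header == OLE2_SIGNATURE:
        return "ole2"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python detect_container.py sample-04.docx sample-04.doc
    for name in sys.argv[1:]:
        print(name, detect_container(name))
```

If `detect_container("sample-04.doc")` returns "ole2", the file is in the old binary format and mammoth cannot read it as-is.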
For the first invocation: I can't really say why it's not working without knowing anything about the input file, or what, if anything, is written to the output file.
Does it work if you use a Word document with just a paragraph of text?
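To answer what, if anything, was written to the output file, a small Python 3 check can distinguish a missing file, an empty file, and one with content. `describe_output` is a hypothetical helper, not part of mammoth.

```python
import os


def describe_output(path, preview_chars=200):
    """Report whether an output file is missing, empty, or has content (with a preview)."""
    if not os.path.exists(path):
        return "missing"
    size = os.path.getsize(path)
    if size == 0:
        return "empty"
    # Show the start of the file so you can see what HTML was actually produced.
    with open(path, encoding="utf-8", errors="replace") as f:
        preview = f.read(preview_chars)
    return "{} bytes, starts with: {!r}".format(size, preview)
```

For example, `describe_output("my.html")` after running `mammoth sample-04.docx my.html`.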
Let me know your email address and I will send you the docx file.
You can use the address on my GitHub profile, hello@zwobble.org. If you could also provide the expected HTML and the actual HTML that is being generated, that would help to make sure I can reproduce what you're seeing.
Thanks for sending over the file. The file seems to generate HTML successfully, could you describe what HTML you were expecting?
Why isn't it working for me?
I have Python 2 installed. Please guide me through the steps to install it.
If possible, please send me the generated HTML.
Are you saying that the output file is empty? Or missing altogether?
No, it shows the content, but some things are missing, like the page numbering and other text in the footer.
On Sat, Mar 28, 2015 at 9:41 PM, Michael Williamson wrote:
"Are you saying that the output file is empty? Or missing altogether?"
(https://github.com/mwilliamson/python-mammoth/issues/2#issuecomment-87255936)
Mammoth is designed to convert semantically marked up documents into sensible HTML, rather than performing a high-fidelity conversion to represent the original document as closely as possible. For instance, in general, preserving page numbering doesn't make sense in an HTML document, nor is it clear how a footer should be handled.
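In Mammoth, semantic Word styles are turned into HTML through a style map. The "Unrecognised paragraph style" and "Unrecognised run style" warnings in the conversion log posted in this thread (Title, Subtitle, Ref term, and so on) are styles with no mapping. The following is a minimal Python 3 sketch of building a style map string and passing it to `mammoth.convert_to_html` via its `style_map` argument. The helper names and the chosen HTML targets are hypothetical examples, and style names containing a single quote are not handled.

```python
def build_style_map(paragraph_styles, run_styles=None):
    """Build a mammoth style map string from {Word style name: HTML path} dicts."""
    lines = []
    for name, html_path in paragraph_styles.items():
        lines.append("p[style-name='{}'] => {}".format(name, html_path))
    for name, html_path in (run_styles or {}).items():
        lines.append("r[style-name='{}'] => {}".format(name, html_path))
    return "\n".join(lines)


def convert_with_style_map(docx_path, style_map):
    """Convert a .docx file to HTML, returning (html, messages)."""
    import mammoth  # imported here so build_style_map works without mammoth installed

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=style_map)
    return result.value, result.messages


# Hypothetical mapping for a few of the styles reported as unrecognised in the log.
EXAMPLE_STYLE_MAP = build_style_map(
    {"Title": "h1:fresh", "Subtitle": "h2:fresh"},
    {"Ref term": "strong"},
)
```

A style map only changes how recognised content is marked up. Footers and page numbers would still not appear, consistent with the design described above.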
If you have specific suggestions on how things like footers should be handled, then I'd be happy to hear them, although I might not have much time to work on it.
If you are looking to produce HTML that resembles the original as closely as possible, I'd suggest looking for an alternative project since this is a use-case that Mammoth intentionally does not handle. If you just want to display the Word document in a web page, have you considered using Microsoft's online Office document viewer?
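If the goal is only to show the document in a browser, a viewer link can be built from a publicly reachable URL of the .docx file. The endpoint below is an assumption based on Microsoft's Office Online viewer, not something stated in this thread, so check it against Microsoft's current documentation. `office_viewer_url` is a hypothetical helper (Python 3).

```python
from urllib.parse import quote

# Assumed Office Online viewer endpoint; verify against Microsoft's documentation.
OFFICE_VIEWER = "https://view.officeapps.live.com/op/view.aspx?src="


def office_viewer_url(public_docx_url):
    """Return a viewer link for a .docx that is reachable over the public internet."""
    # Percent-encode the whole document URL so it survives as a single query value.
    return OFFICE_VIEWER + quote(public_docx_url, safe="")
```

The viewer fetches the file itself, so a document on a private or local server would not work with this approach.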
Closing since I'm not sure there's anything else I can do to help, but feel free to open issues if you have suggestions on how specific aspects of the conversion should be handled.
mammoth sample-04.docx my.html
Unsupported break type: page
An unrecognised element was ignored: w:fldChar
An unrecognised element was ignored: w:instrText
An unrecognised element was ignored: w:fldChar
...
An unrecognised element was ignored: w:tblPrEx
An unrecognised element was ignored: w:trPr
...
Unrecognised paragraph style: Legal notice (Style ID: Legalnotice)
Unrecognised paragraph style: Title (Style ID: Title)
Unrecognised paragraph style: Subtitle (Style ID: Subtitle)
Unrecognised paragraph style: Title page info (Style ID: Titlepageinfo)
Unrecognised paragraph style: Title page info description (Style ID: Titlepageinfodescription)
Unrecognised run style: Hyperlink (Style ID: Hyperlink)
Unrecognised paragraph style: Contributor (Style ID: Contributor)
...
Unrecognised paragraph style: toc 1 (Style ID: TOC1)
Unrecognised paragraph style: toc 2 (Style ID: TOC2)
...
Unrecognised run style: Ref term (Style ID: Refterm)
Unrecognised paragraph style: Definition Term (Style ID: DefinitionTerm0)
Unrecognised paragraph style: Definition (Style ID: Definition)
Unrecognised paragraph style: List Bullet (Style ID: ListBullet)
Unrecognised paragraph style: List Continue (Style ID: ListContinue)
Unrecognised paragraph style: List Bullet 2 (Style ID: ListBullet2)
Unrecognised paragraph style: List Continue 2 (Style ID: ListContinue2)
Unrecognised paragraph style: Code (Style ID: Code)
...
Unrecognised paragraph style: Code small (Style ID: Codesmall)
...
Unrecognised paragraph style: Example (Style ID: Example)
Unrecognised paragraph style: Example small (Style ID: Examplesmall)
Unrecognised run style: Element (Style ID: Element)
Unrecognised run style: Attribute (Style ID: Attribute)
Unrecognised run style: Datatype (Style ID: Datatype)
Unrecognised run style: Keyword (Style ID: Keyword)
Unrecognised run style: Variable (Style ID: Variable)
Unrecognised paragraph style: Ref (Style ID: Ref)
...
Unrecognised paragraph style: AppendixHeading1 (Style ID: AppendixHeading1)
...

root@surjit:/home/rahul# mammoth sample-04.doc my.html
Traceback (most recent call last):
  File "/usr/local/bin/mammoth", line 100, in <module>
    main()
  File "/usr/local/bin/mammoth", line 35, in main
    output_format=args.output_format,
  File "/usr/local/lib/python2.7/dist-packages/mammoth/__init__.py", line 17, in convert
    return docx.read(fileobj).map(transform_document).bind(lambda document:
  File "/usr/local/lib/python2.7/dist-packages/mammoth/docx/__init__.py", line 24, in read
    zip_file = zipfile.ZipFile(fileobj)
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
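Most of the warnings printed for sample-04.docx are repeats of a few dozen distinct messages. A small Python 3 sketch that collapses identical lines into counts makes a log like this easier to read. `summarize_warnings` and `summarize_messages` are hypothetical helpers; the second assumes mammoth's documented message objects, which expose `type` and `message` attributes.

```python
from collections import Counter


def summarize_warnings(lines):
    """Count identical warning lines and return (warning, count) pairs, most frequent first."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common()


def summarize_messages(messages):
    """Same summary for mammoth's result.messages (entries with .type and .message)."""
    return summarize_warnings("{}: {}".format(m.type, m.message) for m in messages)
```

For example, feeding it the lines of a saved log prints each distinct warning once with its count, instead of hundreds of repeated lines.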