xwmx / pandoc-ruby

Ruby wrapper for Pandoc
MIT License

How do you open a docx file? #27

Closed: burlesona closed this issue 4 years ago

burlesona commented 7 years ago

If you pass a docx file into PandocRuby, e.g.:

PandocRuby.convert('./file.docx', :S, from: :docx, to: :html)

It errors with RuntimeError: pandoc: Cannot read archive from stdin

abartov commented 6 years ago

This is a problem in Pandoc itself:

https://github.com/jgm/pandoc/issues/3383

It is fixed in Pandoc 2.0.x. Just upgrade your Pandoc binary and it'll work.

agiratech-reddysai commented 6 years ago

With gem 'pandoc-ruby', '2.0.2' I get the same error: RuntimeError: pandoc: Cannot read archive from stdin

abartov commented 6 years ago

@agiratech-reddysai - pandoc-ruby is just a wrapper gem around a system call, i.e. it runs the pandoc(1) binary on your system, so what matters is what version of pandoc is installed. Try running pandoc --version from the command line, and upgrade pandoc, not pandoc-ruby, if necessary.
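To make that check repeatable from Ruby, here is a minimal sketch. The helper names (parse_pandoc_version, installed_pandoc_version, docx_stdin_supported?) are hypothetical and not part of pandoc-ruby. The 2.0 threshold comes from the comment above saying the fix landed in Pandoc 2.0.x, and the sketch assumes the first line of `pandoc --version` looks like "pandoc 2.2.1".

```ruby
require 'open3'
require 'rubygems' # provides Gem::Version for comparing version strings

# Hypothetical helper: pull the version out of `pandoc --version` output.
# Assumes the first line has the form "pandoc 2.2.1".
def parse_pandoc_version(output)
  first_line = output.to_s.lines.first.to_s
  match = first_line.match(/\Apandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)/)
  match && Gem::Version.new(match[1])
end

# Hypothetical helper: ask the pandoc binary on PATH for its version.
# Returns nil when pandoc is missing or exits with an error.
def installed_pandoc_version
  output, status = Open3.capture2e('pandoc', '--version')
  status.success? ? parse_pandoc_version(output) : nil
rescue Errno::ENOENT
  nil
end

# Threshold taken from this thread ("fixed in Pandoc 2.0.x").
def docx_stdin_supported?(version)
  !version.nil? && version >= Gem::Version.new('2.0')
end

if __FILE__ == $PROGRAM_NAME
  version = installed_pandoc_version
  if version.nil?
    puts 'pandoc not found on PATH'
  elsif docx_stdin_supported?(version)
    puts "pandoc #{version}: new enough according to this thread"
  else
    puts "pandoc #{version}: upgrade the pandoc binary, not the gem"
  end
end
```

Comparing with Gem::Version rather than plain strings avoids ordering mistakes between versions whose segments have different lengths.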

archonic commented 6 years ago

I'm running pandoc 2.2.1 and pandoc-ruby 2.0.2 and I'm having trouble getting a docx to convert to html as well (although obviously for a different reason).

The command line works. I'm using example_29.docx from the pandoc demos. Running docker-compose exec web pandoc -f docx -t html tmp/example29.docx yields

<h1 id="synopsis">Synopsis</h1>
<p><code>pandoc</code> [<em>options</em>] [<em>input-file</em>]…</p>
<h1 id="description">Description</h1>
<p>Pandoc is a <a href="https://www.haskell.org">Haskell</a> library for converting from one markup format to another, and a command-line tool that uses this library.</p>
...

From the console however:

irb(main):001:0> PandocRuby
=> PandocRuby
irb(main):002:0> PandocRuby.convert("tmp/example29.docx", from: :docx, to: :html)
Traceback (most recent call last):
        1: from (irb):2
RuntimeError (couldn't parse docx file)

Update! Ok, so I got it working, but there's some interesting behaviour.

PandocRuby.convert("tmp/example29.docx", from: :docx, to: :html) doesn't work because the string is treated as the document content rather than as a path, and that string isn't a valid docx file. File.open returns a Ruby File object, which isn't valid input for the convert method either. But IO::read works:

PandocRuby.convert(IO::read("tmp/import_processing_example29.docx"), from: :docx, to: :html)

However, this doesn't:

@converter = PandocRuby.new
@converter.convert(IO::read("tmp/import_processing_example29.docx"), from: :docx, to: :html)
ArgumentError: invalid byte sequence in UTF-8

I don't need that to work, but given the info in the README, it seems that it should. Any idea what's happening there?
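A side note on the encoding error: this thread does not establish its cause, but a docx file is a zip archive, so its bytes are generally not valid UTF-8 text. The sketch below only illustrates that Ruby-level difference between a text-mode read and File.binread; it does not call pandoc or pandoc-ruby. SAMPLE_BYTES and read_both_ways are made-up names, and the sample bytes are a stand-in, not a real docx.

```ruby
require 'tempfile'

# Stand-in for binary docx content: the zip magic number followed by two
# bytes that can never appear in valid UTF-8. Not a real docx file.
SAMPLE_BYTES = "PK\x03\x04\xFF\xFE".b

# Hypothetical helper: read the same file as UTF-8 text and as raw bytes.
def read_both_ways(path)
  {
    text: File.read(path, encoding: 'UTF-8'), # tagged UTF-8, bytes unchanged
    binary: File.binread(path)                # tagged as binary (ASCII-8BIT)
  }
end

Tempfile.create(['sample', '.docx']) do |file|
  file.binmode
  file.write(SAMPLE_BYTES)
  file.flush

  result = read_both_ways(file.path)
  puts "text read valid as UTF-8?  #{result[:text].valid_encoding?}"
  puts "binary read is binary?     #{result[:binary].encoding == Encoding::BINARY}"
end
```

Any code that later runs a regular expression over an invalid UTF-8 string can raise ArgumentError (invalid byte sequence in UTF-8), so reading binary formats with File.binread is the more cautious choice. Whether that is what happens inside the instance-level convert call above is an assumption, not something confirmed here.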

xwmx commented 4 years ago

Sorry for my absence, and I'm glad you were able to find a workaround. As of the latest version, 2.1.0, docx files can be converted by specifying the file path as an array:

PandocRuby.new(['/path/to/example.docx'], from: 'docx', to: 'html').convert

Please let me know if you run into additional issues related to this!