burlesona closed this issue 4 years ago
This is a problem in Pandoc itself:
https://github.com/jgm/pandoc/issues/3383
It is fixed in Pandoc 2.0.x. Just upgrade your Pandoc binary and it'll work.
I'm using gem 'pandoc-ruby', '2.0.2' and get the same error: RuntimeError: pandoc: Cannot read archive from stdin
@agiratech-reddysai - pandoc-ruby is just a wrapper gem around a system call, i.e. it runs the pandoc(1) binary on your system, so what matters is which version of pandoc is installed. Try running pandoc --version from the command line, and upgrade pandoc, not pandoc-ruby, if necessary.
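For anyone who wants to run that check from Ruby rather than the shell, here is a minimal sketch. The helper names (parse_pandoc_version, pandoc_reads_docx_from_stdin?, installed_pandoc_version) are hypothetical and not part of pandoc-ruby, and the 2.0 threshold is taken only from the "fixed in Pandoc 2.0.x" comment above.

```ruby
require 'rubygems'
require 'open3'

# Hypothetical helper: extract the version from the first line of
# `pandoc --version` output, which looks like "pandoc 2.2.1".
def parse_pandoc_version(output)
  first_line = output.to_s.lines.first.to_s
  match = first_line.match(/\Apandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)/)
  match && Gem::Version.new(match[1])
end

# Hypothetical helper: true when the version is at least 2.0, the release
# line this thread says fixed "Cannot read archive from stdin".
def pandoc_reads_docx_from_stdin?(version)
  !version.nil? && version >= Gem::Version.new('2.0')
end

# Ask the pandoc binary on the PATH for its version.
# Returns nil if pandoc is missing or exits with an error.
def installed_pandoc_version
  output, status = Open3.capture2('pandoc', '--version')
  status.success? ? parse_pandoc_version(output) : nil
rescue Errno::ENOENT
  nil # pandoc is not installed or not on the PATH
end
```

Calling pandoc_reads_docx_from_stdin?(installed_pandoc_version) then tells you whether the system binary, rather than the gem, is the thing that needs upgrading.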
I'm running pandoc 2.2.1 and pandoc-ruby 2.0.2 and I'm having trouble getting a docx to convert to html as well (although obviously for a different reason).
The command line works. I'm using example_29.docx from the pandoc demos. Running docker-compose exec web pandoc -f docx -t html tmp/example29.docx yields:
<h1 id="synopsis">Synopsis</h1>
<p><code>pandoc</code> [<em>options</em>] [<em>input-file</em>]…</p>
<h1 id="description">Description</h1>
<p>Pandoc is a <a href="https://www.haskell.org">Haskell</a> library for converting from one markup format to another, and a command-line tool that uses this library.</p>
...
From the console, however:
irb(main):001:0> PandocRuby
=> PandocRuby
irb(main):002:0> PandocRuby.convert("tmp/example29.docx", from: :docx, to: :html)
Traceback (most recent call last):
1: from (irb):2
RuntimeError (couldn't parse docx file)
Update! OK, so I got it working, but there's some interesting behaviour.
PandocRuby.convert("tmp/example29.docx", from: :docx, to: :html) doesn't work, because the string is treated as the document content rather than a path, and that string isn't a valid docx file. File.open returns a Ruby File object, which isn't valid input for the convert method either. But IO::read works:
PandocRuby.convert(IO::read("tmp/import_processing_example29.docx"), from: :docx, to: :html)
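To illustrate why the path string fails while the file contents work, here is a small sketch. It relies on the fact that a .docx file is a ZIP archive, and a non-empty ZIP archive starts with the bytes "PK\x03\x04". The helper names looks_like_docx_bytes? and read_docx are hypothetical, not part of pandoc-ruby. The sketch uses File.binread instead of IO::read on the assumption that binary content is safer to handle as ASCII-8BIT than as a string tagged UTF-8.

```ruby
# Signature that opens the first entry of a non-empty ZIP archive.
ZIP_MAGIC = "PK\x03\x04".b.freeze

# Hypothetical helper: true when the input starts with the ZIP signature,
# i.e. it could be the raw contents of a .docx file rather than a path.
def looks_like_docx_bytes?(input)
  input.to_s.b.start_with?(ZIP_MAGIC)
end

# Hypothetical helper: read the file as raw bytes (ASCII-8BIT encoding),
# with no newline or encoding conversion applied.
def read_docx(path)
  File.binread(path)
end
```

With these, looks_like_docx_bytes?("tmp/example29.docx") is false because that string is only a path, while the result of read_docx on a real docx file passes the check and is the kind of value the working call above hands to pandoc.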
However, this doesn't:
@converter = PandocRuby.new
@converter.convert(IO::read("tmp/import_processing_example29.docx"), from: :docx, to: :html)
ArgumentError: invalid byte sequence in UTF-8
I don't need that to work, but given the info in the README, it seems that it should. Any idea what's happening there?
Sorry for my absence, and I'm glad you were able to find a workaround. As of the latest version, 2.1.0, docx files can be converted by specifying the file path as an array:
PandocRuby.new(['/path/to/example.docx'], from: 'docx', to: 'html').convert
Please let me know if you run into additional issues related to this!
If you pass a docx file into PandocRuby, e.g.:
It errors with
RuntimeError: pandoc: Cannot read archive from stdin