xwmx / pandoc-ruby

Ruby wrapper for Pandoc
MIT License
340 stars 43 forks source link

Allow first argument to be an array of file paths #13

Closed changecase closed 8 years ago

changecase commented 9 years ago

Rather than being limited to submitting a string or a single file to PandocRuby, this accepts an array of file paths as well. All of the files are read and concatenated into a single document.

Fixes alphabetum/pandoc-ruby#4.

xwmx commented 9 years ago

Thank for doing this!

I started looking into all this the other day prior to this pull request, and came across some things that got me into rethinking all of the option handling. I'll document some of that here.

Executables

Pandoc previously came with a few scripts (markdown2pdf, html2markdown, hsmarkdown) that handled some of the conversions. These have since been removed over the years, and now the only executable is the pandoc one.

As a result, it was possible to remove this logic, which simplifies things. That commit was here:

https://github.com/alphabetum/pandoc-ruby/commit/ebabdf1ddd779989249fff2bd62569be7079d408

This also had the benefit of making it possible to specify an explicit path for the pandoc executable, including making it possible to use the env utility like /usr/bin/env pandoc.

string input (stdin) vs file input (arguments)

Pandoc takes two main types of inputs: a string of content via standard input and file paths via arguments.

The way the PandocRuby gem has been working so far is that the first argument is always expected to be a string that contains the text to be converted or, if allow_file_paths has been set to true, a string that contains a file path. PandocRuby would then check for a valid file path by determining if a file existed at that path and, if so, would read the file.

The first issue with reusing the first argument that way is that string input can't be easily used in the same program with file inputs since we have to guard against external content (eg, user-supplied content) containing a file path.

The second issue is that the PandocRuby library is reading files at a stage that it doesn't need to. Really, it seems like it would make more sense to just pass those paths to the pandoc executable and let it deal with those file paths itself.

What I'm leaning toward now is separating them out to something like this:

PandocRuby.new("# A String", :option1 => :value, :option2)
PandocRuby.new(file: "/path/to/file.md", option1: 'value', :option2)
PandocRuby.new(file: ["/to/file1.html", "/to/file2.html"], option1: 'value', :option2)

Basically, if the first argument to #new or #convert is a string, pass it to pandoc as stdin, and if it's a hash (not keyword argument, since that would leave 1.9 unsupported) with :file as a key then accept either a string containing a path or an array of strings containing paths.

This approach makes it possible to remove the allow_file_paths business and have everything be more explicit. I only have it partially implemented locally, though, so I don't know if there are any problems with my logic yet or if it ends up being more complicated.

This also seems like a good time to make this backward-incompatible change since the change in ebabdf1 is also backward-incompatible and, therefore, will also require a bump of the major version.

Anyway, those are my incomplete thoughts so far. I'm not sure if there's anything I'm missing yet.

changecase commented 9 years ago

Awesome! That sounds like a much better solution, and you're right: now is a great time for it.

In the meantime, I'm working on building out tests for Pandoc's new readers and writers. Hopefully, these will be useful in working out this significant update!