tcocca / active_pdftk

ruby wrapper for the pdftk command line utility for working with editable pdf files
MIT License
46 stars 37 forks source link

Input & Output concerns for the Form adapter #16

Open elmatou opened 13 years ago

elmatou commented 13 years ago

Hi Tom, I will write down all my concerns for the output issue we found yesterday.

First of all, we need to agree on process. I think a common task should look like that

@form = ActivePdftk.Form.new(a_template)
@form.name = 'Marco'
...
@form.save!  #=> save to current template, and return true
@form.save  #=> the template is not altered,  return a StringIO which can be saved (Tempfile or File make sens only if you need to store the result). 

Input discrimination

If the template is a large file present on the system (eg : PDFReference16.pdf ~ 9Mb) the user would prefer to give the path to the file, in order save memory. In this case it would be impossible to output to this same file (pdftk require input & output to be differents files). the strategy could be to inactivate Form#save! or to route output to @template+'_filled.pdf' (here @template is a String reprenseting the path), Form#save! should return the saved path Form#save should'not return a StringIO (if not explicitly required by the :output hash), but what sould it return ?

If the template is an IO stream (non saved file, or light one), Form#save! should save to the current @template altering the stream, and return true. Form#save should return a StringIO of the saved file, input stream remains unaltered.

Same input and output

the Call class should have a new error InputSimilarToOutput raised if input & output are equal String

Input & Output concept extension

Maybe we should write a 'guideline' for Input and Output to be respected in all ActivePdftk classes. I already saw that Form and Wrapper, won't work great if template need a password (because an input with a password is a Hash, and methods require an option hash with default to {}), any case we didn't spec' it enough.

Thought ?

elmatou commented 13 years ago

Before correcting the Form class, we should try to be more constistent in the Call class.

As discussed, let's try to ouput the same type of object we input, (if no specific output is given).

Differents cases are : Input #=> Output File #=> File.new(input.path + '.pdftk').write @stdout Tempfile #=> Tempfile.new('output').write @stdout Stringio #=> StringIO.new(@stdout) String (representing a path) #=> String.new(input + '.pdftk') Hash (representing a set of paths files) #=> ????

Ok, but what does it mean ? if the user didn't ask for any output, we force him to get files, tempfiles.... In second thought, I believe, that, without specified output we should keep the result in mermory (only), and present it to the user (he can save it by himself), as stdout is usedby shells, Basicaly, I say we do not should change our Call logic, but we should be more precise in the return statement.

Differents cases are : Output #=> return File #=> the File Tempfile #=> the Tempfile StringIO #=> the StringIO String (representing a path) #=> the String NilClass #=> StringIO.new(stdout) (could be blank.)

this allow the user to "chain" the commands (ouptut can be used as input)

Aside : or Call#pdftk can use a previous output as input too ? @call.pdftk(:input => ..., :operation =>..).pdftk(:operation => ...).pdftk(:operation =>..., :output => ...) Not sure...

So what is your opinion on all these ?

tcocca commented 13 years ago

Here are my thoughts.

  1. We don't need the InputSimilarToOutput method, if something goes wrong pdftk will error
  2. In terms of standardizing the input output we should do the following:

If the user specifies and output, respect that output. If the user does not specify an output always return StringIO.

There are 2 special cases for this: 'burst' and 'unpack_files'.
These methods do not return any value, they just put files into a directory.
Currently for burst we cd into Dir.tmpdir to run the command.
This is for 2 reasons - if the user doesn't specify and output we don't want to litter the filesystem with files and burst outputs a doc_data.txt file that is always put in the directory that the command is run from, even if you specify and output directory that is different. For 'unpack_files' we should also cd into Dir.tmpdir for the same reason as not littering the filesystem with the files.

Both of these commands should return the Output directory that the files were outputted to, either the directory the user specified or Dir.tmpdir if the user did not specify and output.

These changes give our Call class a unified behavior (aside from burst and unpack_files which are special cases). We respect the passed output and if not you get StringIO. If Dev's don't like StringIO they have the option to specify the output, they can use that all they want. This is also easier to code. Also, we don't want to be writing files to the filesystem if we aren't asked to, so give them StringIO reduces that problem.

I believe the Wrapper class should handle these changes to the Call class without issue.

We will need to update the Form class to be able to take additional options for input type. This class needs to be spec-ed more.

~ Tom

tcocca commented 13 years ago

See the following commit: https://github.com/tcocca/pdftk_forms/commit/f4cfe3339f6456f3a6f7e083156d01c41dc784e1

We were already 95% there on the problem of which type of output to return. The only case where we weren't respecting what the user passed in for output was in the case of a string (filepath), we were returning true. I changed this to give the user back the filepath they passed in. We were already returning StringIO by default when there was no specified output.

Please let me know if there are any more outstanding issues in regards to the input and output. I like the idea of keeping it all in memory unless the user says otherwise, as opposed to setting the output based on what the input is.

~ Tom

tcocca commented 13 years ago

looks like Form#save! already does what we want it to, passes the current @template as the output which will be written to.

Form#save returns a new file, I will change that to return StringIO.new instead.

I believe where we are missing specs is on the different input types, String (file path), File, Tempfile, StringIO as the template option to Form.new