tcocca / active_pdftk

ruby wrapper for the pdftk command line utility for working with editable pdf files
MIT License

Handle pdftk options #11

Closed elmatou closed 13 years ago

elmatou commented 13 years ago

We should handle all (or almost all) of the options available in pdftk. A mapping has to be defined; it could be:

    # [ input_pw < input PDF owner passwords | PROMPT > ]         :input_pw  => string
    # [ < operation > < operation arguments > ]                 none
    # [ output < output filename | - | PROMPT > ]                 :output  => string
    # [ encrypt_40bit | encrypt_128bit ]                          :encrypt  => 40bit | 128bit
    # [ allow < permissions > ]                                   :allow  => array of [Printing, DegradedPrinting, ModifyContents, Assembly, CopyContents, ScreenReaders, ModifyAnnotations, FillIn, AllFeatures]
    # [ owner_pw < owner password | PROMPT > ]                    :owner_pw  => string
    # [ user_pw < user password | PROMPT > ]                      :user_pw  => string
    # [ flatten ]                                                 :flatten  => true | false
    # [ compress | uncompress ]                                   :compress  => true | false
    # [ keep_first_id | keep_final_id ]                           :keep_id  => :first | :final
    # [ drop_xfa ]                                                :drop_xfa  => true | false
    # [ verbose ]                                                 :verbose  => true | false
    # [ dont_ask | do_ask ]                                       :ask  => true | false

Then a method to generate the correct string:

    # Draft method only
    def build_options(options = {})
      # keep only the options whose value is truthy
      @options.merge(options).collect {|option, value| option.to_s if value}.compact
    end

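
To make the mapping idea above concrete, here is a minimal sketch of how the argument-less flags from the table could be turned into a pdftk option string. `OPTION_FLAGS` and `build_option_string` are hypothetical names chosen for illustration, not part of the library, and options that take an argument (passwords, `allow`) are left out on purpose.

```ruby
# Hypothetical lookup table derived from the mapping proposed above.
# Only flags that take no argument are covered in this sketch.
OPTION_FLAGS = {
  :flatten  => { true => 'flatten' },
  :compress => { true => 'compress', false => 'uncompress' },
  :keep_id  => { :first => 'keep_first_id', :final => 'keep_final_id' },
  :drop_xfa => { true => 'drop_xfa' },
  :verbose  => { true => 'verbose' },
  :ask      => { true => 'do_ask', false => 'dont_ask' },
  :encrypt  => { :'40bit' => 'encrypt_40bit', :'128bit' => 'encrypt_128bit' }
}

# Translates an options hash into the matching pdftk keywords.
# Unknown options and values without a keyword are silently skipped.
def build_option_string(options = {})
  options.map { |option, value|
    flags = OPTION_FLAGS[option.to_sym]
    flags && flags[value]
  }.compact.join(' ')
end

puts build_option_string(:flatten => true, :compress => false, :ask => false)
# prints "flatten uncompress dont_ask"
```

Skipping unknown keys keeps the sketch short; a real implementation would probably want to raise or warn instead.
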
tcocca commented 13 years ago

Not sure we would be able to handle the do_ask prompts? How would we take the prompts and give feedback back to the cli? I'll keep thinking about all these.

elmatou commented 13 years ago

I agree with you, we don't need to bother with the CLI....

elmatou commented 13 years ago

Hi Tom, I'm working on this topic. At first I wanted to make flatten an option, but it is better to handle all options the same way. Currently I have to set up a mapping for options and operations, and some methods to build the command line.

The first thing to do is to choose the I/O. For the input (the output should match pdftk syntax), I think a pdftk call should be built upon a hash. This hash has to be built by the user or by the abstraction layer, and could look like this one:

    command = {
      :input => {'a.pdf' => 'foo', 'b.pdf' => 'bar', 'c.pdf' => nil},          # One or more files with their password (or nil) | accepts the input file name as a simple string if we use only one file without a password.
      :operation => {:fill_form => 'a.fdf'},          # A hash representing the operation and its specific options | accepts the operation as a simple symbol if we do not need specific options.
      :output => {'out.pdf' => {:flatten => true, :owner_pw => 'bar', :user_pw => 'baz', :encrypt  => :'40bit'}}        # A hash representing the output, file name & output options | accepts the file name as a simple string if we do not need output options.
    }

My first thought was to accept a very flat hash, but it was difficult to preserve the order of statements.

Then the command hash has to be checked against a mapping of pdftk abilities. For now it is less than a draft, and the structure of this mapping will depend on the building methods.

    PDFTK_MAPPING = {
      :input => {
        :input_pw => 'input_pw',
      },
      :operation => {
        nil => nil,
#        :cat => Array of Range,                 # Array of ranges [ < page ranges > ]
#        :shuffle => Array of Range,             # Array of ranges [<page ranges>]
#        :burst => nil,
#        :generate_fdf => nil,
        :fill_form => 'fill_form',                   #< FDF data filename | XFDF data filename | - | PROMPT >
#        :background => String,                   #< background PDF filename | - | PROMPT >
#        :multibackground => String,              #< multibackground PDF filename | - | PROMPT >
#        :stamp => String,                        #< stamp PDF filename | - | PROMPT >
#        :multistamp => String,                   #< multistamp PDF filename | - | PROMPT >
#        :dump_data => nil,
#        :dump_data_utf8 => nil,
        :dump_data_fields => nil,
        :dump_data_fields_utf8 => nil,
#        :update_info => String,                  #< info data filename | - | PROMPT >
#        :update_info_utf8 => String,             #< info data filename | - | PROMPT>
#        :attach_files => Array of String,        #< attachment filenames | PROMPT > [ to_page < page number | PROMPT > ]
#        :unpack_files => nil
      },
      :output => {
        :owner_pw => 'owner_pw',
        :user_pw => 'user_pw',
        :encrypt  => {:'40bit' => 'encrypt_40bit', :'128bit' => 'encrypt_128bit'},
        :flatten  => {true => 'flatten', false => nil},
        :compress  => {true => 'compress', false => 'uncompress'},
        :keep_id  => {:first => 'keep_first_id', :final => 'keep_final_id'},
        :drop_xfa  => {true => 'drop_xfa', false => nil},
        :allow  => ['Printing', 'DegradedPrinting', 'ModifyContents', 'Assembly', 'CopyContents', 'ScreenReaders', 'ModifyAnnotations', 'FillIn', 'AllFeatures']
      }
    }
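
As an illustration of what "checking the command hash against the mapping" could mean, here is a minimal sketch. `MAPPING` is a trimmed-down, hypothetical copy of the structure above, and `check_options` is an assumed helper name; neither is taken from the actual code.

```ruby
# Hypothetical, reduced copy of the mapping, just enough for the example.
MAPPING = {
  :operation => { :fill_form => 'fill_form', :dump_data_fields => nil },
  :output    => {
    :owner_pw => 'owner_pw',
    :flatten  => { true => 'flatten', false => nil },
    :encrypt  => { :'40bit' => 'encrypt_40bit', :'128bit' => 'encrypt_128bit' }
  }
}

# Returns a list of human-readable problems for one section of the command
# hash; an empty list means every option and value is known to the mapping.
def check_options(section, options)
  known = MAPPING.fetch(section)
  options.each_with_object([]) do |(option, value), errors|
    if !known.key?(option)
      errors << "unknown #{section} option: #{option}"
    elsif known[option].is_a?(Hash) && !known[option].key?(value)
      errors << "invalid value for #{option}: #{value.inspect}"
    end
  end
end

p check_options(:output, {:flatten => true, :encrypt => :'256bit', :colour => 'red'})
```

Collecting the errors instead of raising on the first one is only one possible design; it makes it easier to report every problem in a command hash at once.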

Once this is said, we need to build the various parts of the command line (input, operation, output).

    # {:input => {'a.pdf' => 'foo', 'b.pdf' => 'bar', 'c.pdf' => nil},
    # :operation => {:fill_form => 'a.fdf'},
    # :output => {'out.pdf' => {:flatten => true, :owner_pw => 'bar', :user_pw => 'baz', :encrypt  => :'40bit'}}}
    # #=> ["B=c.pdf C=a.pdf D=b.pdf input_pw C=foo D=bar", ["fill_form a.fdf"], ["output", "out.pdf", ["flatten", "encrypt_40bit", "owner_pw bar", "user_pw baz"]]]

    def set_cmd(options = {})
      [
      build_input(options[:input]),
      build_options(PDFTK_MAPPING[:operation], options[:operation]),
      ['output', options[:output].keys.first] << build_options(PDFTK_MAPPING[:output], options[:output].values.first)
      ]
    end

    # {'a.pdf' => 'foo', 'b.pdf' => 'bar', 'c.pdf' => nil} #=> "B=c.pdf C=a.pdf D=b.pdf input_pw C=foo D=bar"
    def build_input(options)
      out, i = [[], "input_pw", []], "A"
      case options
      when Hash
        options.each do |file, pass|
          out.first << "#{i.next!}=#{file}"   # next! runs first, so handles start at "B"
          out.last << "#{i}=#{pass}" if pass
        end
      when String
        out.first << options
      end
      # drop the "input_pw" keyword when no password was given
      (out.last.empty? ? out.first : out.flatten).join(' ')
    end

    # {:flatten => true, :owner_pw => 'bar', :user_pw => 'baz', :encrypt  => :'40bit'} #=> ["flatten", "encrypt_40bit", "owner_pw bar", "user_pw baz"]
    # OR
    # {:fill_form => 'a.fdf'} #=> ["fill_form a.fdf"]
    def build_options(*args)
      abilities = args.shift || {}
      options = args.shift || {}
      @options.merge(options).collect do |option, value|
        current = abilities[option.to_sym]
        case current
          when String then "#{current} #{value}"
          when Hash   then "#{current[value]}"
          # keep only the requested permissions that are listed in the mapping
          when Array  then "#{option} #{(current.collect{|i| i.to_s.downcase} & value.collect{|i| i.to_s.downcase}).join(' ')}"
        end
      end
    end

For now it works pretty well, and for two main reasons:

I'll keep working on this branch, but if you have any thoughts on my first steps, feel free to say anything. Cheers,

tcocca commented 13 years ago

I'll take a thorough look at this tonight and give you some opinions. Maybe we should look at the awesome pdfkit (for wkhtmltopdf) lib for some inspiration: https://github.com/jdpace/PDFKit (it's the only CLI task wrapper I can think of right now).

Off the top of my head, I think we should consider making a Command class that we pass options to. Maybe that Command class should also be responsible for detecting the version and the path stuff, so we separate out the Command from the wrapper? The wrapper would just call things on the command, but the command would be responsible for setting the arguments and stuff.

Maybe that's too much abstraction, not sure. I'll think about it and jot some thoughts down tonight.

~ Tom
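
A rough sketch of the separation suggested above could look like the following. The class names `Command` and `Wrapper` and all of their methods are hypothetical and only illustrate the split of responsibilities: the command knows the binary path, can detect the version, and builds the argument string, while the wrapper only describes what it wants. The only external facts assumed are that pdftk accepts `--version` and the `fill_form` / `output` / `flatten` keywords discussed in this thread.

```ruby
require 'open3'
require 'shellwords'

# Hypothetical class: owns the binary path, version detection and argument building.
class Command
  attr_reader :path

  def initialize(path = 'pdftk')
    @path = path
  end

  # Builds a shell-escaped command line without running it.
  def build(*args)
    ([path] + args.flatten.compact.map(&:to_s)).shelljoin
  end

  # Runs the binary with --version and returns the first version-looking
  # token from its output, or nil when the binary cannot be found.
  def version
    output, _status = Open3.capture2e(path, '--version')
    output[/\d+\.\d+(\.\d+)?/]
  rescue Errno::ENOENT
    nil
  end
end

# Hypothetical wrapper: only states which operation it wants.
class Wrapper
  def initialize(command = Command.new)
    @command = command
  end

  def fill_form_command(template, fdf, output)
    @command.build(template, 'fill_form', fdf, 'output', output, 'flatten')
  end
end

puts Wrapper.new.fill_form_command('a.pdf', 'a.fdf', 'out.pdf')
# prints "pdftk a.pdf fill_form a.fdf output out.pdf flatten"
```

Injecting the command into the wrapper also makes the wrapper easy to test without pdftk installed, since only `version` ever touches the binary.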

elmatou commented 13 years ago

Hi Tom, I worked hard on this topic today. I'm not done yet, and will send my results later. As you suggested, I created a PdftkForms::Call class that creates, executes, and retrieves everything about the pdftk call. For now I can tell you that we will support:

See you,

tcocca commented 13 years ago

NICE! Can't wait to check it out. I definitely think a class for creating the actual "command" string is a good idea, and the wrapper should basically call that in the call_pdftk method. Looking forward to seeing what you come up with.

elmatou commented 13 years ago

It takes more time than expected (not surprising). I still need to test the I/O streams, but the command builder is OK. I chose a structured, full command hash, but we can discuss it if we need more flexibility. For now it is:

    :path => 'pdftk',
    :input => {'a.pdf' => 'foo', 'b.pdf' => 'bar', 'c.pdf' => nil},
    :operation => {:fill_form => 'a.fdf'},
    :output => Tempfile.new('yeah'),
    :options => { :flatten => false, :owner_pw => 'bar', :user_pw => 'baz', :encrypt  => :'40bit'}
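
For readers who want to experiment with this structure, here is a minimal sketch of a builder for such a hash. It is not the PdftkForms::Call implementation: `OUTPUT_FLAGS` and `build_command` are hypothetical names, the sketch assigns handles starting at "A" in insertion order, and it assumes that any non-string output (such as a Tempfile) is streamed through stdout with `-`. The handle letters and option order it emits can therefore differ from what the real builder produces.

```ruby
require 'tempfile'

# Hypothetical subset of the output options discussed in this thread.
OUTPUT_FLAGS = {
  :owner_pw => 'owner_pw',
  :user_pw  => 'user_pw',
  :flatten  => { true => 'flatten' },
  :encrypt  => { :'40bit' => 'encrypt_40bit', :'128bit' => 'encrypt_128bit' }
}

# Turns a structured command hash into a pdftk argument string.
def build_command(command)
  handles = ('A'..'Z').first(command[:input].size) # one handle per input file
  files   = command[:input].keys.zip(handles)
  inputs  = files.map { |file, handle| "#{handle}=#{file}" }

  # Only files with a password get an entry after the input_pw keyword.
  passwords = files.map { |file, handle|
    pass = command[:input][file]
    "#{handle}=#{pass}" if pass
  }.compact
  passwords.unshift('input_pw') unless passwords.empty?

  operation = command[:operation].map { |name, arg| [name, arg].compact.join(' ') }

  # Assumption: anything that is not a file name is written to stdout.
  target = command[:output].is_a?(String) ? command[:output] : '-'

  options = command[:options].map do |option, value|
    flag = OUTPUT_FLAGS[option]
    next unless flag
    flag.is_a?(Hash) ? flag[value] : "#{flag} #{value}"
  end.compact

  (inputs + passwords + operation + ['output', target] + options).join(' ')
end

puts build_command(
  :input     => {'a.pdf' => 'foo', 'b.pdf' => 'bar', 'c.pdf' => nil},
  :operation => {:fill_form => 'a.fdf'},
  :output    => Tempfile.new('yeah'),
  :options   => {:flatten => false, :owner_pw => 'bar', :user_pw => 'baz', :encrypt => :'40bit'}
)
```

On Ruby 1.9 and later, hashes keep their insertion order, so the handles follow the order in which the input files were written.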

The command builder then produces something like:

    "B=c.pdf C=a.pdf D=b.pdf input_pw C=foo D=bar fill_form a.fdf output - encrypt_40bit owner_pw bar user_pw baz"

elmatou commented 13 years ago

Hi Tom, I'm quite done with the implementation and testing of PdftkForms::Call.

The DSL (the command hash) is a very important feature for the project. It has to be cool (Rails-speak for user-friendly) and efficient (flexible and structured). To be sure of the choices I made (or that we'll make), I'd like you to write some more tests for the remaining operations and/or options, in order to practice the DSL and tell me what improvements can be made (or have to be made).

I merged my options-handling branch with your master, so it should be easy to pull it in the future. For now you can find it here and test it: https://github.com/elmatou/pdftk_forms/tree/options-handling

BTW, if you open the wiki section (in the GitHub admin), I'll start writing a tutorial on how to use it. Also, I'm not very familiar with RDoc descriptions, but we should start adding them to our code.

tcocca commented 13 years ago

Done and merged into master

g-i-o-r-g-i-o commented 6 years ago

How do I allow signing, but disallow modifications, before actually signing the documents with an electronic signature? Thanks