senny / sablon

Ruby Document Template Processor based on docx templates and Mail Merge fields.
MIT License

Extract list of fields from DOCX template #133

Open anamba opened 5 years ago

anamba commented 5 years ago

Currently, I am trying to see whether it is possible to iterate through the template DOCX file and build a list of the fields.

I am still in the process of familiarizing myself with this codebase, but so far I don't see a straightforward way to do this. Based on my limited knowledge, it looks like it would involve creating a new field handler that collects the fields somewhere and returns that list when processing is finished.

Using the gem in a web app context, being able to enumerate the mail merge fields in the template would allow me to receive an uploaded DOCX, read the field list, and prompt the user for values for each field.

For the executable version, it would be great if the executable had an option that takes only a DOCX template as input and outputs a "template" JSON file with the fields as keys and empty strings or null for the values. Then, after filling in the values in the JSON file, you could run the executable again in the usual way and it would produce the finished DOCX output.
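To make the proposed "template" JSON concrete, here is a minimal sketch of the output step only. It assumes the field names have already been extracted somehow, and it assumes that dotted names such as `person.name` should become nested objects. `template_json` is a hypothetical helper, not something Sablon provides.

```ruby
require "json"

# Hypothetical helper (not part of Sablon): turns a flat list of field
# names into a JSON document whose leaves are all null, ready to be
# filled in by hand.
def template_json(field_names)
  skeleton = {}
  field_names.uniq.each do |name|
    next if name.empty?
    # "person.name" -> parents ["person"], leaf "name"
    *parents, leaf = name.split(".")
    # Walk (and create) the nested hashes for the parent segments.
    node = parents.reduce(skeleton) { |hash, key| hash[key] ||= {} }
    # Create the leaf with a null value unless something is already there.
    node[leaf] ||= nil
  end
  JSON.pretty_generate(skeleton)
end

puts template_json(["title", "person.name", "person.email", "title"])
```

After the values are filled in, the same file could be passed back as the context for a normal run.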

stadelmanma commented 5 years ago

@anamba this seems pretty similar to #114, and the rough, untested code snippet I have in that issue would be a decent start (copied and tweaked below). It wouldn't handle block constructs completely correctly, since you'd also catch all of the control fields. Overriding the field handlers in the gem would also work and would probably give you more control, since you wouldn't have to worry about extra control fields showing up.

require "nokogiri"
require "sablon"

# Load in the content however you want (`content` is the raw XML string).
# The entry name you'll need to fetch from the zip archive is
# "word/document.xml", unless you have fields in the header/footer as well.
# Some of the code in
# https://github.com/senny/sablon/blob/master/lib/sablon/document_object_model/model.rb
# might be useful in that regard
xml_node = Nokogiri::XML(content)

# returns array of SimpleField and ComplexField instances
# See: https://github.com/senny/sablon/blob/master/lib/sablon/parser/mail_merge.rb
# for some additional implementation details
parser = Sablon::Parser::MailMerge.new
fields = parser.parse_fields(xml_node).map do |field|
  field.expression # process the expression to extract only the context key
end
fields = fields.uniq # drop the duplicates since we don't need every instance

I like this idea since it makes it easier to work with someone else's documents as well as validate your own documents against a context (e.g. as part of an integration test). If you want to submit a PR adding this functionality, I'd be happy to review and help with it.
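As a rough illustration of the control-field caveat and the validation idea, the sketch below works on plain expression strings rather than on Sablon field objects. It assumes expressions look like `=person.name`, `items:each(item)`, `items:endEach`, `person:if` and `person:endIf`. `context_roots` and `missing_keys` are hypothetical helpers, not Sablon APIs, and image fields and other constructs are simply ignored.

```ruby
# Hypothetical helpers (not part of Sablon).
LOOP_START = /\A(.+):each\((\w+)\)\z/i
COND_START = /\A(.+):if(?:\(.*\))?\z/i

# Reduce raw field expressions to the top-level context keys they need.
def context_roots(expressions)
  keys = []
  loop_vars = []
  expressions.each do |expr|
    case expr
    when /\A=(.+)\z/ # plain insertion, e.g. "=person.name"
      keys << $1
    when LOOP_START  # loop start: needs the collection, defines a loop variable
      keys << $1
      loop_vars << $2
    when COND_START  # conditional start: needs the tested value
      keys << $1
    end
    # Everything else (endEach, endIf, else, ...) is control-only and skipped.
  end
  # Keep only the first path segment and drop names introduced by loops.
  keys.map { |key| key.split(".").first }.uniq - loop_vars
end

# Top-level keys the template needs that the context does not supply.
def missing_keys(expressions, context)
  context_roots(expressions) - context.keys.map(&:to_s)
end

expressions = ["=title", "items:each(item)", "=item.label", "items:endEach",
               "person:if", "=person.name", "person:endIf"]
p context_roots(expressions)
p missing_keys(expressions, { "title" => "Report", items: [] })
```

One known limitation of this approach: a loop variable that shares its name with a real top-level key would be dropped from the result, so a handler-based implementation that tracks block scope would be more reliable.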