Create a "DataTemplate" class to allow people to output data-driven narratives

tra38 commented 8 years ago

I have not written any documentation for this feature yet, but I would like to know whether it might be useful enough to merge into master. I have written tests that illustrate how data is fed into a new "DataTemplate" class, which decides which Calyx Grammar to invoke.

I will disclose that my pull request does have an ulterior motive. There is currently no open source library that can perform the same functionality as Automated Insights' Wordsmith, so I hope that these commits could turn Calyx into a possible competitor. Certainly you can perform what Wordsmith does by manually typing out if/else statements (as I did in "The Atheists Who Believe In God", my entry for NaNoGenMo 2015), but it seems more convenient if this process can be abstracted away.

tra38 commented 8 years ago

I also implemented functionality that allows the DataTemplate to render raw data input as well, provided that the grammar uses "erb syntax" when referring to the data.

Again, no documentation yet, but I want to see if this pull request will be accepted before I start the documentation process.
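
For readers unfamiliar with ERB-style data references, here is a minimal sketch using only Ruby's standard `erb` library. It is not the pull request's DataTemplate code; the template string and the `team` and `points` keys are made up for illustration.

```ruby
require 'erb'

# Hypothetical data; these keys are illustrative and not part of Calyx.
data = { team: 'Wellington', points: 42 }

# ERB#result_with_hash (Ruby 2.5+) exposes each hash key as a local variable
# inside the template.
template = ERB.new('<%= team %> scored <%= points %> points.')
sentence = template.result_with_hash(data)

puts sentence
```

The same `<%= %>` tags can wrap any Ruby expression, which is what makes the syntax attractive for data-driven text.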

tra38 commented 8 years ago

I had some time, so I wrote some documentation. I hope you like it.

maetl commented 8 years ago

Hi, sorry I didn’t get a chance to review this sooner.

I’ll go through the code and post some feedback. I think there are some good, creative ideas here, although it’s slightly different to the vision I had for how template substitution would be implemented in the high level API.

I do think the feature you’re addressing here is useful—I’ve wanted something like this myself, so I understand the use case/need for it.

maetl commented 8 years ago

@tra38 thanks so much for your ideas and code. Really interesting to see some of your ideas for building data-driven narratives (the “Big Picture Data Analysis (Mean, Median, Mode, etc.)” on your fork is a fantastic idea and could be hugely valuable).

It’s good to see a structured API for arranging multiple text generators into a coherent whole. This is definitely a practical issue I’ve found when using Calyx::Grammar. The pattern for composing multiple text generators together was never quite clear, but the basic idea was to nest generators inside the rules of other generators. This works okay, but could definitely be improved upon. Having a separate DSL for composing them might make sense.

A few specific comments about the API changes:

The original design was based around the idea of a uniform pattern for text generators where a random seed would be passed into the constructor, then a #generate method would return the resulting string of text. This same pattern would work for Markov chains, etc, as well as template grammars.
Having two separate styles of template delimiter in the one string format (<%= %> and {}) is a bit jarring to me. I previously considered the extension point for syntax being symmetric to the existing syntax (eg: {{ref}} or {ref!} or something similar). The Erb syntax does open up the scope for composing actual Erb templates though, which could lead to some interesting other consequences for generating HTML documents.
Conditionals. Great idea. I wonder how easy it would be to support this in the existing rule format?
Template variables could be integrated with the existing API by extending #generate to take a context hash mapping keys to values. I’m hoping this could be supported without needing to introduce any additional dependencies or new syntax.
Rules could be memoized which would have a similar effect to context variables, but could be generated within the grammar itself. Right now, rules evaluate each time they’re called—a memoized rule would evaluate once, and then every additional reference to the rule symbol would return the same value. This would simplify some of the boilerplate around handling names, etc, consistently.

maetl commented 8 years ago

Some more notes about memoization.

Can already do this now at the global level by injecting single values into the rule config:

rule :protagonist, ['Vladimir', 'Estragon'].sample
rule :antagonist, Antagonists.find_random

Alternative is to add explicit memoization support to the grammar API:

memo :protagonist, 'Vladimir', 'Estragon'

Alternative is to support a special memoization substitution syntax:

rule :protagonist_name, 'Vladimir', 'Estragon'
rule :protagonist, '{{protagonist_name}}'

I think an addition like this would solve a huge number of the templating issues without requiring a completely separate template API to be used over the top (of course, it’s always possible to use a template API anyway, and it would have particular utility as a way to model specific narrative structures—where the grammar API is too general).
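
To make the difference between a plain rule and a memoized rule concrete, here is a rough sketch in plain Ruby. `ToyGrammar` is a hypothetical stand-in written for this illustration, not the Calyx implementation, and its `memo` method only mimics the API proposed above.

```ruby
# Toy grammar, NOT the Calyx implementation: it only illustrates the
# difference between a rule that re-evaluates and one that is memoized.
class ToyGrammar
  def initialize(seed)
    @rng = Random.new(seed)
    @rules = {}
    @memos = {}
    @cache = {}
  end

  # A plain rule picks a fresh choice every time it is referenced.
  def rule(name, *choices)
    @rules[name] = choices
  end

  # A memoized rule picks once and reuses that choice afterwards.
  def memo(name, *choices)
    @memos[name] = choices
  end

  def evaluate(name)
    if @memos.key?(name)
      @cache[name] ||= @memos[name].sample(random: @rng)
    else
      @rules.fetch(name).sample(random: @rng)
    end
  end
end

grammar = ToyGrammar.new(12345)
grammar.memo(:protagonist, 'Vladimir', 'Estragon')
grammar.rule(:mood, 'hopeful', 'weary', 'restless')

names = Array.new(5) { grammar.evaluate(:protagonist) }
puts names.uniq.length # always 1: the memoized choice never changes
```

Repeated references to `:mood`, by contrast, may return a different value each time.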

maetl commented 8 years ago

Direct data substitutions could also be handled by a minor change to the existing API (using a very similar approach to what you’ve suggested, but passing the hash into the call to #generate rather than the constructor):

class FakeBio < Calyx::Grammar
  # Q: How to handle delimiters?
end

person = PersonRepository.find_first

bio = FakeBio.new(seed)
bio.generate(name: person.first_name, age: person.age)
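
As a rough sketch of how a context hash passed to #generate might behave, the toy class below substitutes values into a template. It is not Calyx::Grammar, and it assumes context values reuse the existing {key} delimiter, which is exactly the question left open above; `ToyBio` and its template are hypothetical.

```ruby
# Toy generator, not Calyx::Grammar. It assumes context values reuse the
# existing {key} delimiter, which the thread leaves as an open question.
class ToyBio
  TEMPLATE = '{name} is {age} years old.'

  def initialize(seed)
    # Unused in this sketch; kept to mirror the seed-in-constructor pattern.
    @rng = Random.new(seed)
  end

  # Replace each {key} with the matching context value.
  # A key missing from the context raises KeyError.
  def generate(context = {})
    TEMPLATE.gsub(/\{(\w+)\}/) { context.fetch(Regexp.last_match(1).to_sym).to_s }
  end
end

bio = ToyBio.new(12345)
result = bio.generate(name: 'Vladimir', age: 50)
puts result
```

Passing the hash to #generate rather than the constructor means one seeded generator can be reused across many records.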

tra38 commented 8 years ago

Thanks for the comments! Here is what I think might be a working plan:

Keep data-template as a way to organize code, but find a way to allow users to directly inject data into a Calyx::Grammar class, possibly by using memoization or a change to the API. Ideally, there should be no change to syntax or new template delimiters.
See if there is any way to implement conditionals within Calyx::Grammar.
Clean up the code (move code into new files, remove comments crediting users, change the proposed gem version to 0.6.0 instead of 0.6.2, etc.)

I will also happily remove the contribution guidelines to get this pull request accepted, although the reason I added them in the first place was that I thought they would be a good way to encourage contributors, if indeed any exist.

tra38 commented 8 years ago

Hey Maetl, I saw your roadmap. I'm fine with it so long as you keep doing regular releases on RubyGems. I plan on starting development on a new branch and submitting brand new pull requests from that branch. I'm a little hazy on what you meant by a "block constructor" though, so I may work on implementing some of the features we discussed here previously instead.

I plan on starting work on Saturday, Feb. 13th. Hopefully I will be able to stick to that promise.

I'm closing this pull request since this existing fork won't really be merged into master. I'll open up a new pull request with new code then.

maetl commented 8 years ago

Thanks. I’ll post some contribution guidelines shortly. Feel free to open up new issues on the repo in advance to get feedback before writing any code. Thinking about some of your ideas, I wonder if they might be better suited to a standalone library (robo-data-template?), particularly if you want to integrate ERB and an API focused on narrative templates at a higher level.

With regards to the releases, yes, I am keen to publish more regular improvements on RubyGems. I had written most of the code for 0.6 a few weeks ago—just hadn’t gotten around to testing it and folding it into the main codebase.

By the way, if you need to make a breaking change to a fork, or extend its functionality, you can always reference it from a fork on GitHub directly, so that you’re not dependent on the upstream release process for installing changes you need. This means you can get the exact commit of the library you want at any point, from any repo, without ever needing to upload anything to RubyGems. Bundler supports this natively, and the specific_install gem adds this for command line RubyGems too.
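
For illustration, a Gemfile entry along these lines tells Bundler to install the gem from a GitHub repository instead of RubyGems. The `your-username/calyx` path, the branch name, and the commit SHA are placeholders, not real references.

```ruby
# Gemfile
source 'https://rubygems.org'

# Install Calyx straight from a GitHub repository rather than from RubyGems.
# 'your-username/calyx' and the branch name are placeholders.
gem 'calyx', github: 'your-username/calyx', branch: 'data-template'

# Or pin an exact commit instead (the SHA is a placeholder):
# gem 'calyx', git: 'https://github.com/your-username/calyx.git', ref: 'abc1234'
```

Running `bundle install` afterwards fetches the repository and records the resolved commit in Gemfile.lock.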

tra38 commented 8 years ago

Thanks for the tip about referencing from a fork. I'll be sure to keep that in mind. I'm also thinking about releasing my ideas as a standalone library (though I'd have to think of a new name for it), but in the meantime, I'll think about how to contribute to your project and will write up some issues when I get the chance.

maetl / calyx

Create a "DataTemplate" class to allow people to output data-driven narratives #1