jonocarroll / ggeasy

ggplot2 shortcuts (transformations made easy)
https://jonocarroll.github.io/ggeasy/

A grammar of graphics details? #7

Open timchurches opened 6 years ago

timchurches commented 6 years ago

I was just wondering if it is worth designing and defining a formal or semi-formal grammar for these function names, so that they are readily guessable without having to look them up? In other words, a naming convention, such as, based on what has been done so far, [prefix][axis-qualifier][attribute|verb].

Or, rather than having to define a zillion easy_ functions, what about one function that uses a little DSL to set the theme attributes? No need to mess with lex and yacc (which are available in R via the rly package btw), it might be enough just to pass commands and scalars as ellipsis arguments eg

ggdetails("x", "axis", "blue")

or, equivalently,

ggdetails("blue", "x", "axis")

or, using a lexer (eg rly) to tokenise a single string argument:

ggdetails("blue x axis")

or

ggdetails("x axis blue")

This also makes it easier for users who are non-native English speakers, whose natural word-ordering assumptions may be different - order would not matter.

The order of the ellipsis arguments or tokens wouldn't matter because the class of the argument or token can be inferred from its value in the very constrained context of the ggdetails() function. "axis", "legend", "text" can only refer to plot elements, "blue", "orange" or a hex RGB value can only refer to colours, "2" or "5" are scalars (for font size or rotation), and "+25%" or "-33%" means increase or decrease the current size (or whatever is specified) by 25% or 33% respectively. That way argument order doesn't need to be remembered.
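As a rough illustration of the order-independence idea above, here is a minimal base R sketch. The function names `classify_token()` and `classify_command()` and the small `elements` vocabulary are hypothetical, made up for this example; nothing here is part of ggeasy. It only shows that each token's class can be inferred from its value, so the same result comes out whatever the argument order.

```r
# Hypothetical vocabulary of plot-element words (an assumption for this sketch;
# a real list would be derived from the arguments of ggplot2::theme()).
elements <- c("axis", "legend", "text", "title", "line", "x", "y")

# Infer the class of a single token purely from its value.
classify_token <- function(tok) {
  if (tok %in% elements) return("element")
  if (grepl("^[+-][0-9.]+%$", tok)) return("relative_size")
  if (grepl("^[0-9.]+$", tok)) return("scalar")
  if (tok %in% colours() || grepl("^#[0-9A-Fa-f]{6}$", tok)) return("colour")
  "unknown"
}

# Accept either several ellipsis arguments or one space-separated string,
# and group the tokens by their inferred class.
classify_command <- function(...) {
  toks <- unlist(strsplit(trimws(c(...)), "\\s+"))
  kinds <- vapply(toks, classify_token, character(1), USE.NAMES = FALSE)
  split(toks, kinds)
}

a <- classify_command("x", "axis", "blue")
b <- classify_command("blue x axis")
identical(a, b)
```

Both calls group "x" and "axis" as elements and "blue" as a colour, so the final `identical()` call returns `TRUE`.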

Actually, using yacc and lex via rly to build a simple DSL might be the best option, but the utility of the concept could be tested using individual ellipsis arguments to start with.

jonocarroll commented 6 years ago

This is a seriously cool idea. I'd lean towards having the zillion helper functions and the 'ggbot' chat lexer. Looking at https://github.com/systemincloud/rly I don't think I could immediately build that, so if you're more familiar with the idea then by all means please have a crack at it. I think it would be of great benefit!

We could make a fairly simple approximation to this with a heap of if() statements... I started writing this then got carried away trying to see how it might work... now it's here: https://github.com/jonocarroll/ggeasy/pull/8

timchurches commented 6 years ago

The text expression is a command, in which the subject is implicit (the subject is "ggbot"), the verb is implicit ("make"), the object of the command is one of the entities listed under Arguments here, and the attribute of the named entity to be modified is inferred from the attribute value ("blue" must be a colour, "2" must be an absolute size/thickness scalar, "2cm" is a scalar with units, "+25%" is a relative scalar, "-45deg" is an angular quantity etc). In some cases the attribute to be modified might need to be named explicitly. One command per string, but a vector of strings or multiple ellipsis string arguments could be passed to one call of ggbot(). Probably want to allow modification of multiple attributes per command string, but strictly only one object entity per command. Thus, "text blue" would make all text blue. "text blue 15" would make all text blue and size 15. But "text line blue" would be illegal because there are two object entities (targets): "text" and "line". (Aside: this restriction of one target entity per command is just to keep it simple to start with).
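The "one target entity, several attribute values" rule can be sketched in base R as follows. This is only an illustration under stated assumptions: `parse_command()`, `infer_attribute()`, the `targets` vector and the list of unit suffixes are all hypothetical names and choices, not an existing ggbot() implementation.

```r
# Assumed subset of target entities, for illustration only.
targets <- c("text", "line", "rect", "title")

# Infer which attribute a value refers to from the shape of the value itself.
infer_attribute <- function(tok) {
  if (grepl("^[+-][0-9.]+%$", tok)) return("relative_size")
  if (grepl("^[+-]?[0-9.]+deg$", tok)) return("angle")
  if (grepl("^[0-9.]+(cm|mm|in|pt)$", tok)) return("size_with_units")
  if (grepl("^[0-9.]+$", tok)) return("size")
  if (tok %in% colours()) return("colour")
  stop("cannot infer an attribute from '", tok, "'", call. = FALSE)
}

# Parse one command string: exactly one target, any number of attribute values.
parse_command <- function(cmd) {
  toks <- strsplit(trimws(cmd), "\\s+")[[1]]
  target <- toks[toks %in% targets]
  if (length(target) != 1) {
    stop("expected exactly one target entity, found ", length(target),
         call. = FALSE)
  }
  values <- setdiff(toks, target)
  attrs <- vapply(values, infer_attribute, character(1), USE.NAMES = FALSE)
  list(target = target, attributes = setNames(values, attrs))
}

parse_command("text blue 15")       # target "text"; colour "blue", size "15"
try(parse_command("text line blue")) # error: two target entities
```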

Now, a minor complication is how the object entities should be specified. "text" or "line" are easy, and, as per the ggplot2 theme model, these apply to all text or to all lines. But what about, say, the x-axis text? Well, adding "x" or "y" (or "z"?) implies that modification of some axis attribute is being requested, in other words, that "axis" is implied. If modification of both axes is desired, then any or all of the following could be supported: "axis blue", "axes blue", "x y blue". Except we haven't specified which aspect of the axis or axes we want to modify, so we need a qualifier: for axes, the valid ones are "title", "text", "ticks" and "line" (and pluralised forms of those etc).
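A small base R sketch of how axis specifiers might be resolved to a theme element name, assuming the rules just described ("x" or "y" implies "axis", both or neither means both axes, and exactly one qualifier is required). `resolve_axis_element()` and the `qualifiers` lookup are hypothetical; only the resulting names such as `axis.text.x` are real `theme()` arguments.

```r
# Map qualifier words (including assumed plural/singular variants) onto the
# qualifier used in ggplot2 theme element names.
qualifiers <- c(title = "title", titles = "title", text = "text",
                ticks = "ticks", tick = "ticks", line = "line", lines = "line")

resolve_axis_element <- function(cmd) {
  toks <- strsplit(trimws(cmd), "\\s+")[[1]]
  axes <- intersect(c("x", "y"), toks)  # "x" or "y" implies "axis"
  is_axis <- length(axes) > 0 || any(toks %in% c("axis", "axes"))
  if (!is_axis) stop("no axis mentioned", call. = FALSE)
  qual <- unique(unname(qualifiers[toks[toks %in% names(qualifiers)]]))
  if (length(qual) != 1) {
    stop("expected exactly one of: title, text, ticks, line", call. = FALSE)
  }
  base <- paste0("axis.", qual)
  # One named axis gives e.g. "axis.text.x"; both or neither gives "axis.text".
  if (length(axes) == 1) paste0(base, ".", axes) else base
}

resolve_axis_element("x text blue")   # "axis.text.x"
resolve_axis_element("title y axis")  # "axis.title.y"
resolve_axis_element("x y ticks")     # "axis.ticks"
```

Value tokens such as "blue" are simply ignored here; in a fuller version they would be handled by a separate attribute-inference step.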

What about suppressing elements? I think the solution is to recognise some special-case attributes, such as "invisible", or imperative forms such as "disappear", "begone" or just "no" or "none" or "zap" or "ditch" etc.
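Detecting such suppression words could be as simple as the following base R sketch. `split_blank()` and `blank_words` are hypothetical names; in ggplot2 terms, a detected suppression would presumably translate to setting the target to `element_blank()` inside `theme()`, but that step is left out to keep the example dependency-free.

```r
# Words that request suppression of an element (taken from the list above).
blank_words <- c("invisible", "disappear", "begone", "no", "none", "zap", "ditch")

# Separate the suppression request from the remaining tokens of a command.
split_blank <- function(cmd) {
  toks <- strsplit(trimws(cmd), "\\s+")[[1]]
  is_blank <- toks %in% blank_words
  list(blank = any(is_blank), rest = toks[!is_blank])
}

split_blank("legend title begone")  # blank = TRUE, rest = "legend" "title"
split_blank("text blue")            # blank = FALSE
```

One thing to watch: "none" is also a valid value of `legend.position`, so including it here is exactly the kind of overlap the language model would need to rule out or resolve.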

OK, is this language model adequate? The way to find out is to build a table of all the entity target types which theme() supports, and a separate table for each entity type, enumerating all the attributes that entity type can have set, and specify an example command string and check that it can be unambiguously parsed. Then check that there is no overlap between any of the words used as specifications in both tables. If there is any overlap (i.e. the sets of words are not disjoint) then there will be ambiguity which can only be resolved by word order, which means using a more complex language model.
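The disjointness check described above is easy to automate once the tables exist. The sketch below uses tiny, made-up vocabularies (the `vocab` list is an assumption, not the real tables) and a hypothetical helper `find_overlaps()` that reports every word shared between any two tables.

```r
# Illustrative vocabularies only; the real ones would come from the tables.
vocab <- list(
  entity    = c("text", "line", "axis", "x", "y", "top", "right"),
  attribute = c("colour", "size", "angle"),
  value     = c("blue", "none", "left", "right", "bottom", "top")
)

# Return the words shared by each pair of tables; an empty list means the
# vocabularies are pairwise disjoint and word order is not needed.
find_overlaps <- function(vocab) {
  pairs <- combn(names(vocab), 2, simplify = FALSE)
  out <- lapply(pairs, function(p) intersect(vocab[[p[1]]], vocab[[p[2]]]))
  names(out) <- vapply(pairs, paste, character(1), collapse = "/")
  Filter(length, out)
}

find_overlaps(vocab)
```

With these example vocabularies the check flags "top" and "right", which appear both in entity names (such as `axis.title.x.top` and `axis.title.y.right`) and as `legend.position` values, so they would need either different specifier words or a rule based on context.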

Creating the tables is a slightly tedious task, but if split up shouldn't take too long. Once the adequacy of the language model is confirmed, or it is tweaked until adequate, then coding should commence. Such a table can also provide the basis for the ggbot() tests, of course.

Obviously, lots of synonyms can be included in the language model. The question is whether an unambiguous model can be constructed with just single-word tokens, or not. On a quick scan, I think it can, but that needs to be thoroughly checked. If not, then a smarter tokeniser may be needed. A lemmatiser could also handle synonyms and alternative spellings etc. But I agree, the aim should be to keep it as lightweight as possible. The aim of designing the language model first, before coding it up, is to check whether a bunch of if/else statements is enough, or whether a proper lexer and lemmatiser is needed or worthwhile. Or whether it is better to build a formal domain-specific language, in which case yacc (via rly) needs to be used to build a parse tree. However, I don't think we want a formal DSL.

timchurches commented 6 years ago

| entity | element type | specifier(s) | synonyms |
|---|---|---|---|
| line | element_line | line | lines |
| rect | element_rect | rect | rectangle, rectangles |
| text | element_text | text | |
| title | element_text | title | titles, headings |
| aspect.ratio | ? | aspect, ratio | |
| axis.title | element_text | axis, title | axes, titles |
| axis.title.x | element_text | x, title, axis (implied by x) | |

...and the rest of these, and then we need to consider the settable attributes for each element type, in a separate table. The main thing is to ensure that there is no overlap (i.e. ambiguity) between the way entities are specified and the way attributes and quantities/values are specified.

axis.title.x.top    
x axis label on top axis (element_text; inherits from axis.title.x)

axis.title.y    
y axis label (element_text; inherits from axis.title)

axis.title.y.right  
y axis label on right axis (element_text; inherits from axis.title.y)

axis.text   
tick labels along axes (element_text; inherits from text)

axis.text.x 
x axis tick labels (element_text; inherits from axis.text)

axis.text.x.top 
x axis tick labels on top axis (element_text; inherits from axis.text.x)

axis.text.y 
y axis tick labels (element_text; inherits from axis.text)

axis.text.y.right   
y axis tick labels on right axis (element_text; inherits from axis.text.y)

axis.ticks  
tick marks along axes (element_line; inherits from line)

axis.ticks.x    
x axis tick marks (element_line; inherits from axis.ticks)

axis.ticks.y    
y axis tick marks (element_line; inherits from axis.ticks)

axis.ticks.length   
length of tick marks (unit)

axis.line   
lines along axes (element_line; inherits from line)

axis.line.x 
line along x axis (element_line; inherits from axis.line)

axis.line.y 
line along y axis (element_line; inherits from axis.line)

legend.background   
background of legend (element_rect; inherits from rect)

legend.margin   
the margin around each legend (margin)

legend.spacing  
the spacing between legends (unit)

legend.spacing.x    
the horizontal spacing between legends (unit); inherits from legend.spacing

legend.spacing.y    
the vertical spacing between legends (unit); inherits from legend.spacing

legend.key  
background underneath legend keys (element_rect; inherits from rect)

legend.key.size 
size of legend keys (unit)

legend.key.height   
key background height (unit; inherits from legend.key.size)

legend.key.width    
key background width (unit; inherits from legend.key.size)

legend.text 
legend item labels (element_text; inherits from text)

legend.text.align   
alignment of legend labels (number from 0 (left) to 1 (right))

legend.title    
title of legend (element_text; inherits from title)

legend.title.align  
alignment of legend title (number from 0 (left) to 1 (right))

legend.position 
the position of legends ("none", "left", "right", "bottom", "top", or two-element numeric vector)

legend.direction    
layout of items in legends ("horizontal" or "vertical")

legend.justification    
anchor point for positioning legend inside plot ("center" or two-element numeric vector) or the justification according to the plot area when positioned outside the plot

legend.box  
arrangement of multiple legends ("horizontal" or "vertical")

legend.box.just 
justification of each legend within the overall bounding box, when there are multiple legends ("top", "bottom", "left", or "right")

legend.box.margin   
margins around the full legend area, as specified using margin

legend.box.background   
background of legend area (element_rect; inherits from rect)

legend.box.spacing  
The spacing between the plotting area and the legend box (unit)

panel.background    
background of plotting area, drawn underneath plot (element_rect; inherits from rect)

panel.border    
border around plotting area, drawn on top of plot so that it covers tick marks and grid lines. This should be used with fill=NA (element_rect; inherits from rect)

panel.spacing   
spacing between facet panels (unit)

panel.spacing.x 
horizontal spacing between facet panels (unit; inherits from panel.spacing)

panel.spacing.y 
vertical spacing between facet panels (unit; inherits from panel.spacing)

panel.grid  
grid lines (element_line; inherits from line)

panel.grid.major    
major grid lines (element_line; inherits from panel.grid)

panel.grid.minor    
minor grid lines (element_line; inherits from panel.grid)

panel.grid.major.x  
vertical major grid lines (element_line; inherits from panel.grid.major)

panel.grid.major.y  
horizontal major grid lines (element_line; inherits from panel.grid.major)

panel.grid.minor.x  
vertical minor grid lines (element_line; inherits from panel.grid.minor)

panel.grid.minor.y  
horizontal minor grid lines (element_line; inherits from panel.grid.minor)

panel.ontop 
option to place the panel (background, gridlines) over the data layers. Usually used with a transparent or blank panel.background. (logical)

plot.background 
background of the entire plot (element_rect; inherits from rect)

plot.title  
plot title (text appearance) (element_text; inherits from title) left-aligned by default

plot.subtitle   
plot subtitle (text appearance) (element_text; inherits from title) left-aligned by default

plot.caption    
caption below the plot (text appearance) (element_text; inherits from title) right-aligned by default

plot.margin 
margin around entire plot (unit with the sizes of the top, right, bottom, and left margins)

strip.background    
background of facet labels (element_rect; inherits from rect)

strip.placement 
placement of strip with respect to axes, either "inside" or "outside". Only important when axes and strips are on the same side of the plot.

strip.text  
facet labels (element_text; inherits from text)

strip.text.x    
facet labels along horizontal direction (element_text; inherits from strip.text)

strip.text.y    
facet labels along vertical direction (element_text; inherits from strip.text)

strip.switch.pad.grid   
space between strips and axes when strips are switched (unit)

strip.switch.pad.wrap   
space between strips and axes when strips are switched (unit)