grantmcdermott / tinyplot

Lightweight extension of the base R graphics system
https://grantmcdermott.com/tinyplot
Apache License 2.0
223 stars 8 forks source link

Avoid `type` aliases #199

Open vincentarelbundock opened 1 month ago

vincentarelbundock commented 1 month ago

This is just to express a personal preference for avoiding aliases in the type argument. There are already lots of possible values, and it may not be clear to users why there are differences between, for example, h, hist, and histogram, or between j and jitter. Some similar-looking things are different; others are just aliases. Sure, this can be documented, but there are costs to duplication and lack of consistency.

One potential way to distinguish the "families" of geoms, would be to let type argument accept:

  1. One-letter types as character: Standard types provided by the plot() function only. p, l, etc.
  2. Full word types as strings: New types supplied by tinyplot: hist, rect, etc.
  3. Functions: some version of https://github.com/grantmcdermott/tinyplot/pull/168

This would allow users to clearly distinguish functionality with just a glance at the code.

TBC, this is not a big deal at all; just an aesthetic preference. Feel free to close and keep the aliases.

zeileis commented 1 month ago

Thanks, Vincent, I agree that this deserves some further thought, especially if tinyplot keeps growing as it has so far.

Disclaimer: I tried to write up my ideas concisely but failed. Feel free to ignore the noise below! I decided to post it, even though it isn't as concise and structured as I would have liked it - maybe some if it is useful for one of you nevertheless.

Basic thoughts: Usually, I like partial matching via match.arg() which is also widely used in base R and many practitioners are used to it. In most situations this avoids to define abbreviated shortcuts. However, here, this is not possible because type = "h" or type = "b" are already well-defined and we probably want to support type = "hist" or type = "boxplot" to really mean different things. I think this leads to at least the following three questions:

Functions after refactoring? Based on the discussions so far, I would find it intuitive if type = "foo" would internally call tinyplot_foo() for the drawing (or tp_foo() or something like that). The only exception should be the 1-letter types which could be handled by tinyplot_base or tinyplot_scatterplot or something like that.

New 1-letter abbreviations? In this case I think it would be feasible to add further 1-letter types provided that they are handled with the same tinyplot_base/tinyplot_scatterplot function. Candidates for this would be type = "i" or type = "j" (see https://github.com/grantmcdermott/tinyplot/issues/93 and https://github.com/grantmcdermott/tinyplot/issues/97) which probably don't need a new tinyplot_ function.

Partial matching starting from 2 (or 3) letters? I don't have strong feelings about this. One argument against this (in addition to Vincent's concerns) would be that it will be hard to keep this up if tinyplot is successful and keeps growing. One case in favor of this would be type = "hist" vs. type = "histogram". Both of these will feel very natural to different users so that some will be confused if we settle for just one of the two labels. Of course, we could throw a useful error message a la: You said hist but probably meant histogram, please use the latter. But to me these always feel a bit like: We know perfectly what you mean but still force you to retype your command with the label that we prefer.

vincentarelbundock commented 1 month ago

Functions after refactoring? Based on the discussions so far, I would find it intuitive if type = "foo" would internally call tinyplot_foo() for the drawing (or tp_foo() or something like that). The only exception should be the 1-letter types which could be handled by tinyplot_base or tinyplot_scatterplot or something like that.

Yes, that’s exactly my motivation for the refactor. The one minor issue is that due to the way the code is organized, each type has to be associated with two functions: (1) type_histogram() processes x, y, inputs, computes breaks, etc. (2) draw_histogram() makes the actual calls to drawing functions. #198 refactor the first part. The second could be done in the future.

Then, PR #168 will allow two very cool and powerful things. First, users can supply a custom type_*() function to draw fully customized plots plt(x, y, type = type_custom()).

Second, users can call the type-specific function themselves, and set type-specific arguments in that function. This would be great for documentation, because it would provide a place to get plot-specific info about what’s available, and it would not require us to add a ton of type-specific arguments to the core plt() function. For example:

plt(x, y, type = type_spline(k = 3, bs = "cr"))
plt(x, y, type = type_histogram(breaks = "Sturges"))

New 1-letter abbreviations?

I’ve always found 1-letter abbreviation difficult to approach, and think that code using these is more difficult to read. That said, I don't feel terribly strongly about this, and I appreciate that there’s a tradeoff with concision.

Of course, we could throw a useful error message a la: You said hist but probably meant histogram, please use the latter. But to me these always feel a bit like: We know perfectly what you mean but still force you to retype your command with the label that we prefer.

Lol yeah. What’s also kind of annoying about this is that you know they had to build special logic in the code to raise a specifically worded warning.

A more defensible approach, I think, is to use a simple assertion at the very top of the script to raise a generic informative error. At least with this strategy, there’s a real benefit in terms of simplicity and maintainability for the developer, and we still get informative errors.

type <- "histogram"
checkmate::assert_choice(type, choices = c("p", "h", "hist"))
# Error in eval(expr, envir, enclos): Assertion on 'type' failed: Must be element of set {'p','h','hist'}, but is 'histogram'.

We merged similar assertions functions in tinyplot already, so they’re available.

zeileis commented 1 month ago

Regarding the modularization, I just wanted to point out another possible solution which might be helpful for encapsulating the different types. Rather than having two separate functions one could also have one generator function. Pseudo-code:

type_histogram <- function(data, breaks = "Sturges") {
  ## process arguments
  ## set up actual drawing function
  draw <- function(data) {
    ...
  }
  return(draw)
}

Then one can exploit the lexically scoped function draw() to do the actual drawing. Rather than setting up only a single function in this way, it would also be possible to return a list of such lexically scoped functions, e.g., for handling the data, drawing the main plot, drawing the legend.

The advantage is that you just need a single function and have all the code in one place. It can also make the design cleaner. The disadvantage is that the hurdle for writing new type_ generators is somewhat higher.

vincentarelbundock commented 1 month ago

I like this a lot. Let me think on it.

zeileis commented 1 month ago

Regarding the warning messages and abbreviations: If you would go for a generic warning only, then just using match.arg() might be a convenient solution? Something a la:

if (nchar(type) == 1L) {
  available_types <- c("p", "l", "h", "n", ...)
  type <- match.arg(type, available_types)
  type_fun <- "type_plot"
} else {
  available_types <- c("plot", "histogram", "boxplot", ...)
  type <- match.arg(type, available_types)
  type_fun <- paste0("type_", type)
}