Closed vincentarelbundock closed 3 weeks ago
Thanks for raising. Ignoring all the roxygen2 (documentation) preamble, here is the basic list of operations that currently happen inside the tinyplot function before split_data:
Stepping back and summarizing: there are about 320 lines of code before we get to data splitting. While we could certainly farm out some of the sections to standalone functions to reduce the line count (a la https://github.com/grantmcdermott/tinyplot/pull/171), for the most part I think the basic order would need to remain unchanged. It seems to me that 1-4 and 6 would still have to come before split_data, since they either adjust the underlying (raw) variables or affect the splitting logic. We could move item 5 to afterwards, but that's less than 20 lines so not much of a gain.
Of course, it may simply be that the current flow is an artifact of the code logic that I settled on (which might not be optimal). I'm definitely open to revising if you want to propose an alternative logic and codeflow. One thing I want to flag, though, is that the interaction of grouping (by) and faceting gets tricky. Partly this has to do with the fact that grouping variables can be nested within and without facets. But it also relates to some annoying limitations of the base graphics device (see the workaround here, for example, where we manually restore par(mfg) because it gets automatically reset after the plot window is divided).
Feel free to push back if you think I'm missing something. Again, I'd love to simplify and further optimize the code logic if at all possible.
Thanks, that's super helpful.
My idea was to move 3&4 after 7. The benefits would be:
- We could use stats::loess() to transform x and y in each split_data[[i]] directly with a simple loop.
- We would not need by-aware type_*() functions, where each of them would have to implement its own by splits.

The downside is things like identifying global breaks for histograms. But that can be a one-liner like:
get_breaks(unlist(lapply(split_data, \(x) x[["x"]])))
That seems both easier to code and less common than splitting, which happens every time.
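To make the proposal concrete, here is a minimal sketch of the "split first, transform afterwards" flow on toy data. It is not tinyplot's actual code: the data frame, the column names, and the body of get_breaks() are assumptions made for illustration (get_breaks() is only named in the comment above, so a stand-in based on graphics::hist() is used here).

```r
# Toy data: two groups with different x/y relationships (illustrative only).
set.seed(1)
dat <- data.frame(
  x  = rep(seq(0, 10, length.out = 50), times = 2),
  by = rep(c("a", "b"), each = 50)
)
dat$y <- ifelse(dat$by == "a", sin(dat$x), cos(dat$x)) + rnorm(100, sd = 0.1)

# Split first, so that every later step sees one group at a time.
split_data <- split(dat, dat$by)

# Transform x and y within each split using a simple loop.
for (i in seq_along(split_data)) {
  d <- split_data[[i]]
  fit <- stats::loess(y ~ x, data = d)
  d <- d[order(d$x), ]
  d$y <- stats::predict(fit, newdata = d)  # replace y with the smoothed values
  split_data[[i]] <- d
}

# Hypothetical stand-in for get_breaks(): histogram breaks without plotting.
get_breaks <- function(x) graphics::hist(x, plot = FALSE)$breaks

# Global breaks: pool x across all splits so every group shares the same bins.
breaks <- get_breaks(unlist(lapply(split_data, \(x) x[["x"]])))
```

The pooled breaks could then be passed to each per-group histogram call so the bins line up across groups. Whether this interacts cleanly with facets is a separate question, as flagged earlier in the thread.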
Closing as an ill-conceived and under-specified proposal. Would be fixed by https://github.com/grantmcdermott/tinyplot/pull/198
Follow-up to this comment about introducing a by argument to type_*() functions: https://github.com/grantmcdermott/tinyplot/pull/168#discussion_r1685553602

Is there a reason to wait so long before splitting the data? It seems that in some cases we wait until nearly the end to create split_data, and in many other cases, like hist or density, we need to insert special catches near the top of the function. I feel like that introduces a lot of complexity.

Have you considered splitting the data at the very top, and storing it in a nested list?
Then, all plotting functions can operate on individual elements in exactly the same way. And if the list is of length 1, we know there are no facets.
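A rough sketch of what such a nested list might look like, assuming the outer level holds facets and the inner level holds by groups. The helper split_nested() and the column names are hypothetical and are not part of tinyplot.

```r
# Toy data with one grouping column and one facet column (illustrative only).
dat <- data.frame(
  x     = 1:8,
  y     = c(2, 4, 1, 3, 5, 7, 6, 8),
  by    = rep(c("a", "b"), times = 4),
  facet = rep(c("f1", "f2"), each = 4)
)

# Hypothetical helper: outer list = facets, inner list = by groups.
# With no facet (or no by) variable, that level is a list of length 1.
split_nested <- function(data, by = NULL, facet = NULL) {
  outer <- if (is.null(facet)) list(data) else split(data, data[[facet]])
  lapply(outer, function(d) {
    if (is.null(by)) list(d) else split(d, d[[by]])
  })
}

split_data <- split_nested(dat, by = "by", facet = "facet")
no_facet   <- split_nested(dat, by = "by")

# Every element can now be handled the same way, whatever the nesting.
group_means <- lapply(split_data, function(facet_list) {
  vapply(facet_list, function(d) mean(d$y), numeric(1))
})
```

Here length(split_data) is 2 (one entry per facet), while length(no_facet) is 1, which matches the "length 1 means no facets" convention suggested above.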