matloff / TidyverseSkeptic

An opinionated view of the Tidyverse "dialect" of the R language.

About ggplot2 and tidy data #7

Open Enchufa2 opened 5 years ago

Enchufa2 commented 5 years ago

Just one comment about these statements about ggplot2:

"I don't consider it part of the Tidyverse, having been developed well before Tidy and thematically unrelated."

"RStudio counts ggplot2 as being part of the Tidyverse, but it was developed much earlier, and does not follow the Tidy philosophy."

I don't think it's thematically unrelated, and I do think it follows the philosophy. First of all, ggplot2 was designed to receive its input in (Hadley's) tidy form, even before it was called tidy. I believe this fact shaped the idea of tidy data, which culminated in Hadley's Tidy Data paper (JSS 2014), and that was in fact the seed for the Tidyverse.

nicholasjhorton commented 5 years ago

ggplot2 is an elegant system for professional graphics. But it has a number of features that are at odds with the overall tidyverse philosophy (and Hadley has publicly acknowledged these). I'd suggest noting that ggplot2 takes tidy data as input (though lattice and base graphics do as well).

matloff commented 5 years ago

If one takes the definition of "tidy" to mean "row/column" data frames, then 99% of R is "tidy." The term then becomes meaningless. The ggplot2 package is no more "tidy" than is lm().

Enchufa2 commented 5 years ago

I find myself constantly tidying and untidying data from modelling to visualisation and back to modelling again, because many modelling functions need all the features in columns (the model matrix), but ggplot2 needs many of them folded in long format, in order to be assigned to a layer. That's especially true for factors. The lm interface is pretty tidy in that sense, yes, but many are not.
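As a rough illustration of the round trip described above (a sketch, not code from this thread), the base R snippet below fits a model on wide data and then folds two feature columns into long format. The names `wide`, `long`, `feature` and `value` are made up for the example, and iris is used only because it ships with R.

```r
# Wide format: one column per feature, which is what lm() and most
# model-matrix-based functions expect.
wide <- iris[, c("Species", "Sepal.Length", "Petal.Length")]
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = wide)

# Long format: the two measurement columns folded into a name/value pair,
# so that a plotting layer can map `feature` to colour, fill or facets.
long <- data.frame(
  Species = rep(wide$Species, times = 2),
  feature = rep(c("Sepal.Length", "Petal.Length"), each = nrow(wide)),
  value   = c(wide$Sepal.Length, wide$Petal.Length)
)

head(long)
```

The long table has twice as many rows as the wide one, and a model function would need it unfolded again before fitting.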

matloff commented 5 years ago

Thanks! Norm

drag05 commented 2 years ago

@Enchufa2

Is there a need for tidying and untidying? The example below fits and plots a model in a single step, once per smoothing method, and the data format remains unchanged:

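library(data.table)
library(ggplot2)

dt = as.data.table(iris)

# One plot per smoothing method; dt stays in its original wide format.
lapply(
  list('loess', 'glm', 'lm'),
  function(i) {
    dt[, ggplot(.SD, aes(Petal.Length, Sepal.Length)) +
           geom_point() +
           geom_smooth(aes(color = Species), method = i)]
  }
)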