matloff / TidyverseSkeptic

An opinionated view of the Tidyverse "dialect" of the R language.

Tidyverse doesn't adhere to well-known concepts of type theory #40

Open justinmcgrath opened 2 years ago

justinmcgrath commented 2 years ago

Much is made of how tibbles are extensions of data frames. According to type theory, though, they certainly are not data frames. To be a data frame, a tibble would need to behave like one (see Liskov's substitution principle: https://en.wikipedia.org/wiki/Liskov_substitution_principle), but they differ. For example, subsetting is different.

This isn't a minor point. It is foundational to object-oriented programming, and when objects violate Liskov's substitution principle, the code becomes difficult to use. The problem isn't that there is anything wrong with the way tibbles work; it's that they pretend to be data frames. As a result, functions that expect data frames sometimes do not work with tibbles.
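To make that last point concrete, here is a minimal sketch, assuming the tibble package is installed. The helper `col_mean` is hypothetical, made up only for illustration; it is the kind of function someone might write against base data frames, relying on `d[, col]` returning a plain vector.

```r
library(tibble)  # assumption: the tibble package is installed

df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
tb <- as_tibble(df)

# The tibble reports itself as a data frame.
is_df <- is.data.frame(tb)

# Hypothetical helper written against base data frames: it relies on
# d[, col] dropping a single column down to a plain vector.
col_mean <- function(d, col) mean(d[, col])

mean_df <- col_mean(df, "x")

# With a tibble, d[, col] is still a one-column tibble, so mean() receives
# a non-numeric argument; it warns and returns NA.
mean_tb <- suppressWarnings(col_mean(tb, "x"))

print(is_df)
print(mean_df)
print(mean_tb)
```

The same function body gives a number for `df` and `NA` for `tb`, even though `is.data.frame(tb)` is true, which is the substitution failure I mean.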

Liskov's substitution principle is very well known, yet tibbles flagrantly violate it. This is one aspect of my biggest problem with the "tidyverse" - the authors are seemingly unaware of the best practices in programming, and they routinely ignore them. Code with "tidy" packages is far more complicated than code without them, but because the authors are good salespeople, and their audience has almost no experience, tidyverse is very popular.

danielreispereira commented 1 year ago

@justinmcgrath, just to make sure I understand your point: why is subsetting different in tibbles? tibble[1:4, ] is equivalent to df[1:4, ], isn't it?

justinmcgrath commented 1 year ago

> @justinmcgrath, just to make sure I understand your point: why is subsetting different in tibbles? tibble[1:4, ] is equivalent to df[1:4, ], isn't it?

The results are not always the same. The section called "tibbles vs data frames" at https://tibble.tidyverse.org/articles/tibble.html gives some examples of when they differ.
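A short sketch of two such cases, assuming the tibble package is installed (the column names `abc` and `z` are made up for illustration). Row subsetting like `[1:4, ]` does give a four-row table in both cases; the difference shows up with single-column `[` and with `$`.

```r
library(tibble)  # assumption: the tibble package is installed

df <- data.frame(abc = 1:5, z = letters[1:5])
tb <- as_tibble(df)

# Row subsetting: both return a table with four rows.
rows_df <- df[1:4, ]
rows_tb <- tb[1:4, ]

# Single-column subsetting with [ : the data frame drops to a vector,
# the tibble stays a one-column tibble.
col_df <- df[, "abc"]
col_tb <- tb[, "abc"]

# Partial matching with $ : the data frame matches "a" to column "abc",
# the tibble warns about an unknown column and returns NULL.
part_df <- df$a
part_tb <- suppressWarnings(tb$a)

print(class(col_df))
print(class(col_tb))
print(part_df)
print(part_tb)
```

So `tibble[1:4, ]` versus `df[1:4, ]` is not where the trouble is; it is the other forms of subsetting that return different kinds of objects.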

matloff commented 1 year ago

Very interesting points, but once again, I think the biggest mistake by far by the developers of Tidy was to apply programming language theory to noncoders who have trouble grasping the notion of a function.