matloff / TidyverseSkeptic

An opinionated view of the Tidyverse "dialect" of the R language.

inaccurate discussion of dplyr #9

Open ljanda opened 5 years ago

ljanda commented 5 years ago

You write that dplyr "consists of 263 functions." Though you do state that "a user initially need not use more than a small fraction of them," you then say "the high complexity is clear." This is not an accurate or responsible discussion. dplyr has six core functions - mutate, select, filter, summarise, arrange, and group_by - that are by far the most commonly needed. You then state that "every time a user needs some variant of an operation, she must sift through those hundreds of functions for one suited to her current need," which is also inaccurate, since the majority of the added functions, e.g. mutate_if(), mutate_all(), and mutate_at(), are simply clear variants of a core verb, e.g. mutate(), that can be easily referenced within autofill or the help documentation.

I would suggest you at least add a discussion of the six core dplyr verbs or rewrite this section as follows (my changes are in brackets):

Tidyverse students are being asked to learn a [smaller] volume of material, which is [potentially good] pedagogy. See "The Tidyverse Curse" [a post that covers two concerns with the Tidyverse that are not related to what is listed here], in which the author says inter alia that he uses "only" 60 Tidyverse functions -- 60! The "star" of the Tidyverse, dplyr, consists of 263 functions. While a user initially need not use more than a small fraction of them, [since there are six core verbs/functions - mutate, select, filter, summarise, arrange, and group_by] the high complexity is [limited]. Every time a user needs some variant of an operation, she [has no need to] sift through those [functions that can be easily referenced within autofill or the help documentation and are usefully named] for one suited to her current need. [Furthermore, many of the added functions, e.g. mutate_if(), mutate_all(), and mutate_at(), are simply clear variants of a core verb, e.g. mutate().]

Also, you cite a function count in the same way for purrr, which once again has a small core of functions (most people use some variant of map()). It is not good practice to give bare numbers instead of the actual details.

Furthermore, in terms of pedagogy, there is a lot of evidence that humans learn things more easily through narrative devices, and it is reasonable to argue that the core dplyr verbs are narrative-driven and memorable, thus making them easier to learn than the base R or data.table syntax (especially for the many R users who are researchers and don't have a CS background or exposure to other programming languages, but arguably for most people).

matloff commented 5 years ago

Sorry, I disagree, based on long experience teaching programming and even English.

ljanda commented 5 years ago

You did not address the main issue that you misrepresent dplyr and purrr. I don't want to veer into wild speculation, as your blog post does, but this description seems willfully off-base, especially since you cite a large number to presumably shock/scare readers rather than giving the actual details.

In reality, dplyr relies on six verbs and the teaching materials always start with those six verbs. This is far less complex than base R. People can move on to more complex variants of those verbs, which naturally provides a scaffolded learning experience.

Furthermore, you are giving the "you're wrong because I think you're wrong and I have some supposed credentials" argument. I too have been an educator (high school ELA, undergrad stats, got awards for both), and though I have teaching experience, I also know that pedagogy research is more reliable than my experience of one.

matloff commented 5 years ago

Not sure what to say here. The "sifting through" a large number of functions actually represents what happened to me personally recently in a discussion about pipes. No matter what the function count is, in the end it's more than in base R, where one need only know how [,] works. Hence my point about "teach a person to fish."

An essay by definition is one's own opinion, informed by one's own experiences. I hope we can at least agree on that.

ljanda commented 5 years ago

How can you say that all you need to know with base R is how [,] works when you just told me I should be using tapply?

Of course an essay is opinion-based (this is why I have not opened any issues on your, imo, overblown opinions about the impact of the tidyverse on the future of R), but that does not give one carte blanche to misrepresent facts. At the end of the day, dplyr is six core functions that are easy to learn. You're right, if you only teach people base R, they will be fishing more - fishing for the right solution, that is.

matloff commented 5 years ago

The comment on brackets pertained only to dplyr.
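
For readers following the exchange, here is a minimal sketch in R of the two styles being compared: base R's `[,]` indexing and tapply() on one side, and four of the six core dplyr verbs (filter, select, group_by, summarise) on the other. The built-in mtcars dataset and the variable names are illustrative assumptions, not examples used by either commenter, and the dplyr half only runs if that package is installed.

```r
# Base R: rows and columns are both selected with `[ , ]`.
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "hp")]

# Base R: a grouped mean uses a separate function, here tapply().
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)

# dplyr: the same two tasks written with core verbs.
# Guarded so the sketch still runs where dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # filter() picks rows, select() picks columns.
  four_cyl_dplyr <- mtcars %>%
    filter(cyl == 4) %>%
    select(mpg, hp)

  # group_by() + summarise() computes one mean per cylinder count.
  mean_mpg_dplyr <- mtcars %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg))
}
```

Both halves produce the same subset and the same group means; the sketch takes no position on which is easier to learn, which is the point in dispute above.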

