dmi3kno opened 8 years ago
I agree.
I agree that Python and R are different languages with different domains and should be treated accordingly.
My only concern is that if we move `for` loops to the very end of the lesson, the entire lesson will need to be restructured.
The current lesson gets users to load data, create a plot, then automatically create multiple plots from multiple datasets. This requires a `for` loop (unless I am mistaken).
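To make the structure under discussion concrete, here is a minimal sketch of the "load one file, then loop over many files" pattern. It assumes small headerless CSV files written to a temporary directory, and a hypothetical `analyze()` function that returns column means instead of drawing plots so the result can be checked. The file names and the function are illustrative, not copied from the lesson.

```r
# Hypothetical stand-in for the lesson's data files: three small
# comma-separated files with no header row, written to a fresh temp directory.
data_dir <- tempfile("lesson-data-")
dir.create(data_dir)
for (i in 1:3) {
  m <- matrix(seq_len(12) * i, nrow = 3)
  write.table(m, file.path(data_dir, sprintf("inflammation-%02d.csv", i)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# Hypothetical analysis function. A lesson version would draw plots; this
# sketch returns the column means so the output can be verified.
analyze <- function(filename) {
  dat <- read.csv(filename, header = FALSE)
  colMeans(dat)
}

filenames <- list.files(path = data_dir,
                        pattern = "inflammation-[0-9]{2}\\.csv$",
                        full.names = TRUE)

# The explicit loop: pre-allocate a result list, then fill one slot per file.
results <- vector("list", length(filenames))
for (i in seq_along(filenames)) {
  results[[i]] <- analyze(filenames[i])
}
```

The pre-allocation and index bookkeeping (`vector("list", ...)`, `seq_along()`, `results[[i]]`) are exactly the details the thread argues about hiding from beginners.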
The rationale behind the current lesson order is that `for` loops and conditionals are fundamental to programming (in other languages). Since this is the "Programming with R" lesson and not a "Data Analysis in R" lesson, it focuses more on programming concepts than on data analysis concepts.
I am more than happy to continue this discussion, as it does set a foundation for our students.
I guess you could use an `apply` function instead of a `for` loop, but that is just a `for` loop in disguise. Since that does not really make a difference but would introduce new functions, it doesn't help in the Software Carpentry lessons. I agree with the "Programming with R" vs. "Data Analysis in R" rationale.
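The "for loop in disguise" point can be checked directly. The sketch below, using made-up data, computes the same per-dataset means once with an explicit loop and once with `lapply()`; the results are identical, and the only difference is who writes the allocation and indexing.

```r
# Hypothetical example data: a named list of numeric vectors standing in
# for several datasets.
datasets <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7), c = c(10, 20))

# Explicit loop: the programmer pre-allocates, names, and fills the result.
loop_result <- vector("list", length(datasets))
names(loop_result) <- names(datasets)
for (nm in names(datasets)) {
  loop_result[[nm]] <- mean(datasets[[nm]])
}

# The same computation with lapply(): iteration, allocation, and naming
# are handled by the function.
apply_result <- lapply(datasets, mean)

identical(loop_result, apply_result)
```

Whether hiding that bookkeeping counts as "introducing new functions" or "removing distractions" is the disagreement in this thread; the code itself behaves the same either way.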
My argument, then, is that there's no such thing as a "beginner programmer in R". There's only a "beginner analyst in R". It is rare that `for` loops need to be written, and those cases should be reserved for non-rectangular data types. For everything else, R has an excellent functional programming toolbox in the `base::*apply` and `purrr::map_*` families, which (although they rely on loops under the hood) emphasize the functional approach and hide away the implementation details, which do more harm than good to beginners. This is a highly philosophical discussion, and I am ready to give in on changing the lesson if you confirm that you have taught R with `for` loops, tried introducing `apply` instead, and liked the former better.
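As a small illustration of what the `*apply` family offers beyond a plain loop, here is a base-R sketch with hypothetical measurements of unequal length. `lapply()` always returns a list, while `vapply()` returns an atomic vector and checks each result against a declared template; `purrr::map_dbl()` plays a similar type-checked role in purrr, though that package is not used here.

```r
# Hypothetical data: measurements from three sites, of unequal length.
measurements <- list(site1 = c(2.1, 3.4, 2.8),
                     site2 = c(5.0, 4.2),
                     site3 = c(1.1, 0.9, 1.3, 1.0))

# lapply() always returns a list, one element per input element.
means_list <- lapply(measurements, mean)

# vapply() returns a named numeric vector and checks that every result
# matches the template given in FUN.VALUE (here: a single number).
means_vec <- vapply(measurements, mean, FUN.VALUE = numeric(1))

# If the function returns something of a different shape, vapply() stops
# with an error instead of silently changing the result type.
range_check <- tryCatch(
  vapply(measurements, range, FUN.VALUE = numeric(1)),
  error = function(e) "error: range() returns two values"
)
```

The type check is one concrete argument for teaching these functions: a beginner gets an explicit error at the point of the mistake rather than an oddly shaped result later.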
Hello! Having been enchanted by `purrr::map()` myself last year, how about a `…-suppl-….Rmd` that compares and contrasts it with `base::apply`?
Related to #276, because both readr & purrr are in the tidyverse.
R is evolving so fast that I no longer want to stand by `base::apply()`. It should be `purrr::map()` all the way. We tried teaching it at SWC Oslo and it worked like a charm. I highly recommend watching Hadley's cupcake rant video: https://www.youtube.com/watch?v=GyNqlOjhPCQ
There are also plenty of resources for teaching purrr, not least by Jenny Bryan.
I got another comment offline about this and am absolutely convinced we should rewrite 03-loops-R.Rmd (and drop 15-supp-loops-in-depth.Rmd, or merge it into the former).
Contributions welcome! Some inspiration thanks to @jennybc: Thinking inside the box (45min webinar).
I've only been coding for about 6 months now, but here are my 2 cents: I think `for` loops (or the `apply`/`map` functions) should be taught early in R. My first R project after taking a Data Carpentry lesson involved working with over 100 CSV files. From that lesson, I knew how to work with one CSV at a time, but I had to spend a lot of time on the internet figuring out how to use the `apply` functions before I could make much progress on the project.
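For the many-CSV situation described above, a common base-R pattern is: list the files, read each one with `lapply()`, then stack the pieces with `do.call(rbind, ...)`. The sketch below assumes every file has the same columns; the five tiny files and their names are hypothetical stand-ins for a real folder of data.

```r
# Hypothetical setup: write five small CSV files with identical columns
# into a fresh temporary directory (a stand-in for 100+ real files).
csv_dir <- tempfile("csv-files-")
dir.create(csv_dir)
invisible(lapply(1:5, function(i) {
  df <- data.frame(file_id = i, value = c(i, i * 10))
  write.csv(df, file.path(csv_dir, sprintf("part-%03d.csv", i)),
            row.names = FALSE)
}))

# 1. Find the files.
paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# 2. Read every file into a data frame; the result is a list of data frames.
tables <- lapply(paths, read.csv)

# 3. Stack them into one data frame (assumes all files share the same columns).
combined <- do.call(rbind, tables)
```

The same three steps scale unchanged from 5 files to 100; in the tidyverse, `dplyr::bind_rows()` is a common replacement for the `do.call(rbind, ...)` step.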
I guess what I'm trying to say is that `for` loops are an important tool that programmers need to have at their fingertips, and I think they should be taught early on.
The argument made by @dmi3kno makes sense to me as well. @CodeRThane, the idiomatic R method is to use `purrr::map()` or `apply` instead of `for` loops, as you noticed. I'd be happy to see some concrete PRs for this issue.
I am of the strong opinion that introducing `for` loops and `if`/`else` statements early in the teaching program does more harm than good to R education. I understand that this is done out of a desire to maintain consistency between how Python and R are taught, but I would argue that the approach to teaching (and using) the two languages should be different.
In my view, `for` loops and `if`/`else` statements need to be shelved under "Advanced" topics towards the end of the lesson, and sections on the `*apply` family of functions and logical subsetting should be introduced instead. R is positioning itself as a vectorized language (even though under the hood it might be running highly optimized loops in compiled code), and the R programmer is encouraged to think in terms of vectors and data frames. Introducing non-vectorized operators breaks that frame and sets R up for failure, because the argument on speed and efficiency is lost a priori.
Again, if you agree, I volunteer to handpick material on `*apply` operators from another lesson (I will not introduce any new concepts, but rather repackage what is already in the very good Software Carpentry curriculum) and rework logical subsetting (and perhaps mention the vectorized `ifelse()` function) to cater for the need to teach implementation of conditional logic (branching) in R.
As I said, the loop and branching sections are good, but only as an advanced topic towards the end of the lesson material.
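To show the contrast being proposed, here is a minimal sketch with hypothetical temperature readings. It labels each value first with a loop plus `if`/`else`, then with the vectorized `ifelse()`, and also shows logical subsetting and vectorized arithmetic as loop-free replacements. The data and the 20-degree threshold are made up for illustration.

```r
# Hypothetical data: daily temperature readings in degrees Celsius.
temps <- c(12.5, 18.0, 25.3, 9.8, 30.1)

# Loop-and-branch style: visit each element and decide with if/else.
labels_loop <- character(length(temps))
for (i in seq_along(temps)) {
  if (temps[i] > 20) {
    labels_loop[i] <- "warm"
  } else {
    labels_loop[i] <- "cool"
  }
}

# Vectorized style: ifelse() evaluates the condition for the whole vector.
labels_vec <- ifelse(temps > 20, "warm", "cool")

# Logical subsetting: the logical vector temps > 20 selects matching values.
warm_days <- temps[temps > 20]

# Vectorized arithmetic: converts every element without an explicit loop.
fahrenheit <- temps * 9 / 5 + 32
```

Both labelling approaches give the same result; the vectorized versions express the whole operation on one line, which is the "think in terms of vectors" habit the comment argues for.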