noamross / zero-dependency-problems

Real-world code problems in R, without dependencies

apply-functions-to-columns solution #5

Open dpastoor opened 9 years ago

dpastoor commented 9 years ago

(refers to https://github.com/noamross/zero-dependency-problems-r/blob/master/apply-functions-to-columns.md)

seems like a perfect chance to use mapply

This Stack Overflow question discusses how to use mapply like zip in Python:

http://stackoverflow.com/questions/9281323/zip-or-enumerate-in-r
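A minimal sketch of the zip-like pattern, assuming a small made-up data frame and three base R functions chosen only for illustration (neither comes from the original problem). mapply() walks the list of functions and the list of columns in parallel and calls the i-th function on the i-th column:

```r
# Hypothetical data frame and functions, for illustration only
df <- data.frame(a = c(1, 4, 9), b = c(10, 100, 1000), c = c(-1, 0, 1))
funs <- list(sqrt, log10, abs)

# mapply() pairs up funs and the columns of df, like zip() in Python.
# SIMPLIFY = FALSE keeps the result as a list with one element per column.
res <- mapply(function(f, col) f(col), funs, df, SIMPLIFY = FALSE)

# Copy the transformed columns back by position; df keeps its column names
df[] <- res
```

Map(function(f, col) f(col), funs, df) is an equivalent spelling that always returns a list, so SIMPLIFY is not needed.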

noamross commented 9 years ago

OK, a user with Python familiarity might google "python zip in r".

apply() and variants are covered in the SWC R Novice lesson, so perhaps a beginner would think about apply() functions generally, and mapply() is in the reference material. mapply() can also be found via the "See also" part of the lapply() documentation, but not apply().

The 2nd result from searching "apply list of functions to columns of data frame" gets you to a StackOverflow mapply() answer. You need to include "columns" because "r apply list of functions to data frame" doesn't get you there. In general it seems that mapply() is mentioned less in various pages that talk about apply(), sapply(), and lapply().

dpastoor commented 9 years ago

Frankly, even after playing with examples, mapply can be tough to grasp, IMO. The best chance I would think a user would have of figuring this out is understanding

In that case, as long as neither maximum speed nor terseness is a concern, there are a multitude of 'easy' ways to solve it.

funs <- list(fun1, fun2, fun3)

for (i in seq_along(df)) { df[[i]] <- funs[[i]](df[[i]]) }

But for a beginner with a non-coding background, treating functions as objects is not something I would expect someone to pick up naturally, and it looks like it isn't covered in the SWC novice lessons (not surprisingly).
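For readers new to the idea of functions as objects, here is a self-contained sketch of the loop approach. The bodies of fun1, fun2, and fun3 and the contents of df are hypothetical, made up only so the example runs:

```r
# Hypothetical per-column functions; only the names come from the comment above
fun1 <- function(x) x * 2
fun2 <- function(x) x - mean(x)
fun3 <- toupper

# Hypothetical data frame with one column per function
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# Functions are ordinary objects, so they can be stored in a list
funs <- list(fun1, fun2, fun3)

# funs[[i]] extracts the i-th function itself; funs[i] would be a
# one-element list, which cannot be called.
for (i in seq_along(df)) {
  df[[i]] <- funs[[i]](df[[i]])
}
```

The double brackets matter on both sides: df[[i]] is the column vector, and funs[[i]] is the function to call on it.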

jennybc commented 9 years ago

I would push back on this question, because it clearly doesn't scale. Applying unique functions 1 through n to variables 1 through n in a data.frame, for n of any size, is not a general novice problem. I'd ask: "what are you really trying to do?" I wouldn't just start trying to solve the problem, taking it at face value. It smells like someone describing the step, not the goal.

dpastoor commented 9 years ago

Completely agree - that was actually my first impression as well.

It's also never really going to be generalizable or flexible (in the sense that you'd need to write a custom function for each new column), and it would likely be unusable across data frames.

Though there are some situations where you do need to leverage this concept (from reading the source a while back, I believe this is how qplot/ggplot apply various bits to each layer).

jennybc commented 9 years ago

@dpastoor You're right, it's an interesting puzzle. But not a legit novice question, I suspect. So @noamross you'll have to decide how to handle this situation, since it will come up a lot I suspect in other questions. Often novices pose rather thorny programming problems but, if you peel back the onion a bit, you can design the question away by, e.g., helping them use a more natural data structure. Lots of questions about iterating over this and that, in particular, go away, once you pick the right way to store or shape the data in R.

bbolker commented 9 years ago

my answer to this one, for a novice, would be "go ahead and use a loop. Why not? Speed is unlikely to be a serious problem and you'll have a lot easier time understanding what you did."

noamross commented 9 years ago

@jennybc Thanks for the meta-response! I agree. At least a few example questions should illustrate this point. I have the advantage of knowing the questioner, so I might go back and see if we can peel back the onion now and see whether this would be a good example. (Though the question is a couple of years old; they have since become quite the expert and will probably be a SWC instructor soon.)