noamross / zero-dependency-problems

Real-world code problems in R, without dependencies

batch-find-replace-merge solution #4

Open dpastoor opened 9 years ago

dpastoor commented 9 years ago

For your simple example, isn't this a reasonable solution?

dplyr::inner_join(animalsdf, groups)
Joining by: "species"
  length weight species   class
1      5      3     rat  mammal
2      4      4   mouse  mammal
3      3      5  lizard reptile
4      5      6  lizard reptile
5      6      9  turtle reptile

or even just a simple merge for your 'dependency-free' solution.

merge(animalsdf, groups)

Is there something different about your more complex example that would keep these solutions from working?
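For readers who want to run the merge() call themselves, here is a minimal sketch. The two data frames are a reconstruction inferred from the inner_join output above; the exact definitions in the original question are an assumption.

```r
# Hypothetical reconstruction of the inputs, inferred from the output above.
animalsdf <- data.frame(
  length  = c(5, 4, 3, 5, 6),
  weight  = c(3, 4, 5, 6, 9),
  species = c("rat", "mouse", "lizard", "lizard", "turtle"),
  stringsAsFactors = FALSE
)
groups <- data.frame(
  species = c("rat", "mouse", "lizard", "turtle"),
  class   = c("mammal", "mammal", "reptile", "reptile"),
  stringsAsFactors = FALSE
)

# With no `by` argument, merge() joins on the columns the two
# data frames have in common (here, only "species").
merged <- merge(animalsdf, groups)
print(merged)
```

By default merge() puts the key column first and sorts the rows by key, so the row and column order will differ from the inner_join output even though the contents match.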

noamross commented 9 years ago

(This refers to https://github.com/noamross/zero-dependency-problems-r/blob/master/batch-find-replace-merge.md)

Yes, in fact, the response to this on the listserv was merge(animalsdf, groups).

BUT the tougher question is: How would someone who does NOT know this figure it out? I note that neither function is covered in the r-novice software carpentry lesson. And the questioner may not know that "merge" or "join" are technical terms for this type of operation.

The questioner did do a good job of producing a reproducible example here, so that path to a solution may be the best one: produce an MRE and ask on a forum/listserv.

Assuming the questioner doesn't know "merge" or "join" as formal concepts, they could search for "r combine data frames". On Google, this gets you to merge() pretty quickly, though clicking through the first few results, the questioner might not recognize the solution, because the pages say things like:

Problem: You want to merge two data frames on a given column from each (like a join in SQL).

I also googled "r make new column with info from another data frame", which is one formulation someone might come up with if they are not explicitly thinking of this operation as "combining data". Here, the right solution shows up in the 6th result, which is the first page I found that doesn't assume the reader knows what "merge" means:

Merging dataframes

  • You have a second dataframe with further information about levels in the first dataframe.
  • For instance, our example dataframe can be merged with a dataframe containing further information about the publishers.

noamross commented 9 years ago

I also note that merge() is covered in none of the actual R Manuals except the 3,500-page reference index.

noamross commented 9 years ago

The questioner didn't frame things this way, but an experienced MS Excel user might think of this as a VLOOKUP operation. Googling "vlookup in r" gets you to the right answer.

This suggests a search strategy: Do you know how you might do this in another program/language? If so, search for that, but "in R".
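For someone coming from the VLOOKUP mindset, the same lookup can be sketched in base R with match(), which is roughly the closest analogue. The data frames below are hypothetical stand-ins reconstructed from the example output earlier in the thread, not the questioner's actual data.

```r
# Hypothetical inputs, reconstructed from the example output in this thread.
animalsdf <- data.frame(
  length  = c(5, 4, 3, 5, 6),
  weight  = c(3, 4, 5, 6, 9),
  species = c("rat", "mouse", "lizard", "lizard", "turtle"),
  stringsAsFactors = FALSE
)
groups <- data.frame(
  species = c("rat", "mouse", "lizard", "turtle"),
  class   = c("mammal", "mammal", "reptile", "reptile"),
  stringsAsFactors = FALSE
)

# match() returns, for each species in animalsdf, the row position of
# that species in groups; indexing groups$class by it does the lookup.
idx <- match(animalsdf$species, groups$species)
animalsdf$class <- groups$class[idx]
print(animalsdf)
```

Unlike merge(), this keeps the original row order of animalsdf, and a species with no entry in groups would get NA rather than being dropped.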

dpastoor commented 9 years ago

There is some hope, at least, if they stumble upon dplyr, which seems reasonable: Google gives me data manip with dplyr as the third result for the query "data manipulation R", and hopefully in looking at dplyr they'd learn about the join functions.

The learn x in y series does mention merging, but it uses data.table.

BTW, your README isn't updating on GitHub, so I didn't notice your other goal of thinking about the novice's path to finding a solution. It's odd that the README file is different from what is displayed :-/