Package usage "in the wild"

batpigandme commented 6 years ago

Related to #25 (see https://github.com/ropensci/unconf18/issues/25#issuecomment-384220571), but I think it's a sufficiently distinct approach to merit its own thread.

Overall idea

As a user, I often find it's helpful to see use-cases for packages and/or functions "in the wild" (i.e. in the context of some workflow or task). Some packages have great vignettes that cover this, but maintainers/developers are just a few people, and there's simply no way for them to think of all the possible ways a package might come in handy. It can also be extremely helpful to read explanations from people who didn't write the package, since they have a sort of "beginner's mind." (I've done a few "roundups" of tweets, often of blog posts using various packages, e.g. for purrr, for this reason.)

I imagine (and have anecdotal twitter evidence 😏) that maintainers also like seeing how their packages are being used, but don't always get that feedback, even when it exists, since the avenues are somewhat limited.

A very roughshod diagram of relationships among packages/feedback that exists "formally"

[Image: package_relationships]

Here's where it gets fuzzy implementation-wise, but I've been wondering if there would be a good way to highlight package usage in blog posts or case studies (e.g. with blogdown), in such a way that users and maintainers would be able to easily find relevant content.

Carl Goodwin's been doing something to this effect by including tables of packages and functions used in his blog posts (see example from Surprising stories hide in seemingly mundane data below), but this is (to my knowledge) done by hand, and isn't something one would necessarily be able to find from any docs related to, say, rgeolocate.

Stumbling blocks

Implementation (want to make it useful, without being platform-specific).
Would want it to be opt-in for package maintainers(?)
Possibly just a human communication issue that could be encouraged by, say, talking to other humans.
Breaking changes — blog posts might be an ephemeral format for this very reason.

mpadge commented 6 years ago

Not related to feedback, but certainly to the broader "usage in the wild" issue is the option of trawling .Rhistory files for those who have/use them. This is an option we are pondering for the flipper package-discovery package. Users opt-in to allow trawling, with relevant data extracted and uploaded for our analytic needs. Insight would be very crude, but better than nothing, and it would readily allow auto-extraction and analysis of a local code context in which package functions are used.

noamross commented 6 years ago

Often when I want to see how a function or package is used in other packages I'll do a GitHub search of CRAN packages using the METACRAN org. One could probably design some search patterns to identify R bookdown and blogdown sites on GitHub and crawl them to look at package or function usage.
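As a rough illustration of the crawling idea, here is a minimal base-R sketch that pulls package and function usage out of a chunk of R code, such as the code blocks of a blog post. The helper name `extract_usage()` and the sample `post` are hypothetical, and the regular expressions are deliberately naive: they only catch namespace-qualified calls (`pkg::fn`) and `library()`/`require()` calls, so unqualified function calls would need a real parser.

```r
# Hypothetical helper: find pkg::fn calls and library()/require() calls
# in a character vector of R code lines. Regex-based, so only a sketch.
extract_usage <- function(code) {
  text <- paste(code, collapse = "\n")

  # Namespace-qualified calls: pkg::fn or pkg:::fn
  qualified <- regmatches(
    text,
    gregexpr("[A-Za-z][A-Za-z0-9.]*:{2,3}[A-Za-z.][A-Za-z0-9._]*", text)
  )[[1]]

  # Attached packages: library(pkg) / require(pkg), quoted or not
  attached <- regmatches(
    text,
    gregexpr("(library|require)\\(\\s*[\"']?[A-Za-z][A-Za-z0-9.]*", text)
  )[[1]]
  attached <- sub("^(library|require)\\(\\s*[\"']?", "", attached)

  list(
    qualified = sort(unique(qualified)),
    attached = sort(unique(attached))
  )
}

# Made-up example standing in for the code chunks of one blog post
post <- c(
  "library(dplyr)",
  "library(\"purrr\")",
  "df <- dplyr::mutate(df, x = dplyr::na_if(x, -99))",
  "out <- purrr::map(files, readr::read_csv)"
)

usage <- extract_usage(post)
usage$qualified
usage$attached
```

Running something like this over many posts would give the raw material for a per-package list of "seen in the wild" examples.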

batpigandme commented 6 years ago

@noamross precisely! That's the piece that has no loop, currently.

apreshill commented 6 years ago

I ❤️ this idea. I think about this a lot when teaching, because all the formal documentation (by design and by necessity) typically shows functions in isolation, or only in combination with other functions within a package. Some good examples are dplyr functions like between or na_if, which in practice will typically go inside a mutate, but the docs don't show that usage. (A counter-example is case_when, where it is clearly shown how to use it within a mutate 👍) These are just examples I've run into recently, but it is always a struggle when I teach.

"In the Wild" though, most people use functions across several different packages. A stumbling block I recently ran into learning purrr turned out to be that I failed to realize a function from dplyr was actually what I needed in my workflow (bind_rows!). So, I'm excited about the idea to help users better discover how functions are used in context, and perhaps which co-occur frequently (sort of like on Amazon, other users who used this package/function used it along with these other ones).
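To make the "used it along with these other ones" idea concrete, here is a small base-R sketch of counting co-occurrence. The `posts` list is invented for illustration (one character vector of functions per post); in practice it would come from whatever extraction step is used.

```r
# Hypothetical input: the functions used in each blog post
posts <- list(
  post_a = c("purrr::map", "dplyr::bind_rows", "dplyr::mutate"),
  post_b = c("purrr::map", "dplyr::bind_rows"),
  post_c = c("dplyr::mutate", "dplyr::na_if")
)

# Incidence matrix: rows are posts, columns are functions, 1 = "used here"
all_fns <- sort(unique(unlist(posts)))
incidence <- t(vapply(
  posts,
  function(fns) as.integer(all_fns %in% fns),
  integer(length(all_fns))
))
colnames(incidence) <- all_fns

# crossprod(X) is t(X) %*% X: cell [i, j] counts posts using both i and j
cooccur <- crossprod(incidence)

# Functions most often seen alongside purrr::map (excluding itself)
with_map <- cooccur["purrr::map", ]
sort(with_map[names(with_map) != "purrr::map"], decreasing = TRUE)
```

With this toy data, `dplyr::bind_rows` comes out on top for `purrr::map`, since the two appear together in two posts.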

I also like the idea of building off of and aggregating all the awesome "real world" code examples on blogs, like the R weekly "R in the Real World" features and "code throughs" as @batpigandme labels on twitter.

batpigandme commented 6 years ago

@apreshill yeah, I feel like there is so much great material out there, but it can be really hard to find — especially when you're in the throes of getting something done, but the bite-sized, real-world examples are often exactly what you need for that click 💡 !

As you (obviously) know, this is certainly true with blogdown (see the thread Alison started on the rstudio community site here), and I think one of the difficulties is that people end up submitting questions as GitHub issues, making it hard on the maintainer(s) (again, thinking of blogdown here).

Anyway, just spitballing at this point, but I have been thinking about this as I've been following the thread you started!

maurolepore commented 6 years ago

> Carl Goodwin's been doing something to this effect by including tables of packages and functions used in his blogposts

I like the idea of gathering all functions across multiple packages in a single searchable table. I would like to see/help build such a thing for rOpenSci and the tidyverse. Using DT::datatable(), this seems like a short unconf project (example, doc).

batpigandme commented 6 years ago

> I like the idea of gathering all functions across multiple packages in a single searchable table.

They're by no means mutually exclusive, but I think these ideas differ in that yours is about navigating the existing documentation, whereas this one is about surfacing usage beyond it.

maurolepore commented 6 years ago

I also believe that it would be nice to have better ways to navigate the existing documentation. And in fact, you can think of the table I suggest as a central interface to do precisely that -- a tool to navigate all the existing documentation of the packages in an organization. I should clarify that my suggestion is to generate the table automatically, not manually.

Here are some implementation details:

For any list of installed packages (e.g. pkgs <- c("fgeo.base", "fgeo.map", "fgeo.tool")) it is easy to tabulate all functions by package (and likewise all datasets by package).
If those packages belong to a single GitHub organization, the links to their corresponding pkgdown websites have a consistent structure (e.g. https://forestgeo.github.io/fgeo.base/, https://forestgeo.github.io/fgeo.map/, https://forestgeo.github.io/fgeo.tool/). It is then easy to create the links programmatically for an organization with lots of packages such as rOpenSci.
By using the DT::datatable() widget, the table can be searchable and clickable.
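
A minimal sketch of those three steps, not the actual fgeo code: the helper `table_functions()` is hypothetical, and packages that ship with R (stats, tools) stand in for the fgeo ones so the code runs anywhere, which means the generated links are only placeholders showing the URL pattern. `getNamespaceExports()` lists every exported object, not strictly functions, and the DT step is skipped if DT isn't installed.

```r
# Stand-in packages that ship with R; swap in e.g.
# c("fgeo.base", "fgeo.map", "fgeo.tool") if those are installed.
pkgs <- c("stats", "tools")

# Assumed URL pattern: <organization pkgdown root>/<package>/
site_root <- "https://forestgeo.github.io"

# Hypothetical helper: one row per exported object, plus the package's site
table_functions <- function(pkgs, site_root) {
  rows <- lapply(pkgs, function(pkg) {
    exports <- sort(getNamespaceExports(pkg))
    data.frame(
      package = pkg,
      fun = exports,
      url = sprintf("%s/%s/", site_root, pkg),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

fun_table <- table_functions(pkgs, site_root)
head(fun_table)

# Optional: a searchable, clickable widget, only if DT is available
if (requireNamespace("DT", quietly = TRUE)) {
  linked <- fun_table
  linked$url <- sprintf('<a href="%s">%s</a>', linked$url, linked$package)
  # escape = FALSE lets the <a> tags render as links; print `widget` to view
  widget <- DT::datatable(linked, escape = FALSE, rownames = FALSE)
}
```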

You can maintain this table with almost no extra effort. You can include the table in a vignette of a meta-package (e.g. tidyverse and fgeo). Because the meta-package tracks package versions, updating the vignette of the meta-package updates the table of functions by package.

batpigandme commented 6 years ago

> I also believe that it would be nice to have better ways to navigate the existing documentation.

I agree, I'm wondering if it might be worth making this into its own thread, especially since you seem to have so much of the code in place, and it's possible this could be accomplished pretty quickly (IIRC, there were a few indivs/groups who did more than one "project"/tackled more than one issue at last year's unconf).

So, I think it's worthwhile to be clear and disambiguate two potential projects

batpigandme commented 6 years ago

Summary: [Design patterns to] identify and aggregate applied usage of packages and their constituent functions (as they're used in practice and in conjunction with other packages). Implement some sort of option for package developers to easily point to package function use in "R in the Real World"/"code-through" examples.
