tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/
Other
1.38k stars 418 forks source link

Animated gif in documentation #515

Closed slowkow closed 5 years ago

slowkow commented 5 years ago

The gif

@apreshill made the best explanation for tidyr::gather() that I have ever seen:

gather

Thank you, Alison! :100:

The fundamental problem this gif solved for me is the confusion caused by the key and value arguments.

Before seeing this gif, I falsely (and instinctively) equated these things:

This gif finally opened my eyes to the fact that key and value in tidyr::gather() are analogous to variable.name and value.name in reshape2::melt(). So many struggles :sob:

Could we include this gif directly in the documentation on this page?

untitled drawing

Right now the gif is posted on Twitter, but I'd like to see it in more places where people can find it while they're searching for information about tidyr::gather().

A complaint about magical expressions

Also, I want to complain that the example in the documentation uses the naked words stock and price instead of "stock" and "price" as I would have expected. This is one of the magical things about R that was always difficult to accept as a beginner (and continues to be a challenge as a more experienced R user). It is difficult to accept that R will just know that stock should be turned into "stock" as a column name, instead of throwing an error because there is no stock variable defined anywhere, and there is no stock column in the input dataframe, either.

My expectation is this:

data %>% gather(stock, ...)
Error: object 'stock' not found

A direct comparison to melt

Lastly, you might consider whether the documentation for gather() would be enhanced by showing a direct side-by-side comparison of the same operation executed 2 times, once by gather() and again by melt(). Personally, I would find it helpful. Others might find this confusing if they have never seen either function before, so it might not be a good idea.

Something like this:

library(dplyr)
library(tidyr)

mini_iris <- iris %>%
  group_by(Species) %>%
  slice(1)

# Let's gather all the measurements (except Species) into 1 column:
mini_iris %>% gather(key = flower_att, value = measurement, -Species)

# Now let's do the same thing again with reshape2::melt()
library(reshape2)

melt(
  data = mini_iris,
  variable.name = "flower_att",
  value.name = "measurement",
  id.vars = c("Species")
)
batpigandme commented 5 years ago

So, let me go through some pieces of this, and hopefully we can figure out if this should be split into multiple issues, or if it can be addressed all at once:

The gif

Right now the gif is posted on Twitter, but I'd like to see it in more places where people can find it while they're searching for information about tidyr::gather().

I think my love for gifs is pretty well established. However, since we use roxygen2, putting this directly into the gather() function reference (and, hence, into the cascade of .R.Rd.html through pkgdown) would mean putting a gif into the ?gather help in R, which I don't know that the world (or CRAN) is ready for.

With reprex (which begs for explanation by gif, IMHO), we've gotten around this by adding non-vignette articles to the pkgdown site (e.g. Magic reprex, Using datapasta with reprex) to make the giffed-up documentation accessible, without rocking the boat of TPTB.

A complaint

Also, I want to complain that the example in the documentation uses the naked words stock and price instead of "stock" and "price" as I would have expected.

At least in the development version documentation, this example reads as such:

suppressPackageStartupMessages(library(tidyverse))
stocks <- tibble(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, stock, price, -time)
#> # A tibble: 30 x 3
#>    time       stock  price
#>    <date>     <chr>  <dbl>
#>  1 2009-01-01 X     -1.11 
#>  2 2009-01-02 X      0.758
#>  3 2009-01-03 X     -0.192
#>  4 2009-01-04 X     -2.64 
#>  5 2009-01-05 X     -0.854
#>  6 2009-01-06 X      1.70 
#>  7 2009-01-07 X      1.15 
#>  8 2009-01-08 X     -1.38 
#>  9 2009-01-09 X     -0.349
#> 10 2009-01-10 X     -1.66 
#> # ... with 20 more rows
stocks %>% gather(stock, price, -time)
#> # A tibble: 30 x 3
#>    time       stock  price
#>    <date>     <chr>  <dbl>
#>  1 2009-01-01 X     -1.11 
#>  2 2009-01-02 X      0.758
#>  3 2009-01-03 X     -0.192
#>  4 2009-01-04 X     -2.64 
#>  5 2009-01-05 X     -0.854
#>  6 2009-01-06 X      1.70 
#>  7 2009-01-07 X      1.15 
#>  8 2009-01-08 X     -1.38 
#>  9 2009-01-09 X     -0.349
#> 10 2009-01-10 X     -1.66 
#> # ... with 20 more rows

Created on 2018-11-06 by the reprex package (v0.2.1.9000)

It uses non-standard evaluation (NSE) which is used throughout the tidyverse, and in base R as well. I hear what you're saying in re. it being unexpected if you're anticipating standard evaluation, but it does indeed work. (We also use these semantics for rename(), etc.) This isn't to say it's an ideal example, but perhaps that it deserves more explanation, or illustration that it works whether or not you use quotation marks.

Comparison to melt

Lastly, you might consider whether the documentation for gather() would be enhanced by showing a direct side-by-side comparison [to]…melt(). Personally, I would find it helpful. Others might find this confusing…

There's a nice example of this idea that @jimhester did for plyr and dplyr here. As you point out, this is something that's super helpful to a specific target audience. As such, I'm not sure that the function reference is the best place to document it. Perhaps an article would work for this as well, whether part of the official tidyr documentation or not.


In going through this, I've realised that I need to better-document the one-issue-per-issue paradigm we try to follow in the tidyverse, and that we need to add the repo CONTRIBUTING guide to tidyr! 🙈

I'll wait for someone else to chime in, but if it looks like these are fixes that will go in separate places, then we should probably split it, so that we can easily tie the closing PRs to their matching issues.

slowkow commented 5 years ago

Mara, thanks for your thoughtful response! I agree with everything you said, and I apologize for putting multiple issues into one post.

With reprex (which begs for explanation by gif, IMHO), we've gotten around this by adding non-vignette articles to the pkgdown site

I love the gifs in the reprex articles!

I hear what you're saying in re. it being unexpected if you're anticipating standard evaluation

Yes. Since I learned to use melt() without NSE, I expected gather() to be the same as melt(). This is my own personal problem, but I think NSE in general might be one of the biggest pain points for newcomers to R.

Proposal

I'd like to propose the following:

Does that sound like a reasonable path forward to resolve this issue and improve tidyr docs?

hadley commented 5 years ago

@slowkow in future, would you mind creating multiple issues for multiple questions?

I don't see any problem with including the animated gif in the docs, assuming @apreshill is ok with it. Roxygen2 and pkgdown support gifs without problems.

We should change the tidyr documentation to consistently use "" when creating new variables. For backwards compatibility, we can't ever eliminate the old behaviour (except in the eventual successors for gather() and spread()), but we can steer people away from it.

apreshill commented 5 years ago

Happy to contribute- these were my "small n" teaching materials so I'm very glad they can be put to good use!

The files are here currently: https://github.com/apreshill/teachthat/tree/master/gather

Unfortunately they are based off .png files exported manually from powerpoint. I'll put the ppt files up shortly (edit: apreshill/teachthat@d3e2e4f641a91daab916d6f6781d4abc832b320d)

If you want me to do anything further via a PR, we could talk about if this data is what you want to use, and whether you want this added to the docs or as an article like the reprex example. I have the same kind of gif made for illustrating spread too.

khailper commented 5 years ago

I'll add the gif.

khailper commented 5 years ago

@hadley, do you know of any examples of gifs in documentations? I'm trying a couple of ways (Rmd notation and some latex based on a Colin Fay post) with no success.

hadley commented 5 years ago

Here's an example: https://github.com/r-lib/vctrs/blob/40274ce9abc90437542408114e77112e9b0fa5ec/R/cast.R#L18

The figure lives in man/figures and the path is relative to that directory

khailper commented 5 years ago

All I'm getting is Rendering development documentation for 'gather' at the command line and blank lines where the gif should be. Although I'm getting the same problem with \figure{logo.png}, which also lives in the right folder, so maybe it's something wrong with my machine/instance.

hadley commented 5 years ago

It's possible that the usual devtools documentation simulation isn't sufficient, and you'll need to do the full install-and-restart.

khailper commented 5 years ago

Install and restart worked for logo.png, but not gather.gif.

cderv commented 5 years ago

I think this is because gif is not an accepted format for R documentation man page. See writing R extension:

The files containing the figures should be stored in the directory man/figures. Files with extensions .jpg, .jpeg, .pdf, .png and .svg from that directory will be copied to the help/figures directory at install time.

This means that at installation the file gather.gif won't be moved to the help section in installation folder and then won't be find by the help page locally.

The only way I know currently is using html code inside a if{html}{} clause (you could try with this link as an exemple). There is an example using a gif in Rd file this way and other examples using html code in help page. Inconvenient : this will only work in online environment.

This probably need some hacking maybe to find the workaround to locally, or ask that the installation process don't filter out gif file.

However, having a gif that is in man/figures to be included in pkgdown rendering but not in the help page may be possible more easily... 🤔

yutannihilation commented 5 years ago

Interesting.

This means that at installation the file gather.gif won't be move to the help section and won't be find by the help page locally.

This seems to happen around here: https://github.com/wch/r-source/blob/41c8d48cf0458921fb5df28d8427d961a61641c1/src/library/tools/R/install.R#L1198-L1202

If you really want to bundle the image, you can place it under inst/ (say, inst/gather.gif) and embed it by using base64 (But I agree using online file is easier).

#' \if{html}{
#'   \Sexpr[echo=FALSE, results=rd, stage=build]{
#'     paste0(
#'       "\\\out{", "<img src='",
#'       knitr::image_uri(system.file("gather.gif", package = "hello")),
#'       "' />", "}"
#'     )
#'   }
#' }
jimhester commented 5 years ago

You could probably put it in inst/help/figures and it would always be copied regardless of extension.

yutannihilation commented 5 years ago

You could probably put it in inst/help/figures

Yeah, I thought so too and actually it succeeded, but this dir.create() warns about there's already help/figures.

https://github.com/wch/r-source/blob/41c8d48cf0458921fb5df28d8427d961a61641c1/src/library/tools/R/install.R#L1197

khailper commented 5 years ago

Tests on my local machine give the warning @yutannihilation mentioned, but it seems to install fine. Assuming it passes Travis, should we go with the inst/help/figures approach?

yutannihilation commented 5 years ago

Assuming it passes Travis

Really? I got this warning when I run devtools::check(), so I don't think this can pass CRAN, unfortunately...

* checking package subdirectories ... WARNING
Found the following non-empty subdirectories of ‘inst’ also used by R:
  inst/help
It is recommended not to interfere with package subdirectories used by
R.
khailper commented 5 years ago

I didn't think to run devtools::check; thank you. if{html}{} it is, then.

khailper commented 5 years ago

if{html}{} approach passescheck` without errors, warnings, or notes. However, at least on my machine, the gif isn't displaying the way shown in the Colin Fay post linked above: image

I don't have a good sense if this is something off with my local instnace, or due to changes in RStudio in the past ~1 year.

yutannihilation commented 5 years ago

This worked for me. Did you use <img> tag?

https://github.com/yutannihilation/tidyr/commit/3d33942beb41a974e03fe1792dabb96aea316466

image

khailper commented 5 years ago

I already did. Then I tried copy-pasting your approach in place of mine (mostly using @section instead of @details and not having a width tag). That was just as unsuccessful.

yutannihilation commented 5 years ago

Interesting. Can you share your code or diff so that I can try yours?

For session info, I'm using R 3.5.2 on Windows 10, with RStudio 1.2.1244.

khailper commented 5 years ago

https://github.com/khailper/tidyr/commit/545bd202485c86ea49c7d495e35299da4e99c75c#diff-edf10385d8722b110aaaa83b41f47499

I'm using R 3.5.2 and RStudio 1.1.463 on Windows 10. I'll try updating RStudio to see if that works.

khailper commented 5 years ago

Never mind that idea. You're using Server and I'm using Desktop.

yutannihilation commented 5 years ago

You're using Server

I'm using Desktop (1.2.1244 is a preview version).

Thanks for sharing the commit. It seems correct, and actually it works for me both with the preview version (1.2.1244) and the stable version (1.1.463) of RStudio Desktop. I have no idea why it doesn't work on your environment, sorry... Here's steps I did:

# use dev library
devtools::dev_mode()

# install khailper/tidyr from GitHub
devtools::install_github("khailper/tidyr@545bd20")

# show document
?tidyr::gather
khailper commented 5 years ago

You have nothing to be sorry for. I really appreciate all the help you've offered. I think we're at the point where the best bet is to assume I'm an outlier (but include a link for my fellow weirdos), and declare it done enough.

Edit: To clarify, dev_mode was also unsuccessful.

hadley commented 5 years ago

Thanks for everyone's work on this. Given the difficulty of embedding gifs in documentation files themselves, I think it's probably going to be better to put in a vignette instead. And I'm in the process of redesigning the gather()/spread() interface so it's probably better to invest time documenting the replacements. I'll work with @apreshill and make sure that the next release includes better documentation including animations.