tidyverse / tibble

A modern re-imagining of the data frame
https://tibble.tidyverse.org/

preferred tibble assignment? #334

Closed pgensler closed 6 years ago

pgensler commented 6 years ago

Not sure if this makes anything crash, but it might be worth noting how assignment should work in tibbles:

#should I be using assignment inside a tibble?
df <-tibble::tibble(
  a <- as.POSIXct("25072013", format = "%d%m%Y"),
  b <-c(1)
)

df <-tibble::tibble(
  a = as.POSIXct("25072013", format = "%d%m%Y"),
  b = c(1)
)

#heck this works......
df <-tibble::tibble(
  a <<- as.POSIXct("25072013", format = "%d%m%Y"),
  b <<-c(1)
)
str(df)

I thought it was bad to use = for assignment, but personally I think it reads more cleanly when creating a tibble. It might be worth documenting the preferred way of doing this, as I could imagine that this may wreak havoc with base R.

pgensler commented 6 years ago

It might also be worth stating in the vignette or in the README that you cannot mix formula-style column headers with vector creation like this:

> sample_data <- tibble::tibble(
+   id = c(390639,99472,361258),
+   ~value,
+   "RATINGS: 4   MEAN: 3.83/5.0   WEIGHTED AVG: 3.39/5   IBU: 35   EST. CALORIES: 204   ABV: 6.8%",
+   "RATINGS: 89   WEIGHTED AVG: 3.64/5   EST. CALORIES: 188   ABV: 6.25%",
+   "RATINGS: 8   MEAN: 3.7/5.0   WEIGHTED AVG: 3.45/5   IBU: 85   EST. CALORIES: 213   ABV: 7.1%"
+ )
Error: Column `~value` must be a 1d atomic vector or a list
> 
> #this does work though. notation is consistent throughout
> sample_data <- tibble::tribble(
+   ~id,  ~value,
+   390639, "RATINGS: 4   MEAN: 3.83/5.0   WEIGHTED AVG: 3.39/5   IBU: 35   EST. CALORIES: 204   ABV: 6.8%",
+   99472, "RATINGS: 89   WEIGHTED AVG: 3.64/5   EST. CALORIES: 188   ABV: 6.25%",
+   361258, "RATINGS: 8   MEAN: 3.7/5.0   WEIGHTED AVG: 3.45/5   IBU: 85   EST. CALORIES: 213   ABV: 7.1%"
+ )
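
For comparison, a minimal sketch of the same data built column by column with tibble(). It assumes the tibble package is installed; the idea is that tibble() takes one named vector per column, while the ~name headers belong to tribble()'s row-wise layout:

```r
library(tibble)

# Column-wise construction: one named vector per column, no formulas involved.
sample_data <- tibble(
  id = c(390639, 99472, 361258),
  value = c(
    "RATINGS: 4   MEAN: 3.83/5.0   WEIGHTED AVG: 3.39/5   IBU: 35   EST. CALORIES: 204   ABV: 6.8%",
    "RATINGS: 89   WEIGHTED AVG: 3.64/5   EST. CALORIES: 188   ABV: 6.25%",
    "RATINGS: 8   MEAN: 3.7/5.0   WEIGHTED AVG: 3.45/5   IBU: 85   EST. CALORIES: 213   ABV: 7.1%"
  )
)

# Both columns have length 3, so the result has three rows.
print(dim(sample_data))
```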

krlmlr commented 6 years ago

The three tibbles created by the code have different column names, see the second row of each output:

tibble::tibble(
  a <- as.POSIXct("25072013", format = "%d%m%Y"),
  b <-c(1)
)
#> # A tibble: 1 x 2
#>   `a <- as.POSIXct("25072013", format = "%d%m%Y")` `b <- c(1)`
#>   <dttm>                                                 <dbl>
#> 1 2013-07-25 00:00:00                                     1.00

tibble::tibble(
  a = as.POSIXct("25072013", format = "%d%m%Y"),
  b = c(1)
)
#> # A tibble: 1 x 2
#>   a                       b
#>   <dttm>              <dbl>
#> 1 2013-07-25 00:00:00  1.00

tibble::tibble(
  a <<- as.POSIXct("25072013", format = "%d%m%Y"),
  b <<-c(1)
)
#> # A tibble: 1 x 2
#>   `a <<- as.POSIXct("25072013", format = "%d%m%Y")` `b <<- c(1)`
#>   <dttm>                                                   <dbl>
#> 1 2013-07-25 00:00:00                                       1.00

The column name is picked up only when using =, which in this context means "passing arguments by name", not "assignment". tibble() is an R function that interprets these arguments in a special way: It returns a tibble with a column created from each of the arguments.

To sum up: Use <- for assignment, use = to give a name to arguments in a function call.

Passing arguments by name works for all other functions in R*, not just tibble(), and using = is the only way to do it.
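
The difference can be seen in base R without tibble at all. The sketch below uses a hypothetical helper, arg_names(), which is not part of any package and only reports the names its arguments were passed with:

```r
# Hypothetical helper for illustration: returns the names of its arguments.
arg_names <- function(...) names(list(...))

# `=` inside a call names the argument.
named <- arg_names(a = 1, b = 2)
print(named)
#> [1] "a" "b"

# `<-` inside a call is an ordinary assignment expression: its value is passed
# as an unnamed argument, and `y` is created in the calling environment.
unnamed <- arg_names(x = 1, y <- 2)
print(unnamed)
#> [1] "x" ""
print(y)
#> [1] 2
```

With an unnamed argument tibble() has nothing to use as the column name except the text of the expression, which is consistent with the backticked headers in the output above.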

Happy to review a pull request that improves the vignette.

*: Barring exceptions: some primitive functions; tidy evaluation.

pgensler commented 6 years ago

Is it really allowed, though, to use both = and <<- to create columns for tibbles? Maybe this is just me, but that seems a bit overkill (unless you needed it for rlang and tidyeval). If = is the only way, wouldn't you want to have some error or warning when using <<-, at the very least?

krlmlr commented 6 years ago

Raising a warning does sound like a good idea, but might break existing code.

@lionel-: Do you think rlang should warn for usages like tibble(a <- 1:3) or tibble(a <<- 1:3)?

lionel- commented 6 years ago

hmmm..... probably not because there are legitimate usages, i.e. in a dplyr::do() block. I know users are relying on this because of bug reports when we ported dplyr to tidyeval. We could allow it only in a block but that would complicate the implementation. cc @hadley

lionel- commented 6 years ago

Actually, just disallowing <- as a top-level argument wouldn't complicate the implementation, so it's feasible. But if we handle this at capture we'll get a warning or an error when doing things like quo(a <- b), which doesn't seem right. The issue is that tidy eval is used for general-purpose programming on the language in addition to creating quoting UIs. Maybe that could be an option to enquo()? But I wonder if this'd be worth the complication.

hadley commented 6 years ago

I think special-casing this behaviour is likely to be more trouble than it's worth. Unfortunately we can't protect the user against every potential mistake, and combining = with <- is a fairly common R idiom.
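
One reading of that idiom, sketched in base R (an illustrative guess at the pattern, not an example taken from this thread): name the argument with = and save its value with <- in the same call.

```r
# `x =` names the argument of mean(); `vals <- ...` also stores the vector
# in the calling environment when the argument is evaluated.
avg <- mean(x = vals <- c(1, 2, 3))
print(avg)
#> [1] 2
print(vals)
#> [1] 1 2 3
```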

krlmlr commented 6 years ago

Thanks. Let's leave it at that.
