Closed lucusmueller closed 12 months ago
Are you possibly using a different version of dplyr? See the reprex below, which also includes session info:
library(tidyverse)
library(nycflights13)
flights |>
  mutate(
    gain = dep_delay - arr_delay,
    speed = distance / air_time * 60
  )
#> # A tibble: 336,776 × 21
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#> <int> <int> <int> <int> <int> <dbl> <int> <int>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> 7 2013 1 1 555 600 -5 913 854
#> 8 2013 1 1 557 600 -3 709 723
#> 9 2013 1 1 557 600 -3 838 846
#> 10 2013 1 1 558 600 -2 753 745
#> # ℹ 336,766 more rows
#> # ℹ 13 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> # hour <dbl>, minute <dbl>, time_hour <dttm>, gain <dbl>, speed <dbl>
flights |>
  mutate(
    gain = dep_delay - arr_delay,
    speed = distance / air_time * 60,
    .before = 1
  )
#> # A tibble: 336,776 × 21
#> gain speed year month day dep_time sched_dep_time dep_delay arr_time
#> <dbl> <dbl> <int> <int> <int> <int> <int> <dbl> <int>
#> 1 -9 370. 2013 1 1 517 515 2 830
#> 2 -16 374. 2013 1 1 533 529 4 850
#> 3 -31 408. 2013 1 1 542 540 2 923
#> 4 17 517. 2013 1 1 544 545 -1 1004
#> 5 19 394. 2013 1 1 554 600 -6 812
#> 6 -16 288. 2013 1 1 554 558 -4 740
#> 7 -24 404. 2013 1 1 555 600 -5 913
#> 8 11 259. 2013 1 1 557 600 -3 709
#> 9 5 405. 2013 1 1 557 600 -3 838
#> 10 -10 319. 2013 1 1 558 600 -2 753
#> # ℹ 336,766 more rows
#> # ℹ 12 more variables: sched_arr_time <int>, arr_delay <dbl>, carrier <chr>,
#> # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
flights |>
  mutate(
    gain = dep_delay - arr_delay,
    speed = distance / air_time * 60,
    .after = day
  )
#> # A tibble: 336,776 × 21
#> year month day gain speed dep_time sched_dep_time dep_delay arr_time
#> <int> <int> <int> <dbl> <dbl> <int> <int> <dbl> <int>
#> 1 2013 1 1 -9 370. 517 515 2 830
#> 2 2013 1 1 -16 374. 533 529 4 850
#> 3 2013 1 1 -31 408. 542 540 2 923
#> 4 2013 1 1 17 517. 544 545 -1 1004
#> 5 2013 1 1 19 394. 554 600 -6 812
#> 6 2013 1 1 -16 288. 554 558 -4 740
#> 7 2013 1 1 -24 404. 555 600 -5 913
#> 8 2013 1 1 11 259. 557 600 -3 709
#> 9 2013 1 1 5 405. 557 600 -3 838
#> 10 2013 1 1 -10 319. 558 600 -2 753
#> # ℹ 336,766 more rows
#> # ℹ 12 more variables: sched_arr_time <int>, arr_delay <dbl>, carrier <chr>,
#> # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Created on 2023-11-09 with reprex v2.0.2
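To check the version question on your own machine, something like the following might help. It is only a diagnostic sketch: it assumes dplyr is installed, relies on `.before`/`.after` having been added to `mutate()` in dplyr 1.0.0, and the variable names (`dplyr_version`, `supports_placement`, `mutate_home`) are made up for illustration.

```r
# Installed dplyr version
dplyr_version <- utils::packageVersion("dplyr")
print(dplyr_version)

# mutate() gained .before/.after in dplyr 1.0.0; older versions treat
# them as ordinary new columns
supports_placement <- dplyr_version >= "1.0.0"
print(supports_placement)

library(dplyr)

# Name of the package whose mutate() is actually found on the search path.
# Anything other than "dplyr" would mean another attached package masks it.
mutate_home <- environmentName(environment(mutate))
print(mutate_home)
```

If `supports_placement` is `FALSE`, updating dplyr is the likely fix. If `mutate_home` is not `"dplyr"`, calling `dplyr::mutate()` explicitly would be worth a try.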
I am trying to reproduce the examples from the book 'R for Data Science' (2nd ed.), chapter 'Data Transformation', section 'mutate()'.
The two new columns gain and speed are always added at the right end, despite the arguments .before = 1 and .after = day in the latter two calls, contrary to the examples in the book. Instead, the arguments are interpreted as new variables to be created, named .before and .after respectively, and invariably coded 1.
By contrast, the arguments .before/.after worked well when I used them in relocate(), as suggested by its use of "tidy selection", as opposed to the "data masking" used by mutate() [see https://dplyr.tidyverse.org/articles/programming.html, which I might have to read five more times in order to perhaps know what to do :) ].
=> Why does my R not reproduce the results presented in the book, despite using the identical code?
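Since relocate() reportedly honors .before/.after in this setup, one possible workaround is to create the columns with mutate() and then move them with relocate(). This is a minimal sketch, not the book's code: the `toy` tibble is hypothetical stand-in data with a few flights-like columns, and the calls are namespaced with `dplyr::` on the assumption that this rules out masking by another package.

```r
library(dplyr)

# Hypothetical stand-in data (not taken from nycflights13)
toy <- tibble(
  year = 2013L, month = 1L, day = 1L,
  dep_delay = c(2, 4), arr_delay = c(11, 20),
  distance = c(1400, 1416), air_time = c(227, 227)
)

result <- toy |>
  # Step 1: create the new columns (they land at the right end)
  dplyr::mutate(
    gain = dep_delay - arr_delay,
    speed = distance / air_time * 60
  ) |>
  # Step 2: move them directly after `day`
  dplyr::relocate(gain, speed, .after = day)

names(result)
```

Replacing `.after = day` with `.before = 1` in the relocate() call should put both columns at the front instead.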