mine-cetinkaya-rundel / r4ds-solutions

https://mine-cetinkaya-rundel.github.io/r4ds-solutions/

3/4.5.7 Exercise 2 #31

Open maxwellely opened 10 months ago

maxwellely commented 10 months ago

The question asked in the book is: "Find the flights that are most delayed upon departure from each destination."

The answer offered in the solutions book is:

flights |>
  group_by(dest) |>
  arrange(dest, desc(dep_delay)) |>
  slice_head(n = 5) |>
  relocate(dest, dep_delay)

However, this code doesn't output only the flight that is most delayed upon departure for each destination. Because of slice_head(n = 5), it returns up to five of the most delayed flights per destination (517 rows across 105 destinations), as the output shows:

# A tibble: 517 × 19
# Groups:   dest [105]
   dest  dep_delay  year month   day dep_time sched_dep_time arr_time sched_arr_time arr_delay carrier flight tailnum
   <chr>     <dbl> <int> <int> <int>    <int>          <int>    <int>          <int>     <dbl> <chr>    <int> <chr>  
 1 ABQ         142  2013    12    14     2223           2001      133           2304       149 B6          65 N659JB 
 2 ABQ         139  2013    12    17     2220           2001      120           2304       136 B6          65 N556JB 
 3 ABQ         125  2013     7    30     2212           2007       57           2259       118 B6        1505 N621JB 
 4 ABQ         125  2013     9     2     2212           2007       48           2259       109 B6        1505 N569JB 
 5 ABQ         119  2013     7    23     2206           2007      116           2259       137 B6        1505 N589JB 
 6 ACK         219  2013     7    23     1139            800     1250            909       221 B6        1491 N192JB 
 7 ACK         138  2013     7     2     1018            800     1119            909       130 B6        1491 N231JB 
 8 ACK         117  2013     7     4      957            800     1106            909       117 B6        1491 N306JB 
 9 ACK         101  2013     5    30     1321           1140     1419           1247        92 B6        1191 N203JB 
10 ACK         100  2013     6    24      940            800     1111            909       122 B6        1491 N348JB 
# ℹ 507 more rows
# ℹ 6 more variables: origin <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
# ℹ Use `print(n = ...)` to see more rows

I could be wrong, but I believe this is the code needed to show the flights that are most delayed upon departure from each destination.

flights |>
  group_by(dest) |>
  slice_max(dep_delay) |>
  arrange(desc(dep_delay)) |>
  relocate(dest, dep_delay)

This code outputs one row for each destination. I used arrange(), but I'm not sure it is strictly necessary for what the question asks.

# A tibble: 105 × 19
# Groups:   dest [105]
   dest  dep_delay  year month   day dep_time sched_dep_time arr_time sched_arr_time arr_delay carrier flight tailnum
   <chr>     <dbl> <int> <int> <int>    <int>          <int>    <int>          <int>     <dbl> <chr>    <int> <chr>  
 1 HNL        1301  2013     1     9      641            900     1242           1530      1272 HA          51 N384HA 
 2 CMH        1137  2013     6    15     1432           1935     1607           2120      1127 MQ        3535 N504MQ 
 3 ORD        1126  2013     1    10     1121           1635     1239           1810      1109 MQ        3695 N517MQ 
 4 SFO        1014  2013     9    20     1139           1845     1457           2210      1007 AA         177 N338AA 
 5 CVG        1005  2013     7    22      845           1600     1044           1815       989 MQ        3075 N665MQ 
 6 TPA         960  2013     4    10     1100           1900     1342           2211       931 DL        2391 N959DL 
 7 MSP         911  2013     3    17     2321            810      135           1020       915 DL        2119 N927DA 
 8 PDX         899  2013     6    27      959           1900     1236           2226       850 DL        2007 N3762Y 
 9 ATL         898  2013     7    22     2257            759      121           1026       895 DL        2047 N6716C 
10 MIA         896  2013    12     5      756           1700     1058           2020       878 AA         172 N5DMAA 
# ℹ 95 more rows
# ℹ 6 more variables: origin <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
# ℹ Use `print(n = ...)` to see more rows
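
To compare the two pipelines at a glance, here is a minimal sketch on made-up data. It assumes dplyr 1.0.0 or later (for slice_head() and slice_max()) and R 4.1 or later (for the |> pipe). The toy tibble and the object names are invented for illustration and are not part of nycflights13 or the solutions book; n = 2 stands in for the n = 5 used above.

```r
library(dplyr)

# Hypothetical toy data: three destinations, with a tie for the
# largest departure delay at "BBB".
toy <- tibble(
  dest      = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC"),
  dep_delay = c(10, 50, 30, 20, 20, 5)
)

# Solutions-book style: sort, then keep the first n rows of each group.
# This returns up to n rows per destination, not a single row.
top_n_per_dest <- toy |>
  group_by(dest) |>
  arrange(dest, desc(dep_delay)) |>
  slice_head(n = 2)

# Style proposed in this issue: keep the row(s) with the maximum
# dep_delay in each group. Ties are kept by default (with_ties = TRUE).
max_per_dest <- toy |>
  group_by(dest) |>
  slice_max(dep_delay)

# Same idea, but forcing exactly one row per destination.
one_per_dest <- toy |>
  group_by(dest) |>
  slice_max(dep_delay, n = 1, with_ties = FALSE)

print(top_n_per_dest)
print(max_per_dest)
print(one_per_dest)
```

On this toy data the three results have 5, 4, and 3 rows respectively. The extra row in the second result comes from the tie at "BBB", so slice_max() gives exactly one row per destination only when no group has a tie for its maximum, or when with_ties = FALSE is set. The trailing arrange(desc(dep_delay)) in the proposed code only changes the row order, not which rows are kept.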

If I misunderstood the question or the answer, please let me know. I am using this book to learn, so it is possible I don't know something the author of this solution does.

Thanks!

clarelgibson commented 1 week ago

I agree with @maxwellely's solution.