luifrancgom / marketing_research_and_analytics

Repository associated with the course Marketing research and analytics

What is the difference between mutate and summarise? #6

Closed luifrancgom closed 2 months ago

luifrancgom commented 5 months ago

Describe the problem in general that you are having with R or Quarto

I don't understand the difference between mutate and summarise

Point out a Minimal Reproducible Example (MRE) in relation to the problem you are having

Doesn't apply. I need to understand the difference between mutate and summarise.

Additional context

Please explain it with code and in a graphical way

luifrancgom commented 5 months ago

Graphical explanation

Screenshot 2024-05-01 213327

Screenshot 2024-05-01 213628

Screenshot 2024-05-01 214817

Using code

# Libraries ----
library(tidyverse)

# Toy data ----
data <- tibble(consumer_id = 1:5,
               segment     = c("A", "A", "B", "A", "B"),
               income      = 10:14)

# Without grouping ----
# Assume we want to calculate the mean income

## mutate without grouping ----
data |> 
  # The mean income is calculated once and repeated on every row
  mutate(mean = mean(income))
#> # A tibble: 5 × 4
#>   consumer_id segment income  mean
#>         <int> <chr>    <int> <dbl>
#> 1           1 A           10    12
#> 2           2 A           11    12
#> 3           3 B           12    12
#> 4           4 A           13    12
#> 5           5 B           14    12

## summarise without grouping ----
data |>
  # The mean income is calculated but only one row is created
  summarise(mean = mean(income))
#> # A tibble: 1 × 1
#>    mean
#>   <dbl>
#> 1    12

# Grouping ----
# Assume we want to calculate the mean income by segment

## mutate with grouping ----
data |> 
  group_by(segment) |> 
  # The mean income is calculated for each segment but a value for 
  # each row is created
  mutate(mean = mean(income))
#> # A tibble: 5 × 4
#> # Groups:   segment [2]
#>   consumer_id segment income  mean
#>         <int> <chr>    <int> <dbl>
#> 1           1 A           10  11.3
#> 2           2 A           11  11.3
#> 3           3 B           12  13  
#> 4           4 A           13  11.3
#> 5           5 B           14  13

## summarise with grouping ----
data |> 
  group_by(segment) |> 
  # The mean income is calculated for each
  # segment but a value for each group is 
  # created (not for each row but for each
  # group)
  summarise(mean = mean(income))
#> # A tibble: 2 × 2
#>   segment  mean
#>   <chr>   <dbl>
#> 1 A        11.3
#> 2 B        13

Created on 2024-05-01 with reprex v2.1.0
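
The difference can also be checked by counting rows. The following is a minimal sketch, not part of the reprex above: it assumes only dplyr is loaded (instead of the full tidyverse), and the names with_mutate, with_summarise and joined are hypothetical helpers introduced for illustration. It also shows that joining the summarised table back to the original data gives the same mean column that mutate produces.

```r
# Libraries ----
library(dplyr)

# Toy data (same as in the reprex) ----
data <- tibble(consumer_id = 1:5,
               segment     = c("A", "A", "B", "A", "B"),
               income      = 10:14)

# mutate keeps every row of the original data ----
with_mutate <- data |>
  group_by(segment) |>
  mutate(mean = mean(income)) |>
  ungroup()

# summarise collapses the data to one row per group ----
with_summarise <- data |>
  group_by(segment) |>
  summarise(mean = mean(income))

nrow(data)           # 5 rows in the original data
nrow(with_mutate)    # still 5 rows
nrow(with_summarise) # 2 rows, one per segment

# Joining the summary back to the data reproduces the mutate column ----
joined <- data |>
  left_join(with_summarise, by = "segment")

all.equal(with_mutate$mean, joined$mean)
```

In short: use mutate when the group statistic is needed next to each observation (for example, to compare each consumer's income with the segment mean), and summarise when only the table of group statistics is needed.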

luifrancgom commented 2 months ago

The question was solved. Closing issue.