nathaneastwood / poorman

A poor man's dependency free grammar of data manipulation
https://nathaneastwood.github.io/poorman/

feat: Implement fill() #113

Closed: etiennebacher closed this 1 year ago

etiennebacher commented 2 years ago

This PR reproduces tidyr::fill(). I took the core of the function (the filling algorithm) from a Stack Overflow answer, and I must say I'm not fully confident about how it works, but it works and it's fast.

A few tests are still missing, for example for lists (see the tidyr tests). There's also a weird message with grouped data (see the last example).
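For readers wondering how a vectorised fill can work, here is a minimal base R sketch of one common approach. It is not necessarily the algorithm used in this PR: `fill_down()` and `fill_vec()` are hypothetical names introduced only for illustration, and list columns are not handled. The idea is that `cumsum(!is.na(x))` tells each position how many non-missing values have been seen so far, which can then be used to look up the index of the last non-missing value.

```r
# Hypothetical helper (not from the PR): carry the last non-NA value forward.
fill_down <- function(x) {
  present <- !is.na(x)
  # cumsum(present) counts the non-NA values seen so far at each position.
  # Prepending NA gives leading gaps (count 0) an NA source index.
  src <- c(NA_integer_, which(present))[cumsum(present) + 1L]
  # Indexing x by position keeps its type (numeric, character, factor).
  x[src]
}

# Hypothetical wrapper mirroring the four `.direction` values used above.
fill_vec <- function(x, direction = c("down", "up", "downup", "updown")) {
  direction <- match.arg(direction)
  # Filling up is filling down on the reversed vector.
  fill_up <- function(v) rev(fill_down(rev(v)))
  switch(direction,
    down   = fill_down(x),
    up     = fill_up(x),
    downup = fill_up(fill_down(x)),
    updown = fill_down(fill_up(x))
  )
}

fill_vec(c(2000, NA, NA, 2001, NA))
#> [1] 2000 2000 2000 2001 2001
fill_vec(c(NA, "Dog", NA, "Cat"), "up")
#> [1] "Dog" "Dog" "Cat" "Cat"
```

With "down", a leading `NA` stays `NA` because there is no earlier value to carry forward; "downup" then fills such gaps from below.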

Examples:

suppressPackageStartupMessages(library(poorman))

# Value (year) is recorded only when it changes
sales <- data.frame(
  quarter = c(
    "Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4", "Q1", "Q2",
    "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"
  ),
  year = c(2000, NA, NA, NA, 2001, NA, NA, NA, 2002, NA, NA, NA, 2004, NA, NA, NA),
  sales = c(
    66013, 69182, 53175, 21001, 46036, 58842, 44568, 50197, 39113, 41668, 30144,
    52897, 32129, 67686, 31768, 49094
  )
)
sales
#>    quarter year sales
#> 1       Q1 2000 66013
#> 2       Q2   NA 69182
#> 3       Q3   NA 53175
#> 4       Q4   NA 21001
#> 5       Q1 2001 46036
#> 6       Q2   NA 58842
#> 7       Q3   NA 44568
#> 8       Q4   NA 50197
#> 9       Q1 2002 39113
#> 10      Q2   NA 41668
#> 11      Q3   NA 30144
#> 12      Q4   NA 52897
#> 13      Q1 2004 32129
#> 14      Q2   NA 67686
#> 15      Q3   NA 31768
#> 16      Q4   NA 49094
sales %>% fill(year)
#>    quarter year sales
#> 1       Q1 2000 66013
#> 2       Q2 2000 69182
#> 3       Q3 2000 53175
#> 4       Q4 2000 21001
#> 5       Q1 2001 46036
#> 6       Q2 2001 58842
#> 7       Q3 2001 44568
#> 8       Q4 2001 50197
#> 9       Q1 2002 39113
#> 10      Q2 2002 41668
#> 11      Q3 2002 30144
#> 12      Q4 2002 52897
#> 13      Q1 2004 32129
#> 14      Q2 2004 67686
#> 15      Q3 2004 31768
#> 16      Q4 2004 49094

# Value (pet_type) is missing above
tidy_pets <- data.frame(
  rank = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L),
  pet_type = c(NA, NA, NA, NA, NA, "Dog", NA, NA, NA, NA, NA, "Cat"),
  breed = c(
    "Boston Terrier", "Retrievers (Labrador)", "Retrievers (Golden)",
    "French Bulldogs", "Bulldogs", "Beagles", "Persian", "Maine Coon",
    "Ragdoll", "Exotic", "Siamese", "American Short"
  )
)
tidy_pets
#>    rank pet_type                 breed
#> 1     1     <NA>        Boston Terrier
#> 2     2     <NA> Retrievers (Labrador)
#> 3     3     <NA>   Retrievers (Golden)
#> 4     4     <NA>       French Bulldogs
#> 5     5     <NA>              Bulldogs
#> 6     6      Dog               Beagles
#> 7     1     <NA>               Persian
#> 8     2     <NA>            Maine Coon
#> 9     3     <NA>               Ragdoll
#> 10    4     <NA>                Exotic
#> 11    5     <NA>               Siamese
#> 12    6      Cat        American Short
tidy_pets %>%
  fill(pet_type, .direction = "up")
#>    rank pet_type                 breed
#> 1     1      Dog        Boston Terrier
#> 2     2      Dog Retrievers (Labrador)
#> 3     3      Dog   Retrievers (Golden)
#> 4     4      Dog       French Bulldogs
#> 5     5      Dog              Bulldogs
#> 6     6      Dog               Beagles
#> 7     1      Cat               Persian
#> 8     2      Cat            Maine Coon
#> 9     3      Cat               Ragdoll
#> 10    4      Cat                Exotic
#> 11    5      Cat               Siamese
#> 12    6      Cat        American Short

# Value (n_squirrels) is missing above and below within a group
squirrels <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  name = c(
    "Sam", "Mara", "Jesse", "Tom", "Mike", "Rachael", "Sydekea",
    "Gabriela", "Derrick", "Kara", "Emily", "Danielle"
  ),
  role = c(
    "Observer", "Scorekeeper", "Observer", "Observer", "Observer",
    "Observer", "Scorekeeper", "Observer", "Observer", "Scorekeeper",
    "Observer", "Observer"
  ),
  n_squirrels = c(NA, 8, NA, NA, NA, NA, 14, NA, NA, 9, NA, NA)
)
squirrels
#>    group     name        role n_squirrels
#> 1      1      Sam    Observer          NA
#> 2      1     Mara Scorekeeper           8
#> 3      1    Jesse    Observer          NA
#> 4      1      Tom    Observer          NA
#> 5      2     Mike    Observer          NA
#> 6      2  Rachael    Observer          NA
#> 7      2  Sydekea Scorekeeper          14
#> 8      2 Gabriela    Observer          NA
#> 9      3  Derrick    Observer          NA
#> 10     3     Kara Scorekeeper           9
#> 11     3    Emily    Observer          NA
#> 12     3 Danielle    Observer          NA
squirrels %>%
  group_by(group) %>%
  fill(n_squirrels, .direction = "downup") %>%
  ungroup()
#> Adding missing grouping variables: `group`
#> Adding missing grouping variables: `group`
#> Adding missing grouping variables: `group`
#>    group     name        role n_squirrels
#> 1      1      Sam    Observer           8
#> 2      1     Mara Scorekeeper           8
#> 3      1    Jesse    Observer           8
#> 4      1      Tom    Observer           8
#> 5      2     Mike    Observer          14
#> 6      2  Rachael    Observer          14
#> 7      2  Sydekea Scorekeeper          14
#> 8      2 Gabriela    Observer          14
#> 9      3  Derrick    Observer           9
#> 10     3     Kara Scorekeeper           9
#> 11     3    Emily    Observer           9
#> 12     3 Danielle    Observer           9

Created on 2022-08-26 by the reprex package (v2.0.1)
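As a cross-check for the grouped example, the same result can be reproduced in base R without any grouping machinery. This is only a sketch under assumptions: `fill_down()` and `fill_downup()` are hypothetical helpers, not the code in this PR, and `stats::ave()` stands in for the per-group application.

```r
# Hypothetical helper: carry the last non-NA value forward.
fill_down <- function(x) {
  present <- !is.na(x)
  x[c(NA_integer_, which(present))[cumsum(present) + 1L]]
}

# Fill down first, then fill the remaining leading gaps upwards.
fill_downup <- function(x) rev(fill_down(rev(fill_down(x))))

# First two groups of the squirrels data from the example above.
squirrels_small <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 2),
  n_squirrels = c(NA, 8, NA, NA, NA, NA, 14, NA)
)

# ave() applies FUN within each level of `group` and keeps the row order.
filled <- ave(squirrels_small$n_squirrels, squirrels_small$group, FUN = fill_downup)
filled
#> [1]  8  8  8  8 14 14 14 14
```

Because `ave()` works on a single vector, no grouping columns are added or dropped along the way, which makes it a convenient reference when testing grouped behaviour.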

codecov-commenter commented 2 years ago

Codecov Report

Merging #113 (d3f0898) into master (bbb9aed) will increase coverage by 0.05%. The diff coverage is 96.29%.

@@            Coverage Diff             @@
##           master     #113      +/-   ##
==========================================
+ Coverage   93.53%   93.58%   +0.05%     
==========================================
  Files          58       59       +1     
  Lines        1455     1482      +27     
==========================================
+ Hits         1361     1387      +26     
- Misses         94       95       +1     
Impacted Files    Coverage Δ
R/fill.R          96.29% <96.29%> (ø)


nathaneastwood commented 2 years ago

I have today returned from holiday. I will try and look at this this evening. Thanks for the hard work.

etiennebacher commented 2 years ago

No problem, take your time (to be honest, I had forgotten about this PR).

etiennebacher commented 1 year ago

Small bump (I'm going through my open PRs to see if they can be merged/closed)

etiennebacher commented 1 year ago

No problem ;)