juliasilge / widyr

Widen, process, and re-tidy a dataset
http://juliasilge.github.io/widyr/
Other
327 stars 29 forks source link

Lazy evaluation issue in `widely()` #44

Closed lhdjung closed 1 year ago

lhdjung commented 1 year ago

I really like this package, but I think widely() has the very minor issue described in Advanced R ch. 10.2.3, Forcing evaluation:

library(widyr)
library(gapminder)

sort <- FALSE
dist_widely <- widely(dist, sort = sort)
sort <- TRUE
dist_widely(gapminder, country, year, lifeExp)
#> # A tibble: 10,011 × 3
#>    item1        item2        value
#>    <fct>        <fct>        <dbl>
#>  1 Iceland      Sierra Leone  138.
#>  2 Sierra Leone Sweden        137.
#>  3 Norway       Sierra Leone  135.
#>  4 Afghanistan  Iceland       135.
#>  5 Netherlands  Sierra Leone  135.
#>  6 Sierra Leone Switzerland   134.
#>  7 Afghanistan  Sweden        134.
#>  8 Angola       Iceland       134.
#>  9 Afghanistan  Norway        133.
#> 10 Angola       Sweden        133.
#> # … with 10,001 more rows

# The rows are sorted by `value` because I
# changed the binding of `sort` after assigning
# the adverb's output to `dist_widely`.
# They would not be sorted without this change:
sort <- FALSE
dist_widely <- widely(dist, sort = sort)
dist_widely(gapminder, country, year, lifeExp)
#> # A tibble: 10,011 × 3
#>    item1       item2       value
#>    <fct>       <fct>       <dbl>
#>  1 Afghanistan Albania    107.  
#>  2 Afghanistan Algeria     76.8 
#>  3 Afghanistan Angola       4.65
#>  4 Afghanistan Argentina  110.  
#>  5 Afghanistan Australia  129.  
#>  6 Afghanistan Austria    124.  
#>  7 Afghanistan Bahrain     98.1 
#>  8 Afghanistan Bangladesh  45.3 
#>  9 Afghanistan Belgium    125.  
#> 10 Afghanistan Benin       39.3 
#> # … with 10,001 more rows

Created on 2022-11-04 with reprex v2.0.2

The easiest solution would be to insert force() calls into widely()'s function body, as below. Another possibility is rewriting widely() using factory::build_factory().

widely <- function(.f, sort = FALSE, sparse = FALSE, maximum_size = 1e+07) {
    force(.f)
    force(sort)
    force(sparse)
    force(maximum_size)
    function(tbl, row, column, value, ...) {
      # ...
    }
}

(Edit: I previously made a mistake in the code below but the result is the same after correction.)

With squarely(), though, everything looks fine:

library(widyr)
library(gapminder)
library(waldo)
suppressPackageStartupMessages(library(dplyr))

upper <- FALSE
dist_squarely1 <- squarely(dist, upper = upper)
upper <- TRUE

out1 <- gapminder %>%
    group_by(continent) %>%
    dist_squarely1(country, year, lifeExp)

upper <- FALSE
dist_squarely2 <- squarely(dist)

out2 <- gapminder %>%
    group_by(continent) %>%
    dist_squarely2(country, year, lifeExp)

compare(out1, out2)
#> ✔ No differences

Created on 2022-11-04 with reprex v2.0.2

juliasilge commented 1 year ago

Yes, you are correct! Are you interested in submitting a PR to add force(), and maybe add a test? I think using force() is a better move for this package than taking on that dependency.

lhdjung commented 1 year ago

Sure! I hope my PR is adequate.

juliasilge commented 1 year ago

Closed by #45