tanho63 closed 2 years ago
Hmm it looks like you're right.
There were two things that I thought might cause this.
1) Your data size was small. But on a larger dataset the performance of vec_fill_missing() is still better.
2) direction = "downup" was pulling the data back into R an extra time compared to the vctrs code, which handles it internally. But even with just direction = "down" the performance of vec_fill_missing() is better. (And it's much faster with direction = "downup".)
library(vctrs)
fill_na <- tidytable:::fill_na
data_size <- 10000000
vec <- sample(c(NA,2,3,NA,5), data_size, TRUE)
bench::mark(
fill_na = fill_na(vec, direction = "downup"),
vctrs = vec_fill_missing(vec, direction = "downup")
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 fill_na 67.1ms 67.1ms 14.9 153MB 89.4
#> 2 vctrs 25.9ms 27.1ms 36.2 153MB 42.2
bench::mark(
fill_na = fill_na(vec, direction = "down"),
vctrs = vec_fill_missing(vec, direction = "down")
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 fill_na 26.4ms 29.5ms 31.2 76.3MB 27.3
#> 2 vctrs 25.8ms 28ms 35.7 152.6MB 61.7
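For anyone following along, here is a tiny sketch of what the two directions benchmarked above actually do. The vector `x` is made up for illustration; only `vctrs::vec_fill_missing()` and its `direction` argument come from the thread.

```r
library(vctrs)

# Hypothetical toy vector with a leading NA, an interior gap, and a trailing NA
x <- c(NA, 2, NA, NA, 5, NA)

# "down": carry the last non-missing value forward; the leading NA has
# nothing before it, so it stays missing
down <- vec_fill_missing(x, direction = "down")

# "downup": fill down first, then fill whatever is still missing upward,
# so the leading NA picks up the first non-missing value
downup <- vec_fill_missing(x, direction = "downup")

print(down)
print(downup)
```

With "downup" the second pass happens inside vctrs in a single call, which is the part that previously took an extra round trip through R.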
Updated - thanks for catching this 😄
No problemo! I was trying to optimize something with data.table and was very confused about why nafill only handles numerics...so then I got into testing and comparing against your solution and was like, wait,,,,,,,
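As a quick illustration of the type point, here is a minimal sketch showing vec_fill_missing() on a character vector. The input `chr` is a made-up example, and this is not a claim about how tidytable handles non-numeric columns internally.

```r
library(vctrs)

# Hypothetical character vector; vec_fill_missing() is generic over vector
# types, so no numeric conversion is needed
chr <- c(NA, "a", NA, "b", NA)

filled <- vec_fill_missing(chr, direction = "down")
print(filled)
```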
For fill(): https://github.com/markfairbanks/tidytable/blob/91d2b26e6d11d7c70eefefe3f206ebfc77b12c04/R/fill.R#L65-L85
I think you could see a nice speed increase by simplifying to vec_fill_missing rather than the wrapper on data.table::nafill: