Open eitsupi opened 2 months ago
Indeed, neopolars is slower, but it doesn't seem that slow on my Windows machine. Both polars and neopolars were freshly installed from GitHub with pak::pkg_install().
long_vec_1 <- 1:10^6
bench::mark(
polars = {
polars::as_polars_series(long_vec_1)
},
neopolars = {
neopolars::as_polars_series(long_vec_1)
},
check = FALSE,
min_iterations = 5
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 polars 196.1µs 1.11ms 833. 10.11MB 2.01
#> 2 neopolars 3.13ms 6.43ms 149. 1.03MB 0
polars_series_1 <- polars::as_polars_series(long_vec_1)
neopolars_series_1 <- neopolars::as_polars_series(long_vec_1)
bench::mark(
polars = {
as.vector(polars_series_1)
},
neopolars = {
as.vector(neopolars_series_1)
},
check = TRUE,
min_iterations = 5
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 polars 4.08ms 4.62ms 186. 5.85MB 22.0
#> 2 neopolars 7.18ms 8.53ms 117. 4.56MB 22.5
Created on 2024-09-06 with reprex v2.1.1
Thanks for taking a look at this! Perhaps the gap in my benchmark result was widened by different build-time optimizations in my installation process.
But even your results seem to show a difference of about 5x in construction and 2x in export, so I am wondering where the difference comes from.
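For reference, the ratios implied by the median timings in the two tables above can be worked out directly. This is only arithmetic on the numbers already posted (in milliseconds), not a new measurement:

```r
# Median timings copied from the bench::mark output above, in milliseconds
construct_ms <- c(polars = 1.11, neopolars = 6.43)
export_ms <- c(polars = 4.62, neopolars = 8.53)

# How many times slower neopolars is than polars at the median
construct_ratio <- construct_ms[["neopolars"]] / construct_ms[["polars"]]
export_ratio <- export_ms[["neopolars"]] / export_ms[["polars"]]

print(round(c(construct = construct_ratio, export = export_ratio), 1))
```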
This repository is for checking whether savvy is sufficiently fast, not for competing with extendr. I think a few ms is fast enough. Let's worry about performance when we hit a problem in more realistic usage.
One possible factor that might affect such a benchmark is that savvy always expands ALTREP vectors.
In the code above, the ALTREP vector is created only once, so this shouldn't affect the result. But future benchmarks might reveal a bottleneck related to this.
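One way to check whether ALTREP expansion matters is to benchmark the same values once as a compact sequence and once as an ordinary vector. The sketch below is only an assumption about how such a check could look: the variable names are made up for illustration, and it relies on `x + 0L` returning a regular, non-ALTREP integer vector. The benchmark part runs only if bench and neopolars are installed.

```r
# 1:10^6 is created as a compact ALTREP integer sequence (R >= 3.5)
long_vec_altrep <- 1:10^6

# Arithmetic should yield an ordinary integer vector holding the same values
long_vec_plain <- long_vec_altrep + 0L
stopifnot(identical(long_vec_altrep, long_vec_plain))

# Compare construction from both inputs, but only if the packages are available
if (requireNamespace("bench", quietly = TRUE) &&
    requireNamespace("neopolars", quietly = TRUE)) {
  print(bench::mark(
    altrep = neopolars::as_polars_series(long_vec_altrep),
    plain = neopolars::as_polars_series(long_vec_plain),
    check = FALSE,
    min_iterations = 5
  ))
}
```

Since a compact sequence may stay expanded after its first use, any extra cost would probably show up only in the first iteration, so the two rows may end up looking nearly identical.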
# Construct an Arrow array from an R vector
long_vec_1 <- 1:10^6
Wouldn't it be desirable for both projects to have a comparison benchmark? That way we would all know whether an update causes a performance regression. I remember doing a simple one when switching to savvy, and IIRC savvy was only a bit slower, nothing to worry about IMO...
I am not sure if this stems from the difference between extendr and savvy, so apologies if this is completely unrelated.
When comparing the already existing polars binding using extendr (polars) to the rewritten polars binding using savvy (neopolars), I noticed that the latter was orders of magnitude slower on both vector inputs and outputs.
https://github.com/pola-rs/r-polars/issues/1079#issuecomment-2331577275
Created on 2024-09-05 with reprex v2.1.1
If you could give me some advice on how to improve the performance in any way I would appreciate it.