r-lib / sparsevctrs

Sparse vector class using ALTREP
https://r-lib.github.io/sparsevctrs/

Speed up `coerce_to_sparse_data_frame()` #66

Open EmilHvitfeldt opened 5 months ago

In my mind, `coerce_to_sparse_data_frame()` shouldn't take as long as it does in the reprex below. If we can speed it up, the tibble case gets faster as well.

```r
library(tidymodels)
library(textrecipes)
library(friends)

preped_rec <- recipe(season ~ text, data = friends) %>%
  step_tokenize(text) %>%
  step_tf(text) %>%
  prep()
#> Warning in asMethod(object): sparse->dense coercion: allocating vector of size
#> 8.7 GiB

term_freq <- bake(preped_rec, new_data = NULL, composition = "dgCMatrix")

library(sparsevctrs)

tictoc::tic()
tmp <- coerce_to_sparse_data_frame(term_freq)
tictoc::toc()
#> 1.392 sec elapsed
```

Created on 2024-05-23 with reprex v2.1.0