tidymodels / planning

Documents to plan and discuss future development
MIT License

Sparse tibble support #29

Open EmilHvitfeldt opened 2 months ago

EmilHvitfeldt commented 2 months ago

This issue will serve as the main hub for issues across the tidymodels ecosystem regarding the implementation of sparse data in tibbles.

Right now we are still in the exploratory phase, with work happening in https://github.com/EmilHvitfeldt/sparsevctrs to implement sparse vector classes that can be used within a tibble.
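To make the idea of a sparse vector concrete, here is a minimal, language-agnostic sketch written in Python. It is not the sparsevctrs implementation or its API: the class name `SparseColumn` and its fields are hypothetical, and it only assumes that a sparse vector stores its non-default values, their positions, and the logical length of the full vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparseColumn:
    """Hypothetical stand-in for a sparse vector: only non-default entries are stored."""

    values: tuple      # the non-default values
    positions: tuple   # 1-based positions of those values, strictly increasing
    length: int        # logical length of the full vector
    default: float = 0.0

    def __post_init__(self):
        # Basic consistency checks so an invalid column fails early.
        if len(self.values) != len(self.positions):
            raise ValueError("values and positions must have the same length")
        if any(p < 1 or p > self.length for p in self.positions):
            raise ValueError("positions must lie in 1..length")
        if any(a >= b for a, b in zip(self.positions, self.positions[1:])):
            raise ValueError("positions must be strictly increasing")

    def to_dense(self):
        """Materialize the full vector, filling gaps with the default value."""
        dense = [self.default] * self.length
        for pos, val in zip(self.positions, self.values):
            dense[pos - 1] = val
        return dense


# A length-10 column with only two stored entries.
col = SparseColumn(values=(4.0, 7.5), positions=(2, 9), length=10)
dense = col.to_dense()
```

The point of the sketch is that a column like this behaves as a vector of length 10 while storing only two entries, which is what would let it sit in a data frame next to ordinary dense columns.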

Another thing we can do with this framework is to allow sparse data as input to functions such as vfold_cv(), fit(), and predict(), turning the sparse data into sparse tibbles.
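As a rough illustration of what turning sparse data into a sparse table could involve, the Python sketch below splits a matrix in compressed sparse column (CSC) layout into one sparse record per named column. This is an assumption about the general shape of such a conversion, not how tidymodels or sparsevctrs does it; the function `csc_to_sparse_columns` and the record fields are hypothetical.

```python
def csc_to_sparse_columns(data, indices, indptr, n_rows, names):
    """Split a CSC matrix into one sparse record per named column.

    data    : stored (non-zero) values, column by column
    indices : 0-based row index of each stored value
    indptr  : indptr[j]:indptr[j + 1] is the slice of `data` belonging to column j
    """
    if len(indptr) != len(names) + 1:
        raise ValueError("indptr must have one more entry than there are columns")

    columns = {}
    for j, name in enumerate(names):
        start, end = indptr[j], indptr[j + 1]
        columns[name] = {
            # Convert to 1-based positions; no dense column is ever built.
            "positions": [i + 1 for i in indices[start:end]],
            "values": list(data[start:end]),
            "length": n_rows,
        }
    return columns


# CSC encoding of the 3 x 3 matrix
# [[1, 0, 0],
#  [0, 0, 2],
#  [0, 0, 3]]
data = [1.0, 2.0, 3.0]
indices = [0, 1, 2]
indptr = [0, 1, 1, 3]

sparse_table = csc_to_sparse_columns(
    data, indices, indptr, n_rows=3, names=["a", "b", "c"]
)
```

Because each column is read straight out of the compressed layout, the conversion never has to expand the zeros, which is the property that matters when the input has many mostly-empty columns.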

EmilHvitfeldt commented 2 months ago

Steps in {recipes}, grouped by whether they can use sparse vectors

Produce sparsity

Modify sparsity

For sure

Might work out of the box

I think

Unaffected steps

EmilHvitfeldt commented 2 months ago

{themis} doesn't have any methods that apply.

{embed} only has step_feature_hash(), but it is soft-deprecated, so I don't think it is worth supporting.

EmilHvitfeldt commented 2 months ago

{textrecipes} has the following steps that produce sparsity

The remaining steps are unaffected.