tidymodels / parsnip

A tidy unified interface to models
https://parsnip.tidymodels.org

save x column names from fit_xy() #1168

Closed EmilHvitfeldt closed 2 months ago

EmilHvitfeldt commented 3 months ago

Closes #1166.

This bug happened because fit_xy() does not know the name of the outcome(s): y is often an unnamed vector, so there is no outcome name to drop from new data at prediction time. Instead, this PR saves the column names of x at fit time and subsets new data with those names when appropriate.
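As a rough illustration of the idea (not the actual change in this PR), the sketch below uses base R only. toy_fit_xy() and toy_predict() are hypothetical helpers invented for this example, and the x_names field is an assumed name, not parsnip's internal storage.

```r
# Hypothetical helpers for illustration only; not parsnip internals.

# Fit a least-squares model from x/y and remember the predictor names.
toy_fit_xy <- function(x, y) {
  x <- as.data.frame(x)
  design <- cbind(`(Intercept)` = 1, as.matrix(x))
  fit <- lm.fit(design, y)
  list(coef = fit$coefficients, x_names = colnames(x))
}

# Predict after subsetting new_data to the columns seen at fit time,
# so extra columns (such as the outcome) are dropped.
toy_predict <- function(object, new_data) {
  new_x <- new_data[, object$x_names, drop = FALSE]
  drop(cbind(1, as.matrix(new_x)) %*% object$coef)
}

fit <- toy_fit_xy(x = mtcars[, -1], y = mtcars[, 1])

# mtcars still contains mpg here; it is ignored because it is not in x_names.
preds <- toy_predict(fit, mtcars)
```

Because the subsetting relies only on the stored predictor names, it works even though the outcome passed as y has no name.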

This is NOT an xgboost issue; xgboost just complains more loudly than other engines.

library(parsnip)

spec <- boost_tree() %>%
  set_mode("regression") %>%
  set_engine("xgboost")

lm_fit <- fit(spec, mpg ~ ., data = mtcars)

predict(lm_fit, mtcars)
#> # A tibble: 32 × 1
#>    .pred
#>    <dbl>
#>  1  20.9
#>  2  20.9
#>  3  22.6
#>  4  21.0
#>  5  18.4
#>  6  18.1
#>  7  14.2
#>  8  23.7
#>  9  22.4
#> 10  18.9
#> # ℹ 22 more rows

lm_fit <- fit_xy(spec, x = mtcars[, -1], y = mtcars[, 1])

predict(lm_fit, mtcars)
#> # A tibble: 32 × 1
#>    .pred
#>    <dbl>
#>  1  20.9
#>  2  20.9
#>  3  22.6
#>  4  21.0
#>  5  18.4
#>  6  18.1
#>  7  14.2
#>  8  23.7
#>  9  22.4
#> 10  18.9
#> # ℹ 22 more rows

Created on 2024-08-30 with reprex v2.1.0
