tidymodels / recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
https://recipes.tidymodels.org

Error using an old tidymodels workflow. #1401

Open reisner opened 1 day ago

reisner commented 1 day ago

Hi there,

I have a model and recipe that were saved in 2018, and I'm trying to use them with updated packages.

The original model was saved with:

parsnip v0.0.1, recipes v0.1.4

I'm trying to use it with updated packages:

parsnip v1.2.1, recipes v1.1.0

The original model was trained with svm_poly / ksvm.

I've been able to get the model to work (by setting model$elapsed[["elapsed"]] = 1).

However, I'm trying to get the old recipe to work but am hitting this error:

Error in `group_data()`:
! `.data` must be a valid <grouped_df> object.
Caused by error in `validate_grouped_df()`:
! Corrupt `grouped_df` using old (< 0.8.0) format.
ℹ Strip off old grouping with `ungroup()`.

Is there some way I can fix the recipe to work with new package versions? Or is there a way to extract the recipe components and create an updated recipe object? Unfortunately I don't have access to the original training data.

Thanks!

EmilHvitfeldt commented 1 day ago

Hello @reisner 👋

Would you be able to provide a little more information?

  1. When did this start happening? Did it only just start? Neither {recipes} nor {parsnip} has had a CRAN release in months.
  2. Are you able to provide a traceback() of the error? This will help us narrow down where the issue is.

reisner commented 1 day ago

Hi @EmilHvitfeldt

I don't think this is a recent issue. It's more about trying to load a legacy model with newer versions of the packages. I'm assuming this isn't something you'll be supporting forever, but I was hoping for some guidance on how I can convert the old recipe into the new package format.

Here is the relevant part of the traceback:

> traceback()
29: stop(fallback)
28: signal_abort(cnd, .file)
27: abort(msg, parent = cnd, call = error_call)
26: (function (cnd) 
    {
        msg <- glue("`.data` must be a valid <grouped_df> object.")
        abort(msg, parent = cnd, call = error_call)
    })(structure(list(message = structure("Corrupt `grouped_df` using old (< 0.8.0) format.", names = ""), 
        trace = structure(list(call = list(source("train_model_and_predict.R"), 
            withVisible(eval(ei, envir)), eval(ei, envir), eval(ei, 
                envir), main(), run_prediction(model$model, training_df, 
                model$recipe), bake(trained_recipe, new_data = testing_data), 
            bake.recipe(trained_recipe, new_data = testing_data), 
            recipes_eval_select(terms, new_data, info, check_case_weights = FALSE), 
            vec_slice(info, matches$haystack), `<fn>`(), vec_restore_dispatch(x = x, 
                to = to), vec_restore.grouped_df(x = x, to = to), 
            group_intersect(to, x), intersect(dplyr::group_vars(x), 
                names(new)), dplyr::group_vars(x), group_vars.data.frame(x), 
            setdiff(names(group_data(x)), ".rows"), group_data(x), 
            group_data.grouped_df(x), withCallingHandlers(validate_grouped_df(.data), 
                error = function(cnd) {
                    msg <- glue("`.data` must be a valid <grouped_df> object.")
                    abort(msg, parent = cnd, call = error_call)
                }), validate_grouped_df(.data), abort(bullets)), 
            parent = c(0L, 1L, 1L, 3L, 0L, 5L, 6L, 6L, 8L, 9L, 10L, 
            11L, 11L, 13L, 14L, 14L, 14L, 17L, 17L, 17L, 20L, 20L, 
            22L), visible = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
            TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
            TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE), namespace = c("base", 
            "base", "base", "base", NA, NA, "recipes", "recipes", 
            "recipes", "vctrs", "vctrs", "vctrs", "vctrs", "vctrs", 
            "base", "dplyr", "dplyr", "generics", "dplyr", "dplyr", 
            "base", "dplyr", "rlang"), scope = c("::", "::", "::", 
            "::", "global", "global", "::", ":::", "::", "::", "local", 
            ":::", "local", ":::", "::", "::", ":::", "::", "::", 
            ":::", "::", "::", "::"), error_frame = c(FALSE, FALSE, 
            FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
            FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
            FALSE, FALSE, FALSE, TRUE, FALSE)), row.names = c(NA, 
        -23L), version = 2L, class = c("rlang_trace", "rlib_trace", 
        "tbl", "data.frame")), parent = NULL, body = c(i = "Strip off old grouping with `ungroup()`."), 
        rlang = list(inherit = TRUE), call = validate_grouped_df(.data), 
        use_cli_format = TRUE), class = c("rlang_error", "error", 
    "condition")))
25: signalCondition(cnd)
24: signal_abort(cnd, .file)
23: abort(bullets)
22: validate_grouped_df(.data)
21: withCallingHandlers(validate_grouped_df(.data), error = function(cnd) {
        msg <- glue("`.data` must be a valid <grouped_df> object.")
        abort(msg, parent = cnd, call = error_call)
    })
20: group_data.grouped_df(x)
19: group_data(x)
18: setdiff(names(group_data(x)), ".rows")
17: group_vars.data.frame(x)
16: dplyr::group_vars(x)
15: intersect(dplyr::group_vars(x), names(new))
14: group_intersect(to, x)
13: vec_restore.grouped_df(x = x, to = to)
12: vec_restore_dispatch(x = x, to = to)
11: (function () 
    vec_restore_dispatch(x = x, to = to))()
10: vec_slice(info, matches$haystack)
9: recipes_eval_select(terms, new_data, info, check_case_weights = FALSE)
8: bake.recipe(trained_recipe, new_data = testing_data)
7: bake(trained_recipe, new_data = testing_data) at analysis_functions.R#222
6: run_prediction(model$model, training_df, model$recipe) at train_model_and_predict.R#216
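
Reading the traceback, the failure occurs in frame 10, where `vec_slice(info, matches$haystack)` is called from `recipes_eval_select()` inside `bake.recipe()`: the `info` table stored in the old recipe still carries a pre-0.8.0 `grouped_df` class. Following the hint in the error message ("Strip off old grouping with `ungroup()`"), one possible workaround is to ungroup any grouped table stored in the recipe before baking. The block below is only a sketch under stated assumptions: `strip_old_grouping()` is a hypothetical helper, not part of {recipes}, and it assumes the stale table is a top-level element of the recipe object, which this thread does not confirm.

```r
library(dplyr)

# Hypothetical helper (not part of {recipes}): walk the top-level elements of
# a recipe-like list and ungroup any element that is still a grouped_df.
# Assumption: the stale pre-0.8.0 grouped table is a top-level element.
strip_old_grouping <- function(rec) {
  for (nm in names(rec)) {
    if (inherits(rec[[nm]], "grouped_df")) {
      # ungroup() is the fix suggested by the dplyr error message itself
      rec[[nm]] <- dplyr::ungroup(rec[[nm]])
    }
  }
  rec
}

# Usage with the objects named in the traceback (left commented out because
# `model` and `testing_data` only exist in the original script):
# fixed_recipe <- strip_old_grouping(model$recipe)
# baked <- bake(fixed_recipe, new_data = testing_data)
```

If the error persists after this, the grouped table may be nested deeper (for example inside individual steps), in which case the same check would need to be applied recursively.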