tidymodels / parsnip

A tidy unified interface to models
https://parsnip.tidymodels.org

`set_engine_args` ? #540

Closed mmp3 closed 1 year ago

mmp3 commented 3 years ago

I am constructing a parsnip-adjacent package that implements a new parsnip model with several engines.

The algorithms underlying each engine for the common new model are quite heterogeneous, to the point that there is no single parameter that is conceptually the same across all the engines, and the number of conceptually similar arguments between any pair of engines is often zero, or otherwise quite small. Yet, they all aim to estimate coefficients for the same equation, so it seems they should be separate engines for one model rather than separate models with one engine each.

The only way forward that I can see is for the main model function to have no main arguments, so that all arguments are engine-specific. The downside is that none of the engine-specific arguments can then be linked to a dials parameter constructor, the way `set_model_arg()` lets its `func` argument point to a function built on, e.g., `dials::new_qual_param()`. That makes tuning the engine-specific arguments less smooth, because users won't be able to use dials to construct tuning grids for them.

Is there an analog to `set_model_arg()` that I am missing, something like a `set_engine_args()`?

juliasilge commented 3 years ago

No, that is a main difference between how model and engine arguments are handled.

Do you have a repo you can share for us to take a look at or maybe two example models you are wrapping that you could point out for us to see some specifics of what you mean?

topepo commented 3 years ago

The algorithms underlying each engine for the common new model are quite heterogeneous, to the point that there is no single parameter that is conceptually the same across all the engines, and the number of conceptually similar arguments between any pair of engines is often zero, or otherwise quite small.

I disagree.

A good counter-example is random forest. All three engines have main arguments that map exactly to the same underlying model parameters.

A slightly less good example is boosted trees. That model shares an argument with rand_forest(), namely trees. For both models, this is the number of individual models in the ensemble. The two handle it differently (boosting fits the trees sequentially, while a random forest does not), but from a function API point of view they can be treated the same.

Similar examples:

I know that there are some between-engine differences in parameters. For example, penalty for the glmnet engine corresponds to a different penalty than it does for the LiblineaR engine, but both are fitting penalized regression models.

If there are places that you feel the main arguments are too different, let us know and we can document this better.

topepo commented 3 years ago

Apologies, I may have misread your point. I thought you were referring to existing models.

Did you mean the models that you are specifically working on?

mmp3 commented 3 years ago

@juliasilge

No, that is a main difference between how model and engine arguments are handled.

OK, thank you.

Do you have a repo you can share for us to take a look at or maybe two example models you are wrapping that you could point out for us to see some specifics of what you mean?

Yes, I just invited you and @topepo to the repo.

mmp3 commented 3 years ago

@topepo

Apologies, I may have misread your point. I thought you were referring to existing models.

Did you mean the models that you are specifically working on?

Yes, I was referring to the new models I am trying to implement. I have invited you and @juliasilge to the repo, as requested.

simonpcouch commented 1 year ago

Looks like this was resolved privately, so I will go ahead and close.

For folks that come across this in the future, it is indeed possible to register engine arguments for built-in dials support!
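A minimal sketch of what that registration might look like, assuming an engine argument can be registered with `parsnip::set_model_arg()` in the same way as a main argument, with `func` pointing at a dials-style constructor. The model name `my_model`, engine `eng_a`, argument `solver_type`, and package `mypkg` are hypothetical placeholders, and whether the tuning tools pick the argument up in a given parsnip version is an assumption worth checking against the parsnip documentation.

```r
library(parsnip)
library(dials)

# Hypothetical dials-style constructor for an engine-specific argument.
# new_qual_param() builds a qualitative (character-valued) tuning parameter.
solver_type <- function(values = c("exact", "approx")) {
  new_qual_param(
    type = "character",
    values = values,
    label = c(solver_type = "Solver Type")
  )
}

# Register a hypothetical model with a single regression engine.
set_new_model("my_model")
set_model_mode("my_model", mode = "regression")
set_model_engine("my_model", mode = "regression", eng = "eng_a")
set_dependency("my_model", eng = "eng_a", pkg = "stats")

# Register the engine-specific argument. `func` records which function
# creates the matching parameter object; here it is assumed to be exported
# by a hypothetical package "mypkg" that provides solver_type().
set_model_arg(
  model = "my_model",
  eng = "eng_a",
  parsnip = "solver_type",
  original = "solver_type",
  func = list(pkg = "mypkg", fun = "solver_type"),
  has_submodel = FALSE
)
```

Keeping the parsnip name and the engine's original name identical avoids ambiguity, since engine arguments are passed through `set_engine()` under the engine's own names (e.g. `set_engine("eng_a", solver_type = tune())`).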

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.