Open asb2111 opened 3 weeks ago
This is an excellent point.
Once a candidate is created by the Gaussian process model, we pass that to tune_grid()
. It fits the new candidate for each resample, makes appropriate predictions, and gets metrics.
We would have to make substantial changes to tune_grid()
to skip the preprocessor. Also, we’d also have to have it save each of the fitted models from the previous fit (assuming that the preprocessor is the same), take that as input, and then start the process when the supervised model is trained (for each resample).
We’ll have to consider this to see if there is a less invasive approach than the one described above (I don't think we can do that).
If there is nothing to tune in the preprocessor, could it be 'baked' prior to starting the tuning process altogether? Then the workflow gets modified to use the baked data and no preprocessor, the tuning is conducted, and then everything gets repackaged at the end? Maybe this is too much work for too little gain in a special case.
Unless you are using a validation set, we would not want to fit the preprocessor on the entire training set then fit the model on a potentially different data set (i.e., one that was a resample)
I'm thinking if we are passing in resamples we could bake the preprocessor on each resample in advance, or something like that.
Ok, I've tried hacking this together and I see why it won't work. The workflow expects the data in each resample to look similar (same columns, etc), but if we preprocess and glue the resamples back together, each resample could have different columns, and that breaks things down the line.
FWIW, I brought all this up because I have a workflow that involves step_lincomb
which takes a very long time to run.
Feature
Currently, it appears that
tune_bayes
recomputes the entire preprocessor during every iteration, even if the preprocessor has nothing to tune. This can lead to a substantial amount of unnecessary computation as the preprocessor should only need to be executed once and could be reused for all iterations.