Closed jeet-parikh closed 1 year ago
Hey @jeet-parikh, looks good!
A few things to touch up.
Use OneHotEncoder to create new columns when there are multiple options, rather than doing an ordinal encoding.
The objectives shouldn't be mixed into the features (otherwise, it's giving the model the answer as one of the columns). The rank variable should be added for each of the regressors.
See https://github.com/sparks-baird/matsci-opt-benchmarks/blob/main/notebooks/particle_packing/1.2-ri-surrogate.ipynb for an updated way of handling the cross validation (using GroupKFold).
Later, when making the group_array-s, you can use something like the following:
sobol_reg_fba_group = (
sobol_reg_fba[fba_features]
.round(6)
.apply(lambda row: "_".join(row.values.astype(str)), axis=1)
)
instead of the existing construction, so that it's a bit less verbose.
The point of using GroupKFold is to prevent data leakage (in this case, where the repeat runs would get mixed between the training and test sets).
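As a rough sketch of how the group labels and GroupKFold fit together, assuming a toy DataFrame where each parameter set was run twice (the names `x1`, `x2`, `y` and the values are placeholders, not the actual dataset):

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Hypothetical data: three parameter sets, each repeated twice.
df = pd.DataFrame(
    {
        "x1": [0.1, 0.1, 0.2, 0.2, 0.3, 0.3],
        "x2": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
        "y": [0.50, 0.52, 0.61, 0.60, 0.70, 0.69],
    }
)
feature_names = ["x1", "x2"]

# One string label per row; repeats of the same parameters share a label.
groups = (
    df[feature_names]
    .round(6)
    .apply(lambda row: "_".join(row.values.astype(str)), axis=1)
)

cv = GroupKFold(n_splits=3)
overlaps = []
for train_idx, test_idx in cv.split(df[feature_names], df["y"], groups=groups):
    train_groups = set(groups.iloc[train_idx])
    test_groups = set(groups.iloc[test_idx])
    # Groups appearing on both sides of the split would indicate leakage.
    overlaps.append(train_groups & test_groups)
```

Every entry in `overlaps` should be empty, since GroupKFold keeps all rows of a group on the same side of each split.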
Pull Request Test Coverage Report for Build 4300208892
💛 - Coveralls