sparks-baird / matsci-opt-benchmarks

A collection of benchmarking problems and datasets for testing the performance of advanced optimization algorithms in the field of materials science and chemistry.
https://matsci-opt-benchmarks.readthedocs.io/
MIT License

completed 1.2 notebook, created initial surrogate models, updated graphs #21

Closed · jeet-parikh closed this pull request 1 year ago

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 4300208892


| Files with Coverage Reduction | New Missed Lines | % |
| --- | ---: | ---: |
| src/matsci_opt_benchmarks/particle_packing/utils/packing_generation.py | 7 | 67.32% |
| src/matsci_opt_benchmarks/particle_packing/core.py | 20 | 0% |
| src/matsci_opt_benchmarks/particle_packing/utils/ax.py | 86 | 0% |
| Total: | 113 | |

Totals Coverage Status

- Change from base Build 4214506172: 0.04%
- Covered Lines: 89
- Relevant Lines: 665

💛 - Coveralls
sgbaird commented 1 year ago

Hey @jeet-parikh, looks good!

A few things to touch up:

Use OneHotEncoder to create new columns when there are multiple options, rather than doing an ordinal encoding.

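For instance, a minimal sketch with scikit-learn's OneHotEncoder (the `method` column and its values are made up for illustration, not the notebook's actual names):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# hypothetical data: one categorical column with multiple options
df = pd.DataFrame({"method": ["fba", "ls", "lsgd"], "mu1": [0.8, 1.2, 1.0]})

enc = OneHotEncoder(handle_unknown="ignore")
onehot = enc.fit_transform(df[["method"]]).toarray()
onehot_df = pd.DataFrame(onehot, columns=enc.get_feature_names_out(["method"]))

# one 0/1 column per category, instead of a single ordinal column
df = pd.concat([df.drop(columns="method"), onehot_df], axis=1)
```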

The objectives shouldn't be mixed into the features; otherwise, the model is being given the answer as one of its input columns. The rank variable should be added for each of the regressors.
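A minimal sketch of that split, assuming a dataframe like the notebook's `sobol_reg_fba` and a hypothetical objective column name:

```python
# hypothetical objective column name; substitute the notebook's actual target(s)
objective_cols = ["packing_fraction"]

feature_cols = [c for c in sobol_reg_fba.columns if c not in objective_cols]
X = sobol_reg_fba[feature_cols]    # model inputs only
y = sobol_reg_fba[objective_cols]  # objectives stay on the target side
```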

See https://github.com/sparks-baird/matsci-opt-benchmarks/blob/main/notebooks/particle_packing/1.2-ri-surrogate.ipynb for an updated way of handling the cross validation (using GroupKFold).


Later, when making the group arrays, you can use something like the following:

```python
# one label per unique (rounded) parameter combination, so that repeat
# runs of the same parameters end up sharing a group
sobol_reg_fba_group = (
    sobol_reg_fba[fba_features]
    .round(6)
    .apply(lambda row: "_".join(row.values.astype(str)), axis=1)
)
```

instead of the longer version from the notebook, so that it's a bit less verbose.

The point of using GroupKFold is to prevent data leakage (in this case, where the repeat runs would get mixed between the training and test sets).
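For reference, a minimal sketch of plugging those group labels into the CV loop (assuming the `X`/`y` split above; the regressor is just a placeholder):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

# Rows sharing a group label (i.e., repeat runs of the same parameter set)
# always land in the same fold, so repeats can't leak across train/test.
gkf = GroupKFold(n_splits=5)
scores = cross_val_score(
    RandomForestRegressor(random_state=0),
    X,
    y.values.ravel(),
    groups=sobol_reg_fba_group,
    cv=gkf,
)
```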