heal-research / pyoperon

Python bindings and scikit-learn interface for the Operon library for symbolic regression.

Calling the optimizer is unnecessarily complicated #10

Open · foolnotion opened this issue 5 months ago

foolnotion commented 5 months ago

Right now, optimizing a tree with pyoperon requires a lot of ugly code:

import numpy as np
import pyoperon as op

def evaluate_with_pyoperon(pdata, tree, range_train, range_test):
    a, b = range_train
    c, d = range_test

    # pyoperon
    pyop_dataset  = op.Dataset(pdata.values)
    pyop_dataset.VariableNames = pdata.columns
    pyop_range_tr = op.Range(a, b)
    pyop_range_te = op.Range(c, d)
    pyop_vars     = sorted(pyop_dataset.Variables, key=lambda v: v.Index)
    pyop_hashes   = [v.Hash for v in pyop_vars[:-1]]
    pyop_target   = pyop_vars[-1]
    pyop_problem  = op.Problem(pyop_dataset, pyop_range_tr, pyop_range_te)
    pyop_problem.InputHashes = pyop_hashes
    pyop_problem.Target = pyop_target
    pyop_dt       = op.DispatchTable()
    pyop_opt      = op.LMOptimizer(pyop_dt, pyop_problem, max_iter=20)
    rng = op.RomuTrio(np.random.randint(1, 1_000_000))

    summary = pyop_opt.Optimize(rng, tree)

    if summary.Success:
        tree.SetCoefficients(summary.FinalParameters)

    range_full = op.Range(0, pyop_dataset.Rows)
    return op.Evaluate(pyop_dt, tree, pyop_dataset, range_full)

This should not be so complicated. At the very least, when pdata is a pandas dataframe, we should hide the construction of the dataset and the problem and offer a simplified API (see the sketch below).
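For illustration, here is a minimal sketch of what such a convenience wrapper could look like when pdata is a dataframe. It reuses only the calls from the snippet above; the name optimize_tree, its signature, and the seed keyword argument are hypothetical and not part of the current pyoperon API:

import numpy as np
import pyoperon as op

def optimize_tree(tree, pdata, range_train, range_test, max_iter=20, seed=None):
    # build the operon dataset straight from the dataframe
    dataset = op.Dataset(pdata.values)
    dataset.VariableNames = list(pdata.columns)

    # by the same convention as above, the last variable (by index) is the target
    variables = sorted(dataset.Variables, key=lambda v: v.Index)

    problem = op.Problem(dataset, op.Range(*range_train), op.Range(*range_test))
    problem.InputHashes = [v.Hash for v in variables[:-1]]
    problem.Target = variables[-1]

    dtable = op.DispatchTable()
    optimizer = op.LMOptimizer(dtable, problem, max_iter=max_iter)
    rng = op.RomuTrio(seed if seed is not None else np.random.randint(1, 1_000_000))

    summary = optimizer.Optimize(rng, tree)
    if summary.Success:
        tree.SetCoefficients(summary.FinalParameters)

    # evaluate the (possibly updated) tree over the full data range
    values = op.Evaluate(dtable, tree, dataset, op.Range(0, dataset.Rows))
    return values, summary

A keyword argument for selecting the target column, instead of relying on the last-column convention, would be a natural extension.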