
gpg

Re-implementation of GP-GOMEA (Python scikit-learn-compatible interface, C++ backend). Unlike the previous repo, this version features only GP-GOMEA and no other algorithms, and it focuses on symbolic regression alone. It also uses dependencies that are easier and less finicky to install (see environment.yml).

Installation

Installation requires git and conda. Run the following bash commands from a folder of your choice:

git clone https://github.com/marcovirgolin/gpg.git
cd gpg
conda env create -f environment.yml
conda activate gpg
make
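
Once make completes, you can verify that the build succeeded with a quick import check (a suggested sanity check, not part of the original instructions; it assumes the gpg conda environment is still active):

python -c "from pygpg.sk import GPGRegressor; print('gpg imported successfully')"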

Usage

You can try gpg out with the following code snippet (or simply run try.py if you like):

import numpy as np
from pygpg.sk import GPGRegressor
from pygpg.complexity import compute_complexity
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

X = np.random.randn(128, 3)*10

def grav_law(X: np.ndarray) -> np.ndarray:
    """Ground-truth function for the gravity law."""
    return 6.67 * X[:,0]*X[:,1]/(np.square(X[:,2])) + np.random.randn(X.shape[0])*0.1 # some noise

y = grav_law(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=RANDOM_SEED)

gpg = GPGRegressor(
  e=50_000,                   # limit of 50,000 evaluations
  t=-1,                       # no time limit
  g=-1,                       # no generation limit
  d=3,                        # maximum tree depth
  verbose=True,               # print progress
  random_state=RANDOM_SEED,   # for reproducibility
)
gpg.fit(X_train, y_train)

print(
  gpg.model, 
  "(complexity: {})".format(compute_complexity(gpg.model, complexity_metric="node_count")))
print("Train\t\tR2: {}\t\tMSE: {}".format(
  np.round(r2_score(y_train, gpg.predict(X_train)), 3),
  np.round(mean_squared_error(y_train, gpg.predict(X_train)), 3),
))
print("Test\t\tR2: {}\t\tMSE: {}".format(
  np.round(r2_score(y_test, gpg.predict(X_test)), 3),
  np.round(mean_squared_error(y_test, gpg.predict(X_test)), 3),
))
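
Since GPGRegressor exposes a scikit-learn-compatible interface, it can also be plugged into standard scikit-learn tooling. The snippet below (continuing from the example above) is a minimal sketch, assuming the estimator supports cloning via get_params/set_params as scikit-learn compatibility implies; it cross-validates a fresh GPGRegressor on the same data:

from sklearn.model_selection import cross_val_score

# 3-fold cross-validation; each fold fits a clone of the estimator
est = GPGRegressor(e=10_000, t=-1, g=-1, d=3, random_state=RANDOM_SEED)
scores = cross_val_score(est, X, y, cv=3, scoring="r2")
print("CV R2 per fold:", np.round(scores, 3))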

Differences w.r.t. previous version

This version has a number of differences compared to the code in the previous repo.

Results on SRBench

Running this version on SRBench (as gpg) leads to expressions that are as compact as, but more accurate than, those of the original GP-GOMEA, in much less time!

(Figures: blackbox_results and harmonic_means plots on SRBench.)

Research

If you use our code for academic purposes, please support our research by citing:

@article{virgolin2021improving,
  title={Improving model-based genetic programming for symbolic regression of small expressions},
  author={Virgolin, Marco and Alderliesten, Tanja and Witteveen, Cees and Bosman, Peter A. N.},
  journal={Evolutionary Computation},
  volume={29},
  number={2},
  pages={211--237},
  year={2021},
  publisher={MIT Press}
}

Branches