Closed ajawfi closed 4 years ago
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log'
I’m using the latest version of TF, version 2.0.
On Oct 13, 2019, at 9:08 AM, Kai Staats notifications@github.com wrote:
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Ok. Looks like the means by which we define our maths functions are no longer supported, or have changed. Should be easy to fix. We will look into this ASAP ... stay tuned.
On 10/13/19 9:13 AM, ajawfi wrote:
I’m using the latest version of TF, version 2.0.
On Oct 13, 2019, at 9:08 AM, Kai Staats notifications@github.com wrote:
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Thanks, I actually already fixed for me. I just wanted to let you know. As you said, you just need to change couple of the TF attributes (I think I changed around 5 attributes to get it working). Great package by the way. Keep up the good work !
Sent from my iPhone
On Oct 13, 2019, at 10:21 AM, Kai Staats notifications@github.com wrote:
Ok. Looks like the means by which we define our maths functions are no longer supported, or have changed. Should be easy to fix. We will look into this ASAP ... stay tuned.
On 10/13/19 9:13 AM, ajawfi wrote:
I’m using the latest version of TF, version 2.0.
On Oct 13, 2019, at 9:08 AM, Kai Staats notifications@github.com wrote:
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Ah! Great! Thank you!
If you don't mind sharing those changes with me directly (kai at over the sun dot com), or forking Karoo, then I will review and merge with credit to your effort. Do you know if the changes break support for the prior versions of TF?
kai
On 10/13/19 9:54 AM, ajawfi wrote:
Thanks, I actually already fixed for me. I just wanted to let you know. As you said, you just need to change couple of the TF attributes (I think I changed around 5 attributes to get it working). Great package by the way. Keep up the good work !
Sent from my iPhone
On Oct 13, 2019, at 10:21 AM, Kai Staats notifications@github.com wrote:
Ok. Looks like the means by which we define our maths functions are no longer supported, or have changed. Should be easy to fix. We will look into this ASAP ... stay tuned.
On 10/13/19 9:13 AM, ajawfi wrote:
I’m using the latest version of TF, version 2.0.
On Oct 13, 2019, at 9:08 AM, Kai Staats notifications@github.com wrote:
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
The changes I made are in the "karoo_gp_base_class.py" file. I am really not sure if the changes will break the support of the older versions of TF.
On Sun, Oct 13, 2019 at 11:04 AM Kai Staats notifications@github.com wrote:
Ah! Great! Thank you!
If you don't mind sharing those changes with me directly (kai at over the sun dot com), or forking Karoo, then I will review and merge with credit to your effort. Do you know if the changes break support for the prior versions of TF?
kai
On 10/13/19 9:54 AM, ajawfi wrote:
Thanks, I actually already fixed for me. I just wanted to let you know. As you said, you just need to change couple of the TF attributes (I think I changed around 5 attributes to get it working). Great package by the way. Keep up the good work !
Sent from my iPhone
On Oct 13, 2019, at 10:21 AM, Kai Staats notifications@github.com wrote:
Ok. Looks like the means by which we define our maths functions are no longer supported, or have changed. Should be easy to fix. We will look into this ASAP ... stay tuned.
On 10/13/19 9:13 AM, ajawfi wrote:
I’m using the latest version of TF, version 2.0.
On Oct 13, 2019, at 9:08 AM, Kai Staats notifications@github.com wrote:
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kstaats/karoo_gp/issues/17?email_source=notifications&email_token=AMQMZW25GG52S33WQVL7TMTQONILJA5CNFSM4JAFELLKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBC2TRY#issuecomment-541436359, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMQMZWYCIPYXP3NRRMCSNA3QONILJANCNFSM4JAFELLA .
''' A NOTE TO THE NEWBIE, EXPERT, AND BRAVE Even if you are highly experienced in Genetic Programming, it is recommended that you review the 'Karoo User Guide' before running this application. While your computer will not burst into flames nor will the sun collapse into a black hole if you do not, you will likely find more enjoyment of this particular flavour of GP with a little understanding of its intent and design. '''
import sys import os import csv import time
import numpy as np import sklearn.metrics as skm
import sklearn.model_selection as skcv
from sympy import sympify from datetime import datetime from collections import OrderedDict
import karoo_gp_pause as menu
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
import tensorflow as tf import ast import operator as op
operators = {ast.Add: tf.add, # e.g., a + b ast.Sub: tf.subtract, # e.g., a - b ast.Mult: tf.multiply, # e.g., a * b ast.Div: tf.divide, # e.g., a / b ast.Pow: tf.pow, # e.g., a ** 2 ast.USub: tf.negative, # e.g., -a ast.And: tf.logical_and, # e.g., a and b ast.Or: tf.logical_or, # e.g., a or b ast.Not: tf.logical_not, # e.g., not a ast.Eq: tf.equal, # e.g., a == b ast.NotEq: tf.not_equal, # e.g., a != b ast.Lt: tf.less, # e.g., a < b ast.LtE: tf.less_equal, # e.g., a <= b ast.Gt: tf.greater, # e.g., a > b ast.GtE: tf.greater_equal, # e.g., a >= 1 'abs': tf.abs, # e.g., abs(a) 'sign': tf.sign, # e.g., sign(a) 'square': tf.square, # e.g., square(a) 'sqrt': tf.sqrt, # e.g., sqrt(a) 'pow': tf.pow, # e.g., pow(a, b) 'log': tf.math.log, # e.g., log(a) 'log1p': tf.math.log1p, # e.g., log1p(a) 'cos': tf.cos, # e.g., cos(a) 'sin': tf.sin, # e.g., sin(a) 'tan': tf.tan, # e.g., tan(a) 'acos': tf.acos, # e.g., acos(a) 'asin': tf.asin, # e.g., asin(a) 'atan': tf.atan, # e.g., atan(a) }
np.set_printoptions(linewidth = 320) # set the terminal to print 320 characters before line-wrapping in order to view Trees
class Base_GP(object):
'''
This Base_BP class contains all methods for Karoo GP. Method names are differentiated from global variable names
(defined below) by the prefix 'fx_' followed by an object and action, as in fx_display_tree(), with a few
expections, such as fx_fitness_gene_pool().
The method categories (denoted by +++ banners +++) are as follows:
fx_karoo_ Methods to Run Karoo GP
fx_data_ Methods to Load and Archive Data
fx_init_ Methods to Construct the 1st Generation
fx_eval_ Methods to Evaluate a Tree
fx_fitness_ Methods to Train and Test a Tree for Fitness
fx_nextgen_ Methods to Construct the next Generation
fx_evolve_ Methods to Evolve a Population
fx_display_ Methods to Visualize a Tree
Error checks are quickly located by searching for 'ERROR!'
'''
def __init__(self):
'''
### Global variables used for data management ###
self.data_train store train data for processing in TF
self.data_test store test data for processing in TF
self.tf_device set TF computation backend device (CPU or GPU)
self.tf_device_log employed for TensorFlow debugging
self.data_train_cols number of cols in the TRAINING data - see fx_data_load()
self.data_train_rows number of rows in the TRAINING data - see fx_data_load()
self.data_test_cols number of cols in the TEST data - see fx_data_load()
self.data_test_rows number of rows in the TEST data - see fx_data_load()
self.functions user defined functions (operators) from the associated files/[functions].csv
self.terminals user defined variables (operands) from the top row of the associated [data].csv
self.coeff user defined coefficients (NOT YET IN USE)
self.fitness_type fitness type
self.datetime date-time stamp of when the unique directory is created
self.path full path to the unique directory created with each run
self.dataset local path and dataset filename
### Global variables used for evolutionary management ###
self.population_a the root generation from which Trees are chosen for mutation and reproduction
self.population_b the generation constructed from gp.population_a (recyled)
self.gene_pool once-per-generation assessment of trees that meet min and max boundary conditions
self.gen_id simple n + 1 increment
self.fitness_type set in fx_data_load() as either a minimising or maximising function
self.tree axis-1, 13 element Numpy array that defines each Tree, stored in 'gp.population'
self.pop_* 13 variables that define each Tree - see fx_init_tree_initialise()
'''
self.algo_raw = [] # the raw expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL
self.algo_sym = [] # the expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL
self.fittest_dict = {} # all Trees which share the best fitness score
self.gene_pool = [] # store all Tree IDs for use by Tournament
self.class_labels = 0 # the number of true class labels (data_y)
return
#+++++++++++++++++++++++++++++++++++++++++++++
# Methods to Run Karoo GP |
#+++++++++++++++++++++++++++++++++++++++++++++
def fx_karoo_gp(self, kernel, tree_type, tree_depth_base, tree_depth_max, tree_depth_min, tree_pop_max, gen_max, tourn_size, filename, evolve_repro, evolve_point, evolve_branch, evolve_cross, display, precision, swim, mode):
'''
This method enables the engagement of the entire Karoo GP application. Instead of returning the user to the pause
menu, this script terminates at the command-line, providing support for bash and chron job execution.
Calld by: user script karoo_gp.py
Arguments required: (see below)
'''
### PART 1 - set global variables to those local values passed from the user script ###
self.kernel = kernel # fitness function
# tree_type is passed between methods to construct specific trees
# tree_depth_base is passed between methods to construct specific trees
self.tree_depth_max = tree_depth_max # maximum Tree depth for the entire run; limits bloat
self.tree_depth_min = tree_depth_min # minimum number of nodes
self.tree_pop_max = tree_pop_max # maximum number of Trees per generation
self.gen_max = gen_max # maximum number of generations
self.tourn_size = tourn_size # number of Trees selected for each tournament
# filename is passed between methods to work with specific populations
self.evolve_repro = evolve_repro # quantity of a population generated through Reproduction
self.evolve_point = evolve_point # quantity of a population generated through Point Mutation
self.evolve_branch = evolve_branch # quantity of a population generated through Branch Mutation
self.evolve_cross = evolve_cross # quantity of a population generated through Crossover
self.display = display # display mode is set to (s)ilent # level of on-screen feedback
self.precision = precision # the number of floating points for the round function in 'fx_fitness_eval'
self.swim = swim # pass along the gene_pool restriction methodology
# mode is engaged at the end of the run, below
### PART 2 - construct first generation of Trees ###
self.fx_data_load(filename)
self.gen_id = 1 # set initial generation ID
self.population_a = ['Karoo GP by Kai Staats, Generation ' + str(self.gen_id)] # initialise population_a to host the first generation
self.population_b = ['placeholder'] # initialise population_b to satisfy fx_karoo_pause()
self.fx_init_construct(tree_type, tree_depth_base) # construct the first population of Trees
if self.kernel == 'p': # terminate here for Play mode
self.fx_display_tree(self.tree) # print the current Tree
self.fx_data_tree_write(self.population_a, 'a') # save this one Tree to disk
sys.exit()
elif self.gen_max == 1: # terminate here if constructing just one generation
self.fx_data_tree_write(self.population_a, 'a') # save this single population to disk
print ('\n We have constructed a single, stochastic population of', self.tree_pop_max,'Trees, and saved to disk')
sys.exit()
else: print ('\n We have constructed the first, stochastic population of', self.tree_pop_max,'Trees')
### PART 3 - evaluate first generation of Trees ###
print ('\n Evaluate the first generation of Trees ...')
self.fx_fitness_gym(self.population_a) # generate expression, evaluate fitness, compare fitness
self.fx_data_tree_write(self.population_a, 'a') # save the first generation of Trees to disk
### PART 4 - evolve multiple generations of Trees ###
menu = 1
while menu != 0: # this allows the user to add generations mid-run and not get buried in nested iterations
for self.gen_id in range(self.gen_id + 1, self.gen_max + 1): # evolve additional generations of Trees
print ('\n Evolve a population of Trees for Generation', self.gen_id, '...')
self.population_b = ['Karoo GP by Kai Staats - Evolving Generation'] # initialise population_b to host the next generation
self.fx_fitness_gene_pool() # generate the viable gene pool (compares against gp.tree_depth_min)
self.fx_nextgen_reproduce() # method 1 - Reproduction
self.fx_nextgen_point_mutate() # method 2 - Point Mutation
self.fx_nextgen_branch_mutate() # method 3 - Branch Mutation
self.fx_nextgen_crossover() # method 4 - Crossover
self.fx_eval_generation() # evaluate all Trees in a single generation
self.population_a = self.fx_evolve_pop_copy(self.population_b, ['Karoo GP by Kai Staats - Generation ' + str(self.gen_id)])
if mode == 's': menu = 0 # (s)erver mode - termination with completiont of prescribed run
else: # (d)esktop mode - user is given an option to quit, review, and/or modify parameters; 'add' generations continues the run
print ('\n\t\033[32m Enter \033[1m?\033[0;0m\033[32m to review your options or \033[1mq\033[0;0m\033[32muit\033[0;0m')
menu = self.fx_karoo_pause()
self.fx_karoo_terminate() # archive populations and return to karoo_gp.py for a clean exit
return
def fx_karoo_pause_refer(self):
'''
Enables (g)eneration, (i)nteractive, and (d)e(b)ug display modes to offer the (pause) menu at each prompt.
See fx_karoo_pause() for an explanation of the value being passed.
Called by: the functions called by PART 4 of fx_karoo_gp()
Arguments required: none
'''
menu = 1
while menu == 1: menu = self.fx_karoo_pause()
return
def fx_karoo_pause(self):
'''
Pause the program execution and engage the user, providing a number of options.
Called by: fx_karoo_pause_refer
Arguments required: [0,1,2] where (0) refers to an end-of-run; (1) refers to any use of the (pause) menu from
within the run, and anticipates ENTER as an escape from the menu to continue the run; and (2) refers to an
'ERROR!' for which the user may want to archive data before terminating. At this point in time, (2) is
associated with each error but does not provide any special options).
'''
### PART 1 - reset and pack values to send to menu.pause ###
menu_dict = {'input_a':'',
'input_b':0,
'display':self.display,
'tree_depth_max':self.tree_depth_max,
'tree_depth_min':self.tree_depth_min,
'tree_pop_max':self.tree_pop_max,
'gen_id':self.gen_id,
'gen_max':self.gen_max,
'tourn_size':self.tourn_size,
'evolve_repro':self.evolve_repro,
'evolve_point':self.evolve_point,
'evolve_branch':self.evolve_branch,
'evolve_cross':self.evolve_cross,
'fittest_dict':self.fittest_dict,
'pop_a_len':len(self.population_a),
'pop_b_len':len(self.population_b),
'path':self.path}
menu_dict = menu.pause(menu_dict) # call the external function menu.pause
### PART 2 - unpack values returned from menu.pause ###
input_a = menu_dict['input_a']
input_b = menu_dict['input_b']
self.display = menu_dict['display']
self.tree_depth_min = menu_dict['tree_depth_min']
self.gen_max = menu_dict['gen_max']
self.tourn_size = menu_dict['tourn_size']
self.evolve_repro = menu_dict['evolve_repro']
self.evolve_point = menu_dict['evolve_point']
self.evolve_branch = menu_dict['evolve_branch']
self.evolve_cross = menu_dict['evolve_cross']
### PART 3 - execute the user queries returned from menu.pause ###
if input_a == 'esc': return 2 # breaks out of the fx_karoo_gp() or fx_karoo_pause_refer() loop
elif input_a == 'eval': # evaluate a Tree against the TEST data
self.fx_eval_poly(self.population_b[input_b]) # generate the raw and sympified expression for the given Tree using SymPy
#print ('\n\t\033[36mTree', input_b, 'yields (raw):', self.algo_raw, '\033[0;0m') # print the raw expression
print ('\n\t\033[36mTree', input_b, 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m') # print the sympified expression
result = self.fx_fitness_eval(str(self.algo_sym), self.data_test, get_pred_labels = True) # might change to algo_raw evaluation
if self.kernel == 'c': self.fx_fitness_test_classify(result) # TF tested 2017 02/02
elif self.kernel == 'r': self.fx_fitness_test_regress(result)
elif self.kernel == 'm': self.fx_fitness_test_match(result)
# elif self.kernel == '[other]': # use others as a template
elif input_a == 'print_a': # print a Tree from population_a
self.fx_display_tree(self.population_a[input_b])
elif input_a == 'print_b': # print a Tree from population_b
self.fx_display_tree(self.population_b[input_b])
elif input_a == 'pop_a': # list all Trees in population_a
print ('')
for tree_id in range(1, len(self.population_a)):
self.fx_eval_poly(self.population_a[tree_id]) # extract the expression
print ('\t\033[36m Tree', self.population_a[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
elif input_a == 'pop_b': # list all Trees in population_b
print ('')
for tree_id in range(1, len(self.population_b)):
self.fx_eval_poly(self.population_b[tree_id]) # extract the expression
print ('\t\033[36m Tree', self.population_b[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
elif input_a == 'load': # load population_s to replace population_a
self.fx_data_recover(self.filename['s']) # NEED TO replace 's' with a user defined filename
elif input_a == 'write': # write the evolving population_b to disk
self.fx_data_tree_write(self.population_b, 'b')
print ('\n\t All current members of the evolving population_b saved to karoo_gp/runs/[date-time]/population_b.csv')
elif input_a == 'add': # check for added generations, then exit fx_karoo_pause and continue the run
self.gen_max = self.gen_max + input_b # if input_b > 0: self.gen_max = self.gen_max + input_b - REMOVED 2019 06/05
elif input_a == 'quit': self.fx_karoo_terminate() # archive populations and exit
return 1
def fx_karoo_terminate(self):
'''
Terminates the evolutionary run (if yet in progress), saves parameters and data to disk, and cleanly returns
the user to karoo_gp.py and the command line.
Called by: fx_karoo_gp() and fx_karoo_pause_refer()
Arguments required: none
'''
self.fx_data_params_write()
target = open(self.filename['f'], 'w'); target.close() # initialize the .csv file for the final population
self.fx_data_tree_write(self.population_b, 'f') # save the final generation of Trees to disk
print ('\n\t\033[32m Your Trees and runtime parameters are archived in karoo_gp/runs/[date-time]/\033[0;0m')
print ('\n\033[3m "It is not the strongest of the species that survive, nor the most intelligent,\033[0;0m')
print ('\033[3m but the one most responsive to change."\033[0;0m --Charles Darwin\n')
print ('\033[3m Congrats!\033[0;0m Your Karoo GP run is complete.\n')
sys.exit()
return
#+++++++++++++++++++++++++++++++++++++++++++++
# Methods to Load and Archive Data |
#+++++++++++++++++++++++++++++++++++++++++++++
def fx_data_load(self, filename):
'''
The data and function .csv files are loaded according to the fitness function kernel selected by the user. An
alternative dataset may be loaded at launch, by appending a command line argument. The data is then split into
both TRAINING and TEST segments in order to validate the success of the GP training run. Datasets less than
10 rows will not be split, rather copied in full to both TRAINING and TEST as it is assumed you are conducting
a system validation run, as with the built-in MATCH kernel and associated dataset.
Called by: fx_karoo_gp
Arguments required: filename (of the dataset)
'''
### PART 1 - load the associated data set, operators, operands, fitness type, and coefficients ###
# full_path = os.path.realpath(__file__); cwd = os.path.dirname(full_path) # for user Marco Cavaglia
cwd = os.getcwd()
data_dict = {'c':cwd + '/files/data_CLASSIFY.csv', 'r':cwd + '/files/data_REGRESS.csv', 'm':cwd + '/files/data_MATCH.csv', 'p':cwd + '/files/data_PLAY.csv'}
if len(sys.argv) == 1: # load data from the default karoo_gp/files/ directory
data_x = np.loadtxt(data_dict[self.kernel], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column
data_y = np.loadtxt(data_dict[self.kernel], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels)
header = open(data_dict[self.kernel],'r') # open file to be read (below)
self.dataset = data_dict[self.kernel] # copy the name only
elif len(sys.argv) == 2: # load an external data file
data_x = np.loadtxt(sys.argv[1], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column
data_y = np.loadtxt(sys.argv[1], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels)
header = open(sys.argv[1],'r') # open file to be read (below)
self.dataset = sys.argv[1] # copy the name only
elif len(sys.argv) > 2: # receive filename and additional arguments from karoo_gp.py via argparse
data_x = np.loadtxt(filename, skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column
data_y = np.loadtxt(filename, skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels)
header = open(filename,'r') # open file to be read (below)
self.dataset = filename # copy the name only
fitt_dict = {'c':'max', 'r':'min', 'm':'max', 'p':''}
self.fitness_type = fitt_dict[self.kernel] # load fitness type
func_dict = {'c':cwd + '/files/operators_CLASSIFY.csv', 'r':cwd + '/files/operators_REGRESS.csv', 'm':cwd + '/files/operators_MATCH.csv', 'p':cwd + '/files/operators_PLAY.csv'}
self.functions = np.loadtxt(func_dict[self.kernel], delimiter=',', skiprows=1, dtype = str) # load the user defined functions (operators)
self.terminals = header.readline().split(','); self.terminals[-1] = self.terminals[-1].replace('\n','') # load the user defined terminals (operands)
self.class_labels = len(np.unique(data_y)) # load the user defined true labels for classification or solutions for regression
#self.coeff = np.loadtxt(cwd + '/files/coefficients.csv', delimiter=',', skiprows=1, dtype = str) # load the user defined coefficients - NOT USED YET
### PART 2 - from the dataset, extract TRAINING and TEST data ###
if len(data_x) < 11: # for small datasets we will not split them into TRAINING and TEST components
data_train = np.c_[data_x, data_y]
data_test = np.c_[data_x, data_y]
else: # if larger than 10, we run the data through the SciKit Learn's 'random split' function
x_train, x_test, y_train, y_test = skcv.train_test_split(data_x, data_y, test_size = 0.2) # 80/20 TRAIN/TEST split
data_x, data_y = [], [] # clear from memory
data_train = np.c_[x_train, y_train] # recombine each row of data with its associated class label (right column)
x_train, y_train = [], [] # clear from memory
data_test = np.c_[x_test, y_test] # recombine each row of data with its associated class label (right column)
x_test, y_test = [], [] # clear from memory
self.data_train_cols = len(data_train[0,:]) # qty count
self.data_train_rows = len(data_train[:,0]) # qty count
self.data_test_cols = len(data_test[0,:]) # qty count
self.data_test_rows = len(data_test[:,0]) # qty count
### PART 3 - load TRAINING and TEST data for TensorFlow processing - tested 2017 02/02
self.data_train = data_train # Store train data for processing in TF
self.data_test = data_test # Store test data for processing in TF
self.tf_device = "/gpu:0" # Set TF computation backend device (CPU or GPU); gpu:n = 1st, 2nd, or ... GPU device
self.tf_device_log = False # TF device usage logging (for debugging)
### PART 4 - create a unique directory and initialise all .csv files ###
self.datetime = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
self.path = os.path.join(cwd, 'runs/', filename.split('.')[0] + '_' + self.datetime + '/') # generate a unique directory name
if not os.path.isdir(self.path): os.makedirs(self.path) # make a unique directory
self.filename = {} # a dictionary to hold .csv filenames
self.filename.update( {'a':self.path + 'population_a.csv'} )
target = open(self.filename['a'], 'w'); target.close() # initialise a .csv file for population 'a' (foundation)
self.filename.update( {'b':self.path + 'population_b.csv'} )
target = open(self.filename['b'], 'w'); target.close() # initialise a .csv file for population 'b' (evolving)
self.filename.update( {'f':self.path + 'population_f.csv'} )
target = open(self.filename['f'], 'w'); target.close() # initialise a .csv file for the final population (test)
self.filename.update( {'s':self.path + 'population_s.csv'} )
target = open(self.filename['s'], 'w'); target.close() # initialise a .csv file to manually load (seed)
return
def fx_data_recover(self, population):
'''
This method is used to load a saved population of Trees, as invoked through the (pause) menu where population_r
replaces population_a in the karoo_gp/runs/[date-time]/ directory.
Called by: fx_karoo_pause
Arguments required: population (filename['s'])
'''
with open(population, 'rb') as csv_file:
target = csv.reader(csv_file, delimiter=',')
n = 0 # track row count
for row in target:
print ('row', row)
n = n + 1
if n == 1: pass # skip first empty row
elif n == 2:
self.population_a = [row] # write header to population_a
else:
if row == []:
self.tree = np.array([[]]) # initialise Tree array
else:
if self.tree.shape[1] == 0:
self.tree = np.append(self.tree, [row], axis = 1) # append first row to Tree
else:
self.tree = np.append(self.tree, [row], axis = 0) # append subsequent rows to Tree
if self.tree.shape[0] == 13:
self.population_a.append(self.tree) # append complete Tree to population list
print ('\n', self.population_a)
return
def fx_data_tree_clean(self, tree):
'''
This method aesthetically cleans the Tree array, removing redundant data.
Called by: fx_data_tree_append, fx_evolve_branch_copy
Arguments required: tree
'''
tree[0][2:] = '' # A little clean-up to make things look pretty :)
tree[1][2:] = '' # Ignore the man behind the curtain!
tree[2][2:] = '' # Yes, I am a bit OCD ... but you *know* you appreciate clean arrays.
return tree
def fx_data_tree_append(self, tree):
'''
Append Tree array to the foundation Population.
Called by: fx_init_construct
Arguments required: tree
'''
self.fx_data_tree_clean(tree) # clean 'tree' prior to storing
self.population_a.append(tree) # append 'tree' to population list
return
def fx_data_tree_write(self, population, key):
'''
Save population_* to disk.
Called by: fx_karoo_gp, fx_eval_generation
Arguments required: population, key
'''
with open(self.filename[key], 'a') as csv_file:
target = csv.writer(csv_file, delimiter=',')
if self.gen_id != 1: target.writerows(['']) # empty row before each generation
target.writerows([['Karoo GP by Kai Staats', 'Generation:', str(self.gen_id)]])
for tree in range(1, len(population)):
target.writerows(['']) # empty row before each Tree
for row in range(0, 13): # increment through each row in the array Tree
target.writerows([population[tree][row]])
return
def fx_data_params_write(self): # tested 2017 02/13; argument 'app' removed to simplify termination 2019 06/08
'''
Save run-time configuration parameters to disk.
Called by: fx_karoo_gp, fx_karoo_pause
Arguments required: app
'''
file = open(self.path + 'log_config.txt', 'w')
file.write('Karoo GP')
file.write('\n launched: ' + str(self.datetime))
file.write('\n dataset: ' + str(self.dataset))
file.write('\n')
file.write('\n kernel: ' + str(self.kernel))
file.write('\n precision: ' + str(self.precision))
file.write('\n')
# file.write('tree type: ' + tree_type)
# file.write('tree depth base: ' + str(tree_depth_base))
file.write('\n tree depth max: ' + str(self.tree_depth_max))
file.write('\n min node count: ' + str(self.tree_depth_min))
file.write('\n')
file.write('\n genetic operator Reproduction: ' + str(self.evolve_repro))
file.write('\n genetic operator Point Mutation: ' + str(self.evolve_point))
file.write('\n genetic operator Branch Mutation: ' + str(self.evolve_branch))
file.write('\n genetic operator Crossover: ' + str(self.evolve_cross))
file.write('\n')
file.write('\n tournament size: ' + str(self.tourn_size))
file.write('\n population: ' + str(self.tree_pop_max))
file.write('\n number of generations: ' + str(self.gen_id))
file.write('\n\n')
file.close()
file = open(self.path + 'log_test.txt', 'w')
file.write('Karoo GP')
file.write('\n launched: ' + str(self.datetime))
file.write('\n dataset: ' + str(self.dataset))
file.write('\n')
if len(self.fittest_dict) > 0:
fitness_best = 0
fittest_tree = 0
# revised method, re-evaluating all Trees from stored fitness score
for tree_id in range(1, len(self.population_b)):
fitness = float(self.population_b[tree_id][12][1])
if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel
if fitness >= fitness_best: # find the Tree with Maximum fitness score
fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel
if fitness_best == 0: fitness_best = fitness # set the baseline first time through
if fitness <= fitness_best: # find the Tree with Minimum fitness score
fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
elif self.kernel == 'm': # display best fit Trees for the MATCH kernel
if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows
fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
# elif self.kernel == '[other]': # use others as a template
# print ('fitness_best:', fitness_best, 'fittest_tree:', fittest_tree)
# test the most fit Tree and write to the .txt log
self.fx_eval_poly(self.population_b[int(fittest_tree)]) # generate the raw and sympified expression for the given Tree using SymPy
expr = str(self.algo_sym) # get simplified expression and process it by TF - tested 2017 02/02
result = self.fx_fitness_eval(expr, self.data_test, get_pred_labels = True)
file.write('\n\n Tree ' + str(fittest_tree) + ' is the most fit, with expression:')
file.write('\n\n ' + str(self.algo_sym))
if self.kernel == 'c':
file.write('\n\n Classification fitness score: {}'.format(result['fitness']))
file.write('\n\n Precision-Recall report:\n {}'.format(skm.classification_report(result['solution'], result['pred_labels'][0])))
file.write('\n Confusion matrix:\n {}'.format(skm.confusion_matrix(result['solution'], result['pred_labels'][0])))
elif self.kernel == 'r':
MSE, fitness = skm.mean_squared_error(result['result'], result['solution']), result['fitness']
file.write('\n\n Regression fitness score: {}'.format(fitness))
file.write('\n Mean Squared Error: {}'.format(MSE))
elif self.kernel == 'm':
file.write('\n\n Matching fitness score: {}'.format(result['fitness']))
# elif self.kernel == '[other]': # use others as a template
else: file.write('\n\n There were no evolved solutions generated in this run... your species has gone extinct!')
file.write('\n\n')
file.close()
return
#+++++++++++++++++++++++++++++++++++++++++++++
# Methods to Construct the 1st Generation |
#+++++++++++++++++++++++++++++++++++++++++++++
def fx_init_construct(self, tree_type, tree_depth_base):
'''
This method constructs the initial population of Tree type 'tree_type' and of the size tree_depth_base. The Tree
can be Full, Grow, or "Ramped Half/Half" as defined by John Koza.
Called by: fx_karoo_gp
Arguments required: tree_type, tree_depth_base
'''
if self.display == 'i':
print ('\n\t\033[32m Press \033[36m\033[1m?\033[0;0m\033[32m at any \033[36m\033[1m(pause)\033[0;0m\033[32m, or \033[36m\033[1mENTER\033[0;0m \033[32mto continue the run\033[0;0m'); self.fx_karoo_pause_refer()
if tree_type == 'r': # Ramped 50/50
TREE_ID = 1
for n in range(1, int((self.tree_pop_max / 2) / tree_depth_base) + 1): # split the population into equal parts
for depth in range(1, tree_depth_base + 1): # build 2 Trees at each depth
self.fx_init_tree_build(TREE_ID, 'f', depth) # build a Full Tree
self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a'
TREE_ID = TREE_ID + 1
self.fx_init_tree_build(TREE_ID, 'g', depth) # build a Grow Tree
self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a'
TREE_ID = TREE_ID + 1
if TREE_ID < self.tree_pop_max: # eg: split 100 by 2*3 and it will produce only 96 Trees ...
for n in range(self.tree_pop_max - TREE_ID + 1): # ... so we complete the run
self.fx_init_tree_build(TREE_ID, 'g', tree_depth_base)
self.fx_data_tree_append(self.tree)
TREE_ID = TREE_ID + 1
else: pass
else: # Full or Grow
for TREE_ID in range(1, self.tree_pop_max + 1):
self.fx_init_tree_build(TREE_ID, tree_type, tree_depth_base) # build the 1st generation of Trees
self.fx_data_tree_append(self.tree)
return
def fx_init_tree_build(self, TREE_ID, tree_type, tree_depth_base):
'''
This method combines 4 sub-methods into a single method for ease of deployment. It is designed to executed
within a loop such that an entire population is built. However, it may also be run from the command line,
passing a single TREE_ID to the method.
'tree_type' is either (f)ull or (g)row. Note, however, that when the user selects 'ramped 50/50' at launch,
it is still (f) or (g) which are passed to this method.
Called by: fx_init_construct, fx_evolve_crossover, fx_evolve_grow_mutate
Arguments required: TREE_ID, tree_type, tree_depth_base
'''
self.fx_init_tree_initialise(TREE_ID, tree_type, tree_depth_base) # initialise a new Tree
self.fx_init_root_build() # build the Root node
self.fx_init_function_build() # build the Function nodes
self.fx_init_terminal_build() # build the Terminal nodes
return # each Tree is written to 'gp.tree'
def fx_init_tree_initialise(self, TREE_ID, tree_type, tree_depth_base):
'''
Assign 13 global variables to the array 'tree'.
Build the array 'tree' with 13 rows and initally, just 1 column of labels. This array will grow horizontally as
each new node is appended. The values of this array are stored as string characters, numbers forced to integers at
the point of execution.
Use of the debug (db) interface mode enables the user to watch the genetic operations as they work on the Trees.
Called by: fx_init_tree_build
Arguments required: TREE_ID, tree_type, tree_depth_base
'''
self.pop_TREE_ID = TREE_ID # pos 0: a unique identifier for each tree
self.pop_tree_type = tree_type # pos 1: a global constant based upon the initial user setting
self.pop_tree_depth_base = tree_depth_base # pos 2: a global variable which conveys 'tree_depth_base' as unique to each new Tree
self.pop_NODE_ID = 1 # pos 3: unique identifier for each node; this is the INDEX KEY to this array
self.pop_node_depth = 0 # pos 4: depth of each node when committed to the array
self.pop_node_type = '' # pos 5: root, function, or terminal
self.pop_node_label = '' # pos 6: operator [+, -, *, ...] or terminal [a, b, c, ...]
self.pop_node_parent = '' # pos 7: parent node
self.pop_node_arity = '' # pos 8: number of nodes attached to each non-terminal node
self.pop_node_c1 = '' # pos 9: child node 1
self.pop_node_c2 = '' # pos 10: child node 2
self.pop_node_c3 = '' # pos 11: child node 3 (assumed max of 3 with boolean operator 'if')
self.pop_fitness = '' # pos 12: fitness score following Tree evaluation
self.tree = np.array([ ['TREE_ID'],['tree_type'],['tree_depth_base'],['NODE_ID'],['node_depth'],['node_type'],['node_label'],['node_parent'],['node_arity'],['node_c1'],['node_c2'],['node_c3'],['fitness'] ])
return
### Root Node ###
def fx_init_root_build(self):
'''
Build the Root node for the initial population.
Called by: fx_init_tree_build
Arguments required: none
'''
self.fx_init_function_select() # select the operator for root
if self.pop_node_arity == 1: # 1 child
self.pop_node_c1 = 2
self.pop_node_c2 = ''
self.pop_node_c3 = ''
elif self.pop_node_arity == 2: # 2 children
self.pop_node_c1 = 2
self.pop_node_c2 = 3
self.pop_node_c3 = ''
elif self.pop_node_arity == 3: # 3 children
self.pop_node_c1 = 2
self.pop_node_c2 = 3
self.pop_node_c3 = 4
else: print ('\n\t\033[31m ERROR! In fx_init_root_build: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
self.pop_node_type = 'root'
self.fx_init_node_commit()
return
### Function Nodes ###
def fx_init_function_build(self):
'''
Build the Function nodes for the intial population.
Called by: fx_init_tree_build
Arguments required: none
'''
for i in range(1, self.pop_tree_depth_base): # increment depth, from 1 through 'tree_depth_base' - 1
self.pop_node_depth = i # increment 'node_depth'
parent_arity_sum = 0
prior_sibling_arity = 0 # reset for 'c_buffer' in 'children_link'
prior_siblings = 0 # reset for 'c_buffer' in 'children_link'
for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth
parent_arity_sum = parent_arity_sum + int(self.tree[8][j]) # sum arities of all parent nodes at the prior depth
# (do *not* merge these 2 "j" loops or it gets all kinds of messed up)
for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth
for k in range(1, int(self.tree[8][j]) + 1): # increment through each degree of arity for each parent node
self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID' ...
prior_sibling_arity = self.fx_init_function_gen(parent_arity_sum, prior_sibling_arity, prior_siblings) # ... generate a Function ndoe
prior_siblings = prior_siblings + 1 # sum sibling nodes (current depth) who will spawn their own children (cousins? :)
return
def fx_init_function_gen(self, parent_arity_sum, prior_sibling_arity, prior_siblings):
'''
Generate a single Function node for the initial population.
Called by fx_init_function_build
Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings
'''
if self.pop_tree_type == 'f': # user defined as (f)ull
self.fx_init_function_select() # retrieve a function
self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children
elif self.pop_tree_type == 'g': # user defined as (g)row
rnd = np.random.randint(2)
if rnd == 0: # randomly selected as Function
self.fx_init_function_select() # retrieve a function
self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children
elif rnd == 1: # randomly selected as Terminal
self.fx_init_terminal_select() # retrieve a terminal
self.pop_node_c1 = ''
self.pop_node_c2 = ''
self.pop_node_c3 = ''
self.fx_init_node_commit() # commit new node to array
prior_sibling_arity = prior_sibling_arity + self.pop_node_arity # sum the arity of prior siblings
return prior_sibling_arity
def fx_init_function_select(self):
'''
Define a single Function (operator extracted from the associated functions.csv) for the initial population.
Called by: fx_init_function_gen, fx_init_root_build
Arguments required: none
'''
self.pop_node_type = 'func'
rnd = np.random.randint(0, len(self.functions[:,0])) # call the previously loaded .csv which contains all operators
self.pop_node_label = self.functions[rnd][0]
self.pop_node_arity = int(self.functions[rnd][1])
return
### Terminal Nodes ###
def fx_init_terminal_build(self):
'''
Build the Terminal nodes for the intial population.
Called by: fx_init_tree_build
Arguments required: none
'''
self.pop_node_depth = self.pop_tree_depth_base # set the final node_depth (same as 'gp.pop_node_depth' + 1)
for j in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth
for k in range(1,(int(self.tree[8][j]) + 1)): # increment through each degree of arity for each parent node
self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID' ...
self.fx_init_terminal_gen() # ... generate a Terminal node
return
def fx_init_terminal_gen(self):
'''
Generate a single Terminal node for the initial population.
Called by: fx_init_terminal_build
Arguments required: none
'''
self.fx_init_terminal_select() # retrieve a terminal
self.pop_node_c1 = ''
self.pop_node_c2 = ''
self.pop_node_c3 = ''
self.fx_init_node_commit() # commit new node to array
return
def fx_init_terminal_select(self):
'''
Define a single Terminal (variable extracted from the top row of the associated TRAINING data)
Called by: fx_init_terminal_gen, fx_init_function_gen
Arguments required: none
'''
self.pop_node_type = 'term'
rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals
self.pop_node_label = self.terminals[rnd]
self.pop_node_arity = 0
return
### The Lovely Children ###
def fx_init_child_link(self, parent_arity_sum, prior_sibling_arity, prior_siblings):
'''
Link each parent node to its children in the intial population.
Called by: fx_init_function_gen
Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings
'''
c_buffer = 0
for n in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][n]) == self.pop_node_depth - 1: # find all nodes that reside at the prior (parent) 'node_depth'
c_buffer = self.pop_NODE_ID + (parent_arity_sum + prior_sibling_arity - prior_siblings) # One algo to rule the world!
if self.pop_node_arity == 0: # terminal in a Grow Tree
self.pop_node_c1 = ''
self.pop_node_c2 = ''
self.pop_node_c3 = ''
elif self.pop_node_arity == 1: # 1 child
self.pop_node_c1 = c_buffer
self.pop_node_c2 = ''
self.pop_node_c3 = ''
elif self.pop_node_arity == 2: # 2 children
self.pop_node_c1 = c_buffer
self.pop_node_c2 = c_buffer + 1
self.pop_node_c3 = ''
elif self.pop_node_arity == 3: # 3 children
self.pop_node_c1 = c_buffer
self.pop_node_c2 = c_buffer + 1
self.pop_node_c3 = c_buffer + 2
else: print ('\n\t\033[31m ERROR! In fx_init_child_link: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
return
def fx_init_node_commit(self):
'''
Commit the values of a new node (root, function, or terminal) to the array 'tree'.
Called by: fx_init_root_build, fx_init_function_gen, fx_init_terminal_gen
Arguments required: none
'''
self.tree = np.append(self.tree, [ [self.pop_TREE_ID],[self.pop_tree_type],[self.pop_tree_depth_base],[self.pop_NODE_ID],[self.pop_node_depth],[self.pop_node_type],[self.pop_node_label],[self.pop_node_parent],[self.pop_node_arity],[self.pop_node_c1],[self.pop_node_c2],[self.pop_node_c3],[self.pop_fitness] ], 1)
self.pop_NODE_ID = self.pop_NODE_ID + 1
return
#+++++++++++++++++++++++++++++++++++++++++++++
# Methods to Evaluate a Tree |
#+++++++++++++++++++++++++++++++++++++++++++++
def fx_eval_poly(self, tree):
'''
Evaluate a Tree and generate its multivariate expression (both raw and Sympified).
We need to extract the variables from the expression. However, these variables are no longer correlated
to the original variables listed across the top of each column of data.csv. Therefore, we must re-assign
the respective values for each subsequent row in the data .csv, for each Tree's unique expression.
Called by: fx_karoo_pause, fx_data_params_write, fx_eval_label, fx_fitness_gym, fx_fitness_gene_pool, fx_display_tree
Arguments required: tree
'''
self.algo_raw = self.fx_eval_label(tree, 1) # pass the root 'node_id', then flatten the Tree to a string
self.algo_sym = sympify(self.algo_raw) # convert string to a functional expression (the coolest line in Karoo! :)
return
def fx_eval_label(self, tree, node_id):
'''
Evaluate all or part of a Tree (starting at node_id) and return a raw mutivariate expression ('algo_raw').
This method is called once per Tree, but may be called at any time to prepare an expression for any full or
partial (branch) Tree contained in 'population'. Pass the starting node for recursion via the local variable
'node_id' where the local variable 'tree' is a copy of the Tree you desire to evaluate.
Called by: fx_eval_poly, fx_eval_label (recursively)
Arguments required: tree, node_id
'''
# if tree[6, node_id] == 'not': tree[6, node_id] = ', not' # temp until this can be fixed at data_load
node_id = int(node_id)
if tree[8, node_id] == '0': # arity of 0 for the pattern '[term]'
return '(' + tree[6, node_id] + ')' # 'node_label' (function or terminal)
else:
if tree[8, node_id] == '1': # arity of 1 for the explicit pattern 'not [term]'
return self.fx_eval_label(tree, tree[9, node_id]) + tree[6, node_id]
elif tree[8, node_id] == '2': # arity of 2 for the pattern '[func] [term] [func]'
return self.fx_eval_label(tree, tree[9, node_id]) + tree[6, node_id] + self.fx_eval_label(tree, tree[10, node_id])
elif tree[8, node_id] == '3': # arity of 3 for the explicit pattern 'if [term] then [term] else [term]'
return tree[6, node_id] + self.fx_eval_label(tree, tree[9, node_id]) + ' then ' + self.fx_eval_label(tree, tree[10, node_id]) + ' else ' + self.fx_eval_label(tree, tree[11, node_id])
def fx_eval_id(self, tree, node_id):
'''
Evaluate all or part of a Tree and return a list of all 'NODE_ID's.
This method generates a list of all 'NODE_ID's from the given Node and below. It is used primarily to generate
'branch' for the multi-generational mutation of Trees.
Pass the starting node for recursion via the local variable 'node_id' where the local variable 'tree' is a copy
of the Tree you desire to evaluate.
Called by: fx_eval_id (recursively), fx_evolve_branch_select
Arguments required: tree, node_id
'''
node_id = int(node_id)
if tree[8, node_id] == '0': # arity of 0 for the pattern '[NODE_ID]'
return tree[3, node_id] # 'NODE_ID'
else:
if tree[8, node_id] == '1': # arity of 1 for the pattern '[NODE_ID], [NODE_ID]'
return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id])
elif tree[8, node_id] == '2': # arity of 2 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID]'
return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id])
elif tree[8, node_id] == '3': # arity of 3 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID], [NODE_ID]'
return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id]) + ', ' + self.fx_eval_id(tree, tree[11, node_id])
def fx_eval_generation(self):
'''
This method invokes the evaluation of an entire generation of Trees. It automatically evaluates population_b
before invoking the copy of _b to _a.
Called by: fx_karoo_gp
Arguments required: none
'''
if self.display != 's':
if self.display == 'i': print ('')
print ('\n Evaluate all Trees in Generation', self.gen_id)
if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07
for tree_id in range(1, len(self.population_b)): # renumber all Trees in given population - merged fx_evolve_tree_renum 2018 04/12
self.population_b[tree_id][0][1] = tree_id
self.fx_fitness_gym(self.population_b) # run fx_eval(), fx_fitness(), fx_fitness_store(), and fitness record
self.fx_data_tree_write(self.population_b, 'a') # archive current population as foundation for next generation
if self.display != 's':
print ('\n Copy gp.population_b to gp.population_a\n')
return
#+++++++++++++++++++++++++++++++++++++++++++++
# Methods to Train and Test a Tree |
#+++++++++++++++++++++++++++++++++++++++++++++
def fx_fitness_gym(self, population):
'''
Part 1 evaluates each expression against the data, line for line. This is the most time consuming and
computationally expensive part of genetic programming. When GPUs are available, the performance can increase
by many orders of magnitude for datasets measured in millions of data.
Part 2 evaluates every Tree in each generation to determine which have the best, overall fitness score. This
could be the highest or lowest depending upon if the fitness function is maximising (higher is better) or
minimising (lower is better). The total fitness score is then saved with each Tree in the external .csv file.
Part 3 compares the fitness of each Tree to the prior best fit in order to track those that improve with each
comparison. For matching functions, all the Trees will have the same fitness score, but they may present more
than one solution. For minimisation and maximisation functions, the final Tree should present the best overall
fitness for that generation. It is important to note that Part 3 does *not* in any way influence the Tournament
Selection which is a stand-alone process.
Called by: fx_karoo_gp, fx_eval_generations
Arguments required: population
'''
fitness_best = 0
self.fittest_dict = {}
time_sum = 0
for tree_id in range(1, len(population)):
### PART 1 - GENERATE MULTIVARIATE EXPRESSION FOR EACH TREE ###
self.fx_eval_poly(population[tree_id]) # extract the expression
if self.display not in ('s'): print ('\t\033[36mTree', population[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
### PART 2 - EVALUATE FITNESS FOR EACH TREE AGAINST TRAINING DATA ###
fitness = 0
expr = str(self.algo_sym) # get sympified expression and process it with TF - tested 2017 02/02
result = self.fx_fitness_eval(expr, self.data_train)
fitness = result['fitness'] # extract fitness score
if self.display == 'i':
print ('\t \033[36m with fitness sum:\033[1m', fitness, '\033[0;0m\n')
self.fx_fitness_store(population[tree_id], fitness) # store Fitness with each Tree
### PART 3 - COMPARE FITNESS OF ALL TREES IN CURRENT GENERATION ###
if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel
if fitness >= fitness_best: # find the Tree with Maximum fitness score
fitness_best = fitness # set best fitness score
self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness >= prior
elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel
if fitness_best == 0: fitness_best = fitness # set the baseline first time through
if fitness <= fitness_best: # find the Tree with Minimum fitness score
fitness_best = fitness # set best fitness score
self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness <= prior
elif self.kernel == 'm': # display best fit Trees for the MATCH kernel
if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows
fitness_best = fitness # set best fitness score
self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if all rows match
# elif self.kernel == '[other]': # use others as a template
print ('\n\033[36m ', len(list(self.fittest_dict.keys())), 'trees\033[1m', np.sort(list(self.fittest_dict.keys())), '\033[0;0m\033[36moffer the highest fitness scores.\033[0;0m')
if self.display == 'g': self.fx_karoo_pause_refer() # 2019 06/07
return
def fx_fitness_eval(self, expr, data, get_pred_labels = False):
'''
Computes tree expression using TensorFlow (TF) returning results and fitness scores.
This method orchestrates most of the TF routines by parsing input string 'expression' and converting it into a TF
operation graph which is then processed in an isolated TF session to compute the results and corresponding fitness
values.
'self.tf_device' - controls which device will be used for computations (CPU or GPU).
'self.tf_device_log' - controls device placement logging (debug only).
Args:
'expr' - a string containing math expression to be computed on the data. Variable names should match corresponding
terminal names in 'self.terminals'.
'data' - an 'n by m' matrix of the data points containing n observations and m features per observation.
Variable order should match corresponding order of terminals in 'self.terminals'.
'get_pred_labels' - a boolean flag which controls whether the predicted labels should be extracted from the
evolved results. This applies only to the CLASSIFY kernel and defaults to 'False'.
Returns:
A dict mapping keys to the following outputs:
'result' - an array of the results of applying given expression to the data
'pred_labels' - an array of the predicted labels extracted from the results; defined only for CLASSIFY kernel, else None
'solution' - an array of the solution values extracted from the data (variable 's' in the dataset)
'pairwise_fitness' - an array of the element-wise results of applying corresponding fitness kernel function
'fitness' - aggregated scalar fitness score
Called by: fx_karoo_pause, fx_data_params_write, fx_fitness_gym
Arguments required: expr, data
'''
# Initialize TensorFlow session
tf.compat.v1.reset_default_graph() # Reset TF internal state and cache (after previous processing)
config = tf.compat.v1.ConfigProto(log_device_placement=self.tf_device_log, allow_soft_placement=True)
config.gpu_options.allow_growth = True
with tf.compat.v1.Session(config=config) as sess:
with sess.graph.device(self.tf_device):
# 1 - Load data into TF vectors
tensors = {}
for i in range(len(self.terminals)):
var = self.terminals[i]
tensors[var] = tf.constant(data[:, i], dtype=tf.float32) # converts data into vectors
# 2- Transform string expression into TF operation graph
result = self.fx_fitness_expr_parse(expr, tensors)
pred_labels = tf.no_op() # a placeholder, applies only to CLASSIFY kernel
solution = tensors['s'] # solution value is assumed to be stored in 's' terminal
# 3- Add fitness computation into TF graph
if self.kernel == 'c': # CLASSIFY kernel
'''
Creates element-wise fitness computation TensorFlow (TF) sub-graph for CLASSIFY kernel.
This method uses the 'sympified' (SymPy) expression ('algo_sym') created in fx_eval_poly() and the data set
loaded at run-time to evaluate the fitness of the selected kernel.
This multiclass classifer compares each row of a given Tree to the known solution, comparing predicted labels
generated by Karoo GP against the true classs labels. This method is able to work with any number of class
labels, from 2 to n. The left-most bin includes -inf. The right-most bin includes +inf. Those inbetween are
by default confined to the spacing of 1.0 each, as defined by:
(solution - 1) < result <= solution
The skew adjusts the boundaries of the bins such that they fall on both the negative and positive sides of the
origin. At the time of this writing, an odd number of class labels will generate an extra bin on the positive
side of origin as it has not yet been determined the effect of enabling the middle bin to include both a
negative and positive result.
'''
# was breaking with upgrade from Tensorflow 1.1 to 1.3; fixed by Iurii by replacing [] with () as of 20171026
# if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = [tf.int32, tf.string], swap_memory = True)
if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = (tf.int32, tf.string), swap_memory = True)
skew = (self.class_labels / 2) - 1
rule11 = tf.equal(solution, 0)
rule12 = tf.less_equal(result, 0 - skew)
rule13 = tf.logical_and(rule11, rule12)
rule21 = tf.equal(solution, self.class_labels - 1)
rule22 = tf.greater(result, solution - 1 - skew)
rule23 = tf.logical_and(rule21, rule22)
rule31 = tf.less(solution - 1 - skew, result)
rule32 = tf.less_equal(result, solution - skew)
rule33 = tf.logical_and(rule31, rule32)
pairwise_fitness = tf.cast(tf.logical_or(tf.logical_or(rule13, rule23), rule33), tf.int32)
elif self.kernel == 'r': # REGRESSION kernel
'''
A very, very basic REGRESSION kernel which is not designed to perform well in the real world. It requires
that you raise the minimum node count to keep it from converging on the value of '1'. Consider writing or
integrating a more sophisticated kernel.
'''
pairwise_fitness = tf.abs(solution - result)
elif self.kernel == 'm': # MATCH kernel
'''
This is used for demonstration purposes only.
'''
# pairwise_fitness = tf.cast(tf.equal(solution, result), tf.int32) # breaks due to floating points
RTOL, ATOL = 1e-05, 1e-08 # fixes above issue by checking if a float value lies within a range of values
pairwise_fitness = tf.cast(tf.less_equal(tf.abs(solution - result), ATOL + RTOL * tf.abs(result)), tf.int32)
# elif self.kernel == '[other]': # use others as a template
else: raise Exception('Kernel type is wrong or missing. You entered {}'.format(self.kernel))
fitness = tf.reduce_sum(pairwise_fitness)
# Process TF graph and collect the results
result, pred_labels, solution, fitness, pairwise_fitness = sess.run([result, pred_labels, solution, fitness, pairwise_fitness])
return {'result': result, 'pred_labels': pred_labels, 'solution': solution, 'fitness': float(fitness), 'pairwise_fitness': pairwise_fitness}
def fx_fitness_expr_parse(self, expr, tensors):
'''
Extract expression tree from the string algo_sym and transform into TensorFlow (TF) graph.
Called by: fx_fitness_eval
Arguments required: expr, tensors
'''
tree = ast.parse(expr, mode='eval').body
return self.fx_fitness_node_parse(tree, tensors)
def fx_fitness_chain_bool(self, values, operation, tensors):
'''
Chains a sequence of boolean operations (e.g. 'a and b and c') into a single TensorFlow (TF) sub graph.
Called by: fx_fitness_node_parse
Arguments required: values, operation, tensors
'''
x = tf.cast(self.fx_fitness_node_parse(values[0], tensors), tf.bool)
if len(values) > 1:
return operation(x, self.fx_fitness_chain_bool(values[1:], operation, tensors))
else:
return x
def fx_fitness_chain_compare(self, comparators, ops, tensors):
'''
Chains a sequence of comparison operations (e.g. 'a > b < c') into a single TensorFlow (TF) sub graph.
Called by: fx_fitness_node_parse
Arguments required: comparators, ops, tensors
'''
x = self.fx_fitness_node_parse(comparators[0], tensors)
y = self.fx_fitness_node_parse(comparators[1], tensors)
if len(comparators) > 2:
return tf.logical_and(operators[type(ops[0])](x, y), self.fx_fitness_chain_compare(comparators[1:], ops[1:], tensors))
else:
return operators[type(ops[0])](x, y)
def fx_fitness_node_parse(self, node, tensors):
'''
Recursively transforms parsed expression tree into TensorFlow (TF) graph.
Called by: fx_fitness_expr_parse, fx_fitness_chain_bool, fx_fitness_chain_compare
Arguments required: node, tensors
'''
if isinstance(node, ast.Name): # <tensor_name>
return tensors[node.id]
elif isinstance(node, ast.Num): # <number>
#shape = tensors[tensors.keys()[0]].get_shape() # Python 2.7
shape = tensors[list(tensors.keys())[0]].get_shape()
return tf.constant(node.n, shape=shape, dtype=tf.float32)
elif isinstance(node, ast.BinOp): # <left> <operator> <right>, e.g., x + y
return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
return operators[type(node.op)](self.fx_fitness_node_parse(node.operand, tensors))
elif isinstance(node, ast.Call): # <function>(<arguments>) e.g., sin(x)
return operators[node.func.id](*[self.fx_fitness_node_parse(arg, tensors) for arg in node.args])
elif isinstance(node, ast.BoolOp): # <left> <bool_operator> <right> e.g. x or y
return self.fx_fitness_chain_bool(node.values, operators[type(node.op)], tensors)
elif isinstance(node, ast.Compare): # <left> <compare> <right> e.g., a > z
return self.fx_fitness_chain_compare([node.left] + node.comparators, node.ops, tensors)
else: raise TypeError(node)
def fx_fitness_labels_map(self, result):
'''
For the CLASSIFY kernel, creates a TensorFlow (TF) sub-graph defined as a sequence of boolean conditions based upon
the quantity of true class labels provided in the data .csv. Outputs an array of tuples containing the predicted
labels based upon the result and corresponding boolean condition triggered.
For comparison, the original (pre-TensorFlow) cod follows:
skew = (self.class_labels / 2) - 1 # '-1' keeps a binary classification splitting over the origin
if solution == 0 and result <= 0 - skew; fitness = 1: # check for first class (the left-most bin)
elif solution == self.class_labels - 1 and result > solution - 1 - skew; fitness = 1: # check for last class (the right-most bin)
elif solution - 1 - skew < result <= solution - skew; fitness = 1: # check for class bins between first and last
else: fitness = 0 # no class match
Called by: fx_fitness_eval
Arguments required: result
'''
skew = (self.class_labels / 2) - 1
label_rules = {self.class_labels - 1: (tf.constant(self.class_labels - 1), tf.constant(' > {}'.format(self.class_labels - 2 - skew)))}
for class_label in range(self.class_labels - 2, 0, -1):
cond = (class_label - 1 - skew < result) & (result <= class_label - skew)
label_rules[class_label] = tf.cond(cond, lambda: (tf.constant(class_label), tf.constant(' <= {}'.format(class_label - skew))), lambda: label_rules[class_label + 1])
pred_label = tf.cond(result <= 0 - skew, lambda: (tf.constant(0), tf.constant(' <= {}'.format(0 - skew))), lambda: label_rules[1])
return pred_label
def fx_fitness_store(self, tree, fitness):
'''
Records the fitness and length of the raw algorithm (multivariate expression) to the Numpy array. Parsimony can
be used to apply pressure to the evolutionary process to select from a set of trees with the same fitness function
the one(s) with the simplest (shortest) multivariate expression.
Called by: fx_fitness_gym
Arguments required: tree, fitness
'''
fitness = float(fitness)
fitness = round(fitness, self.precision)
tree[12][1] = fitness # store the fitness with each tree
tree[12][2] = len(str(self.algo_raw)) # store the length of the raw algo for parsimony
# if len(tree[3]) > 4: # if the Tree array is wide enough -- SEE SCRATCHPAD
return
def fx_fitness_tournament(self, tourn_size):
'''
Multiple contenders ('tourn_size') are randomly selected and then compared for their respective fitness, as
determined in fx_fitness_gym(). The tournament is engaged to select a single Tree for each invocation of the
genetic operators: reproduction, mutation (point, branch), and crossover (sexual reproduction).
The original Tournament Selection drew directly from the foundation generation (gp.generation_a). However,
with the introduction of a minimum number of nodes as defined by the user ('gp.tree_depth_min'),
'gp.gene_pool' limits the Trees to those which meet all criteria.
Stronger boundary parameters (a reduced gap bet
Yes, understood. If you will be so kind as to send to, I will test them against the current build. Will save me the time of duplicating your update. Thanks!
On 10/13/19 11:15 AM, ajawfi wrote:
The changes I made are in the "karoo_gp_base_class.py" file. I am really not sure if the changes will break the support of the older versions of TF.
On Sun, Oct 13, 2019 at 11:04 AM Kai Staats notifications@github.com wrote:
Ah! Great! Thank you!
If you don't mind sharing those changes with me directly (kai at over the sun dot com), or forking Karoo, then I will review and merge with credit to your effort. Do you know if the changes break support for the prior versions of TF?
kai
On 10/13/19 9:54 AM, ajawfi wrote:
Thanks, I actually already fixed for me. I just wanted to let you know. As you said, you just need to change couple of the TF attributes (I think I changed around 5 attributes to get it working). Great package by the way. Keep up the good work !
Sent from my iPhone
On Oct 13, 2019, at 10:21 AM, Kai Staats notifications@github.com wrote:
Ok. Looks like the means by which we define our maths functions are no longer supported, or have changed. Should be easy to fix. We will look into this ASAP ... stay tuned.
On 10/13/19 9:13 AM, ajawfi wrote:
I’m using the latest version of TF, version 2.0.
On Oct 13, 2019, at 9:08 AM, Kai Staats notifications@github.com wrote:
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kstaats/karoo_gp/issues/17?email_source=notifications&email_token=AMQMZW25GG52S33WQVL7TMTQONILJA5CNFSM4JAFELLKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBC2TRY#issuecomment-541436359, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMQMZWYCIPYXP3NRRMCSNA3QONILJANCNFSM4JAFELLA .
Karoo GP Base Class
Define the methods and global variables used by Karoo GP
by Kai Staats, MSc with TensorFlow support provided by Iurii Milovanov; see LICENSE.md
version 2.3 for Python 3.6
''' A NOTE TO THE NEWBIE, EXPERT, AND BRAVE Even if you are highly experienced in Genetic Programming, it is recommended that you review the 'Karoo User Guide' before running this application. While your computer will not burst into flames nor will the sun collapse into a black hole if you do not, you will likely find more enjoyment of this particular flavour of GP with a little understanding of its intent and design. '''
import sys import os import csv import time
import numpy as np import sklearn.metrics as skm
import sklearn.cross_validation as skcv # Python 2.7
import sklearn.model_selection as skcv
from sympy import sympify from datetime import datetime from collections import OrderedDict
import karoo_gp_pause as menu
np.random.seed(1000) # for reproducibility
TensorFlow Imports and Definitions
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
import tensorflow as tf import ast import operator as op
operators = {ast.Add: tf.add, # e.g., a + b ast.Sub: tf.subtract, # e.g., a - b ast.Mult: tf.multiply, # e.g., a * b ast.Div: tf.divide, # e.g., a / b ast.Pow: tf.pow, # e.g., a ** 2 ast.USub: tf.negative, # e.g., -a ast.And: tf.logical_and, # e.g., a and b ast.Or: tf.logical_or, # e.g., a or b ast.Not: tf.logical_not, # e.g., not a ast.Eq: tf.equal, # e.g., a == b ast.NotEq: tf.not_equal, # e.g., a != b ast.Lt: tf.less, # e.g., a < b ast.LtE: tf.less_equal, # e.g., a <= b ast.Gt: tf.greater, # e.g., a > b ast.GtE: tf.greater_equal, # e.g., a >= 1 'abs': tf.abs, # e.g., abs(a) 'sign': tf.sign, # e.g., sign(a) 'square': tf.square, # e.g., square(a) 'sqrt': tf.sqrt, # e.g., sqrt(a) 'pow': tf.pow, # e.g., pow(a, b) 'log': tf.math.log, # e.g., log(a) 'log1p': tf.math.log1p, # e.g., log1p(a) 'cos': tf.cos, # e.g., cos(a) 'sin': tf.sin, # e.g., sin(a) 'tan': tf.tan, # e.g., tan(a) 'acos': tf.acos, # e.g., acos(a) 'asin': tf.asin, # e.g., asin(a) 'atan': tf.atan, # e.g., atan(a) }
np.set_printoptions(linewidth = 320) # set the terminal to print 320 characters before line-wrapping in order to view Trees
class Base_GP(object):
''' This BaseBP class contains all methods for Karoo GP. Method names are differentiated from global variable names (defined below) by the prefix 'fx' followed by an object and action, as in fx_display_tree(), with a few expections, such as fx_fitness_gene_pool().
The method categories (denoted by +++ banners +++) are as follows: fxkaroo Methods to Run Karoo GP fxdata Methods to Load and Archive Data fxinit Methods to Construct the 1st Generation fxeval Methods to Evaluate a Tree fxfitness Methods to Train and Test a Tree for Fitness fxnextgen Methods to Construct the next Generation fxevolve Methods to Evolve a Population fxdisplay Methods to Visualize a Tree
Error checks are quickly located by searching for 'ERROR!' '''
def init(self):
''' ### Global variables used for data management ### self.data_train store train data for processing in TF self.data_test store test data for processing in TF self.tf_device set TF computation backend device (CPU or GPU) self.tf_device_log employed for TensorFlow debugging self.data_train_cols number of cols in the TRAINING data - see fx_data_load() self.data_train_rows number of rows in the TRAINING data - see fx_data_load() self.data_test_cols number of cols in the TEST data - see fx_data_load() self.data_test_rows number of rows in the TEST data - see fx_data_load() self.functions user defined functions (operators) from the associated files/[functions].csv self.terminals user defined variables (operands) from the top row of the associated [data].csv self.coeff user defined coefficients (NOT YET IN USE) self.fitness_type fitness type self.datetime date-time stamp of when the unique directory is created self.path full path to the unique directory created with each run self.dataset local path and dataset filename ### Global variables used for evolutionary management ### self.population_a the root generation from which Trees are chosen for mutation and reproduction self.population_b the generation constructed from gp.population_a (recyled) self.gene_pool once-per-generation assessment of trees that meet min and max boundary conditions self.gen_id simple n + 1 increment self.fitness_type set in fx_data_load() as either a minimising or maximising function self.tree axis-1, 13 element Numpy array that defines each Tree, stored in 'gp.population' self.pop_* 13 variables that define each Tree - see fx_init_tree_initialise() ''' self.algo_raw = [] # the raw expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL self.algo_sym = [] # the expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL self.fittest_dict = {} # all Trees which share the best fitness score self.gene_pool = [] # store all Tree IDs for use by Tournament self.class_labels = 0 # the number of true class labels (data_y) return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Run Karoo GP |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_karoo_gp(self, kernel, tree_type, tree_depth_base, tree_depth_max, tree_depth_min, tree_pop_max, gen_max, tourn_size, filename, evolve_repro, evolve_point, evolve_branch, evolve_cross, display, precision, swim, mode):
''' This method enables the engagement of the entire Karoo GP application. Instead of returning the user to the pause menu, this script terminates at the command-line, providing support for bash and chron job execution. Calld by: user script karoo_gp.py Arguments required: (see below) ''' ### PART 1 - set global variables to those local values passed from the user script ### self.kernel = kernel # fitness function # tree_type is passed between methods to construct specific trees # tree_depth_base is passed between methods to construct specific trees self.tree_depth_max = tree_depth_max # maximum Tree depth for the entire run; limits bloat self.tree_depth_min = tree_depth_min # minimum number of nodes self.tree_pop_max = tree_pop_max # maximum number of Trees per generation self.gen_max = gen_max # maximum number of generations self.tourn_size = tourn_size # number of Trees selected for each tournament # filename is passed between methods to work with specific populations self.evolve_repro = evolve_repro # quantity of a population generated through Reproduction self.evolve_point = evolve_point # quantity of a population generated through Point Mutation self.evolve_branch = evolve_branch # quantity of a population generated through Branch Mutation self.evolve_cross = evolve_cross # quantity of a population generated through Crossover self.display = display # display mode is set to (s)ilent # level of on-screen feedback self.precision = precision # the number of floating points for the round function in 'fx_fitness_eval' self.swim = swim # pass along the gene_pool restriction methodology # mode is engaged at the end of the run, below ### PART 2 - construct first generation of Trees ### self.fx_data_load(filename) self.gen_id = 1 # set initial generation ID self.population_a = ['Karoo GP by Kai Staats, Generation ' + str(self.gen_id)] # initialise population_a to host the first generation self.population_b = ['placeholder'] # initialise population_b to satisfy fx_karoo_pause() self.fx_init_construct(tree_type, tree_depth_base) # construct the first population of Trees if self.kernel == 'p': # terminate here for Play mode self.fx_display_tree(self.tree) # print the current Tree self.fx_data_tree_write(self.population_a, 'a') # save this one Tree to disk sys.exit() elif self.gen_max == 1: # terminate here if constructing just one generation self.fx_data_tree_write(self.population_a, 'a') # save this single population to disk print ('\n We have constructed a single, stochastic population of', self.tree_pop_max,'Trees, and saved to disk') sys.exit() else: print ('\n We have constructed the first, stochastic population of', self.tree_pop_max,'Trees') ### PART 3 - evaluate first generation of Trees ### print ('\n Evaluate the first generation of Trees ...') self.fx_fitness_gym(self.population_a) # generate expression, evaluate fitness, compare fitness self.fx_data_tree_write(self.population_a, 'a') # save the first generation of Trees to disk ### PART 4 - evolve multiple generations of Trees ### menu = 1 while menu != 0: # this allows the user to add generations mid-run and not get buried in nested iterations for self.gen_id in range(self.gen_id + 1, self.gen_max + 1): # evolve additional generations of Trees print ('\n Evolve a population of Trees for Generation', self.gen_id, '...') self.population_b = ['Karoo GP by Kai Staats - Evolving Generation'] # initialise population_b to host the next generation self.fx_fitness_gene_pool() # generate the viable gene pool (compares against gp.tree_depth_min) self.fx_nextgen_reproduce() # method 1 - Reproduction self.fx_nextgen_point_mutate() # method 2 - Point Mutation self.fx_nextgen_branch_mutate() # method 3 - Branch Mutation self.fx_nextgen_crossover() # method 4 - Crossover self.fx_eval_generation() # evaluate all Trees in a single generation self.population_a = self.fx_evolve_pop_copy(self.population_b, ['Karoo GP by Kai Staats - Generation ' + str(self.gen_id)]) if mode == 's': menu = 0 # (s)erver mode - termination with completiont of prescribed run else: # (d)esktop mode - user is given an option to quit, review, and/or modify parameters; 'add' generations continues the run print ('\n\t\033[32m Enter \033[1m?\033[0;0m\033[32m to review your options or \033[1mq\033[0;0m\033[32muit\033[0;0m') menu = self.fx_karoo_pause() self.fx_karoo_terminate() # archive populations and return to karoo_gp.py for a clean exit return
def fx_karoo_pause_refer(self):
''' Enables (g)eneration, (i)nteractive, and (d)e(b)ug display modes to offer the (pause) menu at each prompt. See fx_karoo_pause() for an explanation of the value being passed. Called by: the functions called by PART 4 of fx_karoo_gp() Arguments required: none ''' menu = 1 while menu == 1: menu = self.fx_karoo_pause() return
def fx_karoo_pause(self):
''' Pause the program execution and engage the user, providing a number of options. Called by: fx_karoo_pause_refer Arguments required: [0,1,2] where (0) refers to an end-of-run; (1) refers to any use of the (pause) menu from within the run, and anticipates ENTER as an escape from the menu to continue the run; and (2) refers to an 'ERROR!' for which the user may want to archive data before terminating. At this point in time, (2) is associated with each error but does not provide any special options). ''' ### PART 1 - reset and pack values to send to menu.pause ### menu_dict = {'input_a':'', 'input_b':0, 'display':self.display, 'tree_depth_max':self.tree_depth_max, 'tree_depth_min':self.tree_depth_min, 'tree_pop_max':self.tree_pop_max, 'gen_id':self.gen_id, 'gen_max':self.gen_max, 'tourn_size':self.tourn_size, 'evolve_repro':self.evolve_repro, 'evolve_point':self.evolve_point, 'evolve_branch':self.evolve_branch, 'evolve_cross':self.evolve_cross, 'fittest_dict':self.fittest_dict, 'pop_a_len':len(self.population_a), 'pop_b_len':len(self.population_b), 'path':self.path} menu_dict = menu.pause(menu_dict) # call the external function menu.pause ### PART 2 - unpack values returned from menu.pause ### input_a = menu_dict['input_a'] input_b = menu_dict['input_b'] self.display = menu_dict['display'] self.tree_depth_min = menu_dict['tree_depth_min'] self.gen_max = menu_dict['gen_max'] self.tourn_size = menu_dict['tourn_size'] self.evolve_repro = menu_dict['evolve_repro'] self.evolve_point = menu_dict['evolve_point'] self.evolve_branch = menu_dict['evolve_branch'] self.evolve_cross = menu_dict['evolve_cross'] ### PART 3 - execute the user queries returned from menu.pause ### if input_a == 'esc': return 2 # breaks out of the fx_karoo_gp() or fx_karoo_pause_refer() loop elif input_a == 'eval': # evaluate a Tree against the TEST data self.fx_eval_poly(self.population_b[input_b]) # generate the raw and sympified expression for the given Tree using SymPy #print ('\n\t\033[36mTree', input_b, 'yields (raw):', self.algo_raw, '\033[0;0m') # print the raw expression print ('\n\t\033[36mTree', input_b, 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m') # print the sympified expression result = self.fx_fitness_eval(str(self.algo_sym), self.data_test, get_pred_labels = True) # might change to algo_raw evaluation if self.kernel == 'c': self.fx_fitness_test_classify(result) # TF tested 2017 02/02 elif self.kernel == 'r': self.fx_fitness_test_regress(result) elif self.kernel == 'm': self.fx_fitness_test_match(result) # elif self.kernel == '[other]': # use others as a template elif input_a == 'print_a': # print a Tree from population_a self.fx_display_tree(self.population_a[input_b]) elif input_a == 'print_b': # print a Tree from population_b self.fx_display_tree(self.population_b[input_b]) elif input_a == 'pop_a': # list all Trees in population_a print ('') for tree_id in range(1, len(self.population_a)): self.fx_eval_poly(self.population_a[tree_id]) # extract the expression print ('\t\033[36m Tree', self.population_a[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m') elif input_a == 'pop_b': # list all Trees in population_b print ('') for tree_id in range(1, len(self.population_b)): self.fx_eval_poly(self.population_b[tree_id]) # extract the expression print ('\t\033[36m Tree', self.population_b[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m') elif input_a == 'load': # load population_s to replace population_a self.fx_data_recover(self.filename['s']) # NEED TO replace 's' with a user defined filename elif input_a == 'write': # write the evolving population_b to disk self.fx_data_tree_write(self.population_b, 'b') print ('\n\t All current members of the evolving population_b saved to karoo_gp/runs/[date-time]/population_b.csv') elif input_a == 'add': # check for added generations, then exit fx_karoo_pause and continue the run self.gen_max = self.gen_max + input_b # if input_b > 0: self.gen_max = self.gen_max + input_b - REMOVED 2019 06/05 elif input_a == 'quit': self.fx_karoo_terminate() # archive populations and exit return 1
def fx_karoo_terminate(self): ''' Terminates the evolutionary run (if yet in progress), saves parameters and data to disk, and cleanly returns the user to karoo_gp.py and the command line.
Called by: fx_karoo_gp() and fx_karoo_pause_refer() Arguments required: none ''' self.fx_data_params_write() target = open(self.filename['f'], 'w'); target.close() # initialize the .csv file for the final population self.fx_data_tree_write(self.population_b, 'f') # save the final generation of Trees to disk print ('\n\t\033[32m Your Trees and runtime parameters are archived in karoo_gp/runs/[date-time]/\033[0;0m') print ('\n\033[3m "It is not the strongest of the species that survive, nor the most intelligent,\033[0;0m') print ('\033[3m but the one most responsive to change."\033[0;0m --Charles Darwin\n') print ('\033[3m Congrats!\033[0;0m Your Karoo GP run is complete.\n') sys.exit() return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Load and Archive Data |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_data_load(self, filename):
''' The data and function .csv files are loaded according to the fitness function kernel selected by the user. An alternative dataset may be loaded at launch, by appending a command line argument. The data is then split into both TRAINING and TEST segments in order to validate the success of the GP training run. Datasets less than 10 rows will not be split, rather copied in full to both TRAINING and TEST as it is assumed you are conducting a system validation run, as with the built-in MATCH kernel and associated dataset. Called by: fx_karoo_gp Arguments required: filename (of the dataset) ''' ### PART 1 - load the associated data set, operators, operands, fitness type, and coefficients ### # full_path = os.path.realpath(__file__); cwd = os.path.dirname(full_path) # for user Marco Cavaglia cwd = os.getcwd() data_dict = {'c':cwd + '/files/data_CLASSIFY.csv', 'r':cwd + '/files/data_REGRESS.csv', 'm':cwd + '/files/data_MATCH.csv', 'p':cwd + '/files/data_PLAY.csv'} if len(sys.argv) == 1: # load data from the default karoo_gp/files/ directory data_x = np.loadtxt(data_dict[self.kernel], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column data_y = np.loadtxt(data_dict[self.kernel], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels) header = open(data_dict[self.kernel],'r') # open file to be read (below) self.dataset = data_dict[self.kernel] # copy the name only elif len(sys.argv) == 2: # load an external data file data_x = np.loadtxt(sys.argv[1], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column data_y = np.loadtxt(sys.argv[1], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels) header = open(sys.argv[1],'r') # open file to be read (below) self.dataset = sys.argv[1] # copy the name only elif len(sys.argv) > 2: # receive filename and additional arguments from karoo_gp.py via argparse data_x = np.loadtxt(filename, skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column data_y = np.loadtxt(filename, skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels) header = open(filename,'r') # open file to be read (below) self.dataset = filename # copy the name only fitt_dict = {'c':'max', 'r':'min', 'm':'max', 'p':''} self.fitness_type = fitt_dict[self.kernel] # load fitness type func_dict = {'c':cwd + '/files/operators_CLASSIFY.csv', 'r':cwd + '/files/operators_REGRESS.csv', 'm':cwd + '/files/operators_MATCH.csv', 'p':cwd + '/files/operators_PLAY.csv'} self.functions = np.loadtxt(func_dict[self.kernel], delimiter=',', skiprows=1, dtype = str) # load the user defined functions (operators) self.terminals = header.readline().split(','); self.terminals[-1] = self.terminals[-1].replace('\n','') # load the user defined terminals (operands) self.class_labels = len(np.unique(data_y)) # load the user defined true labels for classification or solutions for regression #self.coeff = np.loadtxt(cwd + '/files/coefficients.csv', delimiter=',', skiprows=1, dtype = str) # load the user defined coefficients - NOT USED YET ### PART 2 - from the dataset, extract TRAINING and TEST data ### if len(data_x) < 11: # for small datasets we will not split them into TRAINING and TEST components data_train = np.c_[data_x, data_y] data_test = np.c_[data_x, data_y] else: # if larger than 10, we run the data through the SciKit Learn's 'random split' function x_train, x_test, y_train, y_test = skcv.train_test_split(data_x, data_y, test_size = 0.2) # 80/20 TRAIN/TEST split data_x, data_y = [], [] # clear from memory data_train = np.c_[x_train, y_train] # recombine each row of data with its associated class label (right column) x_train, y_train = [], [] # clear from memory data_test = np.c_[x_test, y_test] # recombine each row of data with its associated class label (right column) x_test, y_test = [], [] # clear from memory self.data_train_cols = len(data_train[0,:]) # qty count self.data_train_rows = len(data_train[:,0]) # qty count self.data_test_cols = len(data_test[0,:]) # qty count self.data_test_rows = len(data_test[:,0]) # qty count ### PART 3 - load TRAINING and TEST data for TensorFlow processing - tested 2017 02/02 self.data_train = data_train # Store train data for processing in TF self.data_test = data_test # Store test data for processing in TF self.tf_device = "/gpu:0" # Set TF computation backend device (CPU or GPU); gpu:n = 1st, 2nd, or ... GPU device self.tf_device_log = False # TF device usage logging (for debugging) ### PART 4 - create a unique directory and initialise all .csv files ### self.datetime = datetime.now().strftime('%Y-%m-%d_%H-%M-%S') self.path = os.path.join(cwd, 'runs/', filename.split('.')[0] + '_' + self.datetime + '/') # generate a unique directory name if not os.path.isdir(self.path): os.makedirs(self.path) # make a unique directory self.filename = {} # a dictionary to hold .csv filenames self.filename.update( {'a':self.path + 'population_a.csv'} ) target = open(self.filename['a'], 'w'); target.close() # initialise a .csv file for population 'a' (foundation) self.filename.update( {'b':self.path + 'population_b.csv'} ) target = open(self.filename['b'], 'w'); target.close() # initialise a .csv file for population 'b' (evolving) self.filename.update( {'f':self.path + 'population_f.csv'} ) target = open(self.filename['f'], 'w'); target.close() # initialise a .csv file for the final population (test) self.filename.update( {'s':self.path + 'population_s.csv'} ) target = open(self.filename['s'], 'w'); target.close() # initialise a .csv file to manually load (seed) return
def fx_data_recover(self, population):
''' This method is used to load a saved population of Trees, as invoked through the (pause) menu where population_r replaces population_a in the karoo_gp/runs/[date-time]/ directory. Called by: fx_karoo_pause Arguments required: population (filename['s']) ''' with open(population, 'rb') as csv_file: target = csv.reader(csv_file, delimiter=',') n = 0 # track row count for row in target: print ('row', row) n = n + 1 if n == 1: pass # skip first empty row elif n == 2: self.population_a = [row] # write header to population_a else: if row == []: self.tree = np.array([[]]) # initialise Tree array else: if self.tree.shape[1] == 0: self.tree = np.append(self.tree, [row], axis = 1) # append first row to Tree else: self.tree = np.append(self.tree, [row], axis = 0) # append subsequent rows to Tree if self.tree.shape[0] == 13: self.population_a.append(self.tree) # append complete Tree to population list print ('\n', self.population_a) return
def fx_data_tree_clean(self, tree):
''' This method aesthetically cleans the Tree array, removing redundant data. Called by: fx_data_tree_append, fx_evolve_branch_copy Arguments required: tree ''' tree[0][2:] = '' # A little clean-up to make things look pretty :) tree[1][2:] = '' # Ignore the man behind the curtain! tree[2][2:] = '' # Yes, I am a bit OCD ... but you *know* you appreciate clean arrays. return tree
def fx_data_tree_append(self, tree):
''' Append Tree array to the foundation Population. Called by: fx_init_construct Arguments required: tree ''' self.fx_data_tree_clean(tree) # clean 'tree' prior to storing self.population_a.append(tree) # append 'tree' to population list return
def fx_data_tree_write(self, population, key):
''' Save population_* to disk. Called by: fx_karoo_gp, fx_eval_generation Arguments required: population, key ''' with open(self.filename[key], 'a') as csv_file: target = csv.writer(csv_file, delimiter=',') if self.gen_id != 1: target.writerows(['']) # empty row before each generation target.writerows([['Karoo GP by Kai Staats', 'Generation:', str(self.gen_id)]]) for tree in range(1, len(population)): target.writerows(['']) # empty row before each Tree for row in range(0, 13): # increment through each row in the array Tree target.writerows([population[tree][row]]) return
def fx_data_params_write(self): # tested 2017 02/13; argument 'app' removed to simplify termination 2019 06/08
''' Save run-time configuration parameters to disk. Called by: fx_karoo_gp, fx_karoo_pause Arguments required: app ''' file = open(self.path + 'log_config.txt', 'w') file.write('Karoo GP') file.write('\n launched: ' + str(self.datetime)) file.write('\n dataset: ' + str(self.dataset)) file.write('\n') file.write('\n kernel: ' + str(self.kernel)) file.write('\n precision: ' + str(self.precision)) file.write('\n') # file.write('tree type: ' + tree_type) # file.write('tree depth base: ' + str(tree_depth_base)) file.write('\n tree depth max: ' + str(self.tree_depth_max)) file.write('\n min node count: ' + str(self.tree_depth_min)) file.write('\n') file.write('\n genetic operator Reproduction: ' + str(self.evolve_repro)) file.write('\n genetic operator Point Mutation: ' + str(self.evolve_point)) file.write('\n genetic operator Branch Mutation: ' + str(self.evolve_branch)) file.write('\n genetic operator Crossover: ' + str(self.evolve_cross)) file.write('\n') file.write('\n tournament size: ' + str(self.tourn_size)) file.write('\n population: ' + str(self.tree_pop_max)) file.write('\n number of generations: ' + str(self.gen_id)) file.write('\n\n') file.close() file = open(self.path + 'log_test.txt', 'w') file.write('Karoo GP') file.write('\n launched: ' + str(self.datetime)) file.write('\n dataset: ' + str(self.dataset)) file.write('\n') if len(self.fittest_dict) > 0: fitness_best = 0 fittest_tree = 0 # revised method, re-evaluating all Trees from stored fitness score for tree_id in range(1, len(self.population_b)): fitness = float(self.population_b[tree_id][12][1]) if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel if fitness >= fitness_best: # find the Tree with Maximum fitness score fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel if fitness_best == 0: fitness_best = fitness # set the baseline first time through if fitness <= fitness_best: # find the Tree with Minimum fitness score fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree elif self.kernel == 'm': # display best fit Trees for the MATCH kernel if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree # elif self.kernel == '[other]': # use others as a template # print ('fitness_best:', fitness_best, 'fittest_tree:', fittest_tree) # test the most fit Tree and write to the .txt log self.fx_eval_poly(self.population_b[int(fittest_tree)]) # generate the raw and sympified expression for the given Tree using SymPy expr = str(self.algo_sym) # get simplified expression and process it by TF - tested 2017 02/02 result = self.fx_fitness_eval(expr, self.data_test, get_pred_labels = True) file.write('\n\n Tree ' + str(fittest_tree) + ' is the most fit, with expression:') file.write('\n\n ' + str(self.algo_sym)) if self.kernel == 'c': file.write('\n\n Classification fitness score: {}'.format(result['fitness'])) file.write('\n\n Precision-Recall report:\n {}'.format(skm.classification_report(result['solution'], result['pred_labels'][0]))) file.write('\n Confusion matrix:\n {}'.format(skm.confusion_matrix(result['solution'], result['pred_labels'][0]))) elif self.kernel == 'r': MSE, fitness = skm.mean_squared_error(result['result'], result['solution']), result['fitness'] file.write('\n\n Regression fitness score: {}'.format(fitness)) file.write('\n Mean Squared Error: {}'.format(MSE)) elif self.kernel == 'm': file.write('\n\n Matching fitness score: {}'.format(result['fitness'])) # elif self.kernel == '[other]': # use others as a template else: file.write('\n\n There were no evolved solutions generated in this run... your species has gone extinct!') file.write('\n\n') file.close() return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Construct the 1st Generation |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_init_construct(self, tree_type, tree_depth_base):
''' This method constructs the initial population of Tree type 'tree_type' and of the size tree_depth_base. The Tree can be Full, Grow, or "Ramped Half/Half" as defined by John Koza. Called by: fx_karoo_gp Arguments required: tree_type, tree_depth_base ''' if self.display == 'i': print ('\n\t\033[32m Press \033[36m\033[1m?\033[0;0m\033[32m at any \033[36m\033[1m(pause)\033[0;0m\033[32m, or \033[36m\033[1mENTER\033[0;0m \033[32mto continue the run\033[0;0m'); self.fx_karoo_pause_refer() if tree_type == 'r': # Ramped 50/50 TREE_ID = 1 for n in range(1, int((self.tree_pop_max / 2) / tree_depth_base) + 1): # split the population into equal parts for depth in range(1, tree_depth_base + 1): # build 2 Trees at each depth self.fx_init_tree_build(TREE_ID, 'f', depth) # build a Full Tree self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a' TREE_ID = TREE_ID + 1 self.fx_init_tree_build(TREE_ID, 'g', depth) # build a Grow Tree self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a' TREE_ID = TREE_ID + 1 if TREE_ID < self.tree_pop_max: # eg: split 100 by 2*3 and it will produce only 96 Trees ... for n in range(self.tree_pop_max - TREE_ID + 1): # ... so we complete the run self.fx_init_tree_build(TREE_ID, 'g', tree_depth_base) self.fx_data_tree_append(self.tree) TREE_ID = TREE_ID + 1 else: pass else: # Full or Grow for TREE_ID in range(1, self.tree_pop_max + 1): self.fx_init_tree_build(TREE_ID, tree_type, tree_depth_base) # build the 1st generation of Trees self.fx_data_tree_append(self.tree) return
def fx_init_tree_build(self, TREE_ID, tree_type, tree_depth_base):
''' This method combines 4 sub-methods into a single method for ease of deployment. It is designed to executed within a loop such that an entire population is built. However, it may also be run from the command line, passing a single TREE_ID to the method. 'tree_type' is either (f)ull or (g)row. Note, however, that when the user selects 'ramped 50/50' at launch, it is still (f) or (g) which are passed to this method. Called by: fx_init_construct, fx_evolve_crossover, fx_evolve_grow_mutate Arguments required: TREE_ID, tree_type, tree_depth_base ''' self.fx_init_tree_initialise(TREE_ID, tree_type, tree_depth_base) # initialise a new Tree self.fx_init_root_build() # build the Root node self.fx_init_function_build() # build the Function nodes self.fx_init_terminal_build() # build the Terminal nodes return # each Tree is written to 'gp.tree'
def fx_init_tree_initialise(self, TREE_ID, tree_type, tree_depth_base):
''' Assign 13 global variables to the array 'tree'. Build the array 'tree' with 13 rows and initally, just 1 column of labels. This array will grow horizontally as each new node is appended. The values of this array are stored as string characters, numbers forced to integers at the point of execution. Use of the debug (db) interface mode enables the user to watch the genetic operations as they work on the Trees. Called by: fx_init_tree_build Arguments required: TREE_ID, tree_type, tree_depth_base ''' self.pop_TREE_ID = TREE_ID # pos 0: a unique identifier for each tree self.pop_tree_type = tree_type # pos 1: a global constant based upon the initial user setting self.pop_tree_depth_base = tree_depth_base # pos 2: a global variable which conveys 'tree_depth_base' as unique to each new Tree self.pop_NODE_ID = 1 # pos 3: unique identifier for each node; this is the INDEX KEY to this array self.pop_node_depth = 0 # pos 4: depth of each node when committed to the array self.pop_node_type = '' # pos 5: root, function, or terminal self.pop_node_label = '' # pos 6: operator [+, -, *, ...] or terminal [a, b, c, ...] self.pop_node_parent = '' # pos 7: parent node self.pop_node_arity = '' # pos 8: number of nodes attached to each non-terminal node self.pop_node_c1 = '' # pos 9: child node 1 self.pop_node_c2 = '' # pos 10: child node 2 self.pop_node_c3 = '' # pos 11: child node 3 (assumed max of 3 with boolean operator 'if') self.pop_fitness = '' # pos 12: fitness score following Tree evaluation self.tree = np.array([ ['TREE_ID'],['tree_type'],['tree_depth_base'],['NODE_ID'],['node_depth'],['node_type'],['node_label'],['node_parent'],['node_arity'],['node_c1'],['node_c2'],['node_c3'],['fitness'] ]) return
Root Node
def fx_init_root_build(self):
''' Build the Root node for the initial population. Called by: fx_init_tree_build Arguments required: none ''' self.fx_init_function_select() # select the operator for root if self.pop_node_arity == 1: # 1 child self.pop_node_c1 = 2 self.pop_node_c2 = '' self.pop_node_c3 = '' elif self.pop_node_arity == 2: # 2 children self.pop_node_c1 = 2 self.pop_node_c2 = 3 self.pop_node_c3 = '' elif self.pop_node_arity == 3: # 3 children self.pop_node_c1 = 2 self.pop_node_c2 = 3 self.pop_node_c3 = 4 else: print ('\n\t\033[31m ERROR! In fx_init_root_build: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08 self.pop_node_type = 'root' self.fx_init_node_commit() return
Function Nodes
def fx_init_function_build(self):
''' Build the Function nodes for the intial population. Called by: fx_init_tree_build Arguments required: none ''' for i in range(1, self.pop_tree_depth_base): # increment depth, from 1 through 'tree_depth_base' - 1 self.pop_node_depth = i # increment 'node_depth' parent_arity_sum = 0 prior_sibling_arity = 0 # reset for 'c_buffer' in 'children_link' prior_siblings = 0 # reset for 'c_buffer' in 'children_link' for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree' if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth parent_arity_sum = parent_arity_sum + int(self.tree[8][j]) # sum arities of all parent nodes at the prior depth # (do *not* merge these 2 "j" loops or it gets all kinds of messed up) for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree' if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth for k in range(1, int(self.tree[8][j]) + 1): # increment through each degree of arity for each parent node self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID' ... prior_sibling_arity = self.fx_init_function_gen(parent_arity_sum, prior_sibling_arity, prior_siblings) # ... generate a Function ndoe prior_siblings = prior_siblings + 1 # sum sibling nodes (current depth) who will spawn their own children (cousins? :) return
def fx_init_function_gen(self, parent_arity_sum, prior_sibling_arity, prior_siblings):
''' Generate a single Function node for the initial population. Called by fx_init_function_build Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings ''' if self.pop_tree_type == 'f': # user defined as (f)ull self.fx_init_function_select() # retrieve a function self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children elif self.pop_tree_type == 'g': # user defined as (g)row rnd = np.random.randint(2) if rnd == 0: # randomly selected as Function self.fx_init_function_select() # retrieve a function self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children elif rnd == 1: # randomly selected as Terminal self.fx_init_terminal_select() # retrieve a terminal self.pop_node_c1 = '' self.pop_node_c2 = '' self.pop_node_c3 = '' self.fx_init_node_commit() # commit new node to array prior_sibling_arity = prior_sibling_arity + self.pop_node_arity # sum the arity of prior siblings return prior_sibling_arity
def fx_init_function_select(self):
''' Define a single Function (operator extracted from the associated functions.csv) for the initial population. Called by: fx_init_function_gen, fx_init_root_build Arguments required: none ''' self.pop_node_type = 'func' rnd = np.random.randint(0, len(self.functions[:,0])) # call the previously loaded .csv which contains all operators self.pop_node_label = self.functions[rnd][0] self.pop_node_arity = int(self.functions[rnd][1]) return
Terminal Nodes
def fx_init_terminal_build(self):
''' Build the Terminal nodes for the intial population. Called by: fx_init_tree_build Arguments required: none ''' self.pop_node_depth = self.pop_tree_depth_base # set the final node_depth (same as 'gp.pop_node_depth' + 1) for j in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree' if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth for k in range(1,(int(self.tree[8][j]) + 1)): # increment through each degree of arity for each parent node self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID' ... self.fx_init_terminal_gen() # ... generate a Terminal node return
def fx_init_terminal_gen(self):
''' Generate a single Terminal node for the initial population. Called by: fx_init_terminal_build Arguments required: none ''' self.fx_init_terminal_select() # retrieve a terminal self.pop_node_c1 = '' self.pop_node_c2 = '' self.pop_node_c3 = '' self.fx_init_node_commit() # commit new node to array return
def fx_init_terminal_select(self):
''' Define a single Terminal (variable extracted from the top row of the associated TRAINING data) Called by: fx_init_terminal_gen, fx_init_function_gen Arguments required: none ''' self.pop_node_type = 'term' rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals self.pop_node_label = self.terminals[rnd] self.pop_node_arity = 0 return
The Lovely Children
def fx_init_child_link(self, parent_arity_sum, prior_sibling_arity, prior_siblings):
''' Link each parent node to its children in the intial population. Called by: fx_init_function_gen Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings ''' c_buffer = 0 for n in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree' if int(self.tree[4][n]) == self.pop_node_depth - 1: # find all nodes that reside at the prior (parent) 'node_depth' c_buffer = self.pop_NODE_ID + (parent_arity_sum + prior_sibling_arity - prior_siblings) # One algo to rule the world! if self.pop_node_arity == 0: # terminal in a Grow Tree self.pop_node_c1 = '' self.pop_node_c2 = '' self.pop_node_c3 = '' elif self.pop_node_arity == 1: # 1 child self.pop_node_c1 = c_buffer self.pop_node_c2 = '' self.pop_node_c3 = '' elif self.pop_node_arity == 2: # 2 children self.pop_node_c1 = c_buffer self.pop_node_c2 = c_buffer + 1 self.pop_node_c3 = '' elif self.pop_node_arity == 3: # 3 children self.pop_node_c1 = c_buffer self.pop_node_c2 = c_buffer + 1 self.pop_node_c3 = c_buffer + 2 else: print ('\n\t\033[31m ERROR! In fx_init_child_link: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08 return
def fx_init_node_commit(self):
''' Commit the values of a new node (root, function, or terminal) to the array 'tree'. Called by: fx_init_root_build, fx_init_function_gen, fx_init_terminal_gen Arguments required: none ''' self.tree = np.append(self.tree, [ [self.pop_TREE_ID],[self.pop_tree_type],[self.pop_tree_depth_base],[self.pop_NODE_ID],[self.pop_node_depth],[self.pop_node_type],[self.pop_node_label],[self.pop_node_parent],[self.pop_node_arity],[self.pop_node_c1],[self.pop_node_c2],[self.pop_node_c3],[self.pop_fitness] ], 1) self.pop_NODE_ID = self.pop_NODE_ID + 1 return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Evaluate a Tree |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_eval_poly(self, tree):
''' Evaluate a Tree and generate its multivariate expression (both raw and Sympified). We need to extract the variables from the expression. However, these variables are no longer correlated to the original variables listed across the top of each column of data.csv. Therefore, we must re-assign the respective values for each subsequent row in the data .csv, for each Tree's unique expression. Called by: fx_karoo_pause, fx_data_params_write, fx_eval_label, fx_fitness_gym, fx_fitness_gene_pool, fx_display_tree Arguments required: tree ''' self.algo_raw = self.fx_eval_label(tree, 1) # pass the root 'node_id', then flatten the Tree to a string self.algo_sym = sympify(self.algo_raw) # convert string to a functional expression (the coolest line in Karoo! :) return
def fx_eval_label(self, tree, node_id):
''' Evaluate all or part of a Tree (starting at node_id) and return a raw mutivariate expression ('algo_raw'). This method is called once per Tree, but may be called at any time to prepare an expression for any full or partial (branch) Tree contained in 'population'. Pass the starting node for recursion via the local variable 'node_id' where the local variable 'tree' is a copy of the Tree you desire to evaluate. Called by: fx_eval_poly, fx_eval_label (recursively) Arguments required: tree, node_id ''' # if tree[6, node_id] == 'not': tree[6, node_id] = ', not' # temp until this can be fixed at data_load node_id = int(node_id) if tree[8, node_id] == '0': # arity of 0 for the pattern '[term]' return '(' + tree[6, node_id] + ')' # 'node_label' (function or terminal) else: if tree[8, node_id] == '1': # arity of 1 for the explicit pattern 'not [term]' return self.fx_eval_label(tree, tree[9, node_id]) + tree[6, node_id] elif tree[8, node_id] == '2': # arity of 2 for the pattern '[func] [term] [func]' return self.fx_eval_label(tree, tree[9, node_id]) + tree[6, node_id] + self.fx_eval_label(tree, tree[10, node_id]) elif tree[8, node_id] == '3': # arity of 3 for the explicit pattern 'if [term] then [term] else [term]' return tree[6, node_id] + self.fx_eval_label(tree, tree[9, node_id]) + ' then ' + self.fx_eval_label(tree, tree[10, node_id]) + ' else ' + self.fx_eval_label(tree, tree[11, node_id])
def fx_eval_id(self, tree, node_id):
''' Evaluate all or part of a Tree and return a list of all 'NODE_ID's. This method generates a list of all 'NODE_ID's from the given Node and below. It is used primarily to generate 'branch' for the multi-generational mutation of Trees. Pass the starting node for recursion via the local variable 'node_id' where the local variable 'tree' is a copy of the Tree you desire to evaluate. Called by: fx_eval_id (recursively), fx_evolve_branch_select Arguments required: tree, node_id ''' node_id = int(node_id) if tree[8, node_id] == '0': # arity of 0 for the pattern '[NODE_ID]' return tree[3, node_id] # 'NODE_ID' else: if tree[8, node_id] == '1': # arity of 1 for the pattern '[NODE_ID], [NODE_ID]' return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) elif tree[8, node_id] == '2': # arity of 2 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID]' return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id]) elif tree[8, node_id] == '3': # arity of 3 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID], [NODE_ID]' return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id]) + ', ' + self.fx_eval_id(tree, tree[11, node_id])
def fx_eval_generation(self):
''' This method invokes the evaluation of an entire generation of Trees. It automatically evaluates population_b before invoking the copy of _b to _a. Called by: fx_karoo_gp Arguments required: none ''' if self.display != 's': if self.display == 'i': print ('') print ('\n Evaluate all Trees in Generation', self.gen_id) if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07 for tree_id in range(1, len(self.population_b)): # renumber all Trees in given population - merged fx_evolve_tree_renum 2018 04/12 self.population_b[tree_id][0][1] = tree_id self.fx_fitness_gym(self.population_b) # run fx_eval(), fx_fitness(), fx_fitness_store(), and fitness record self.fx_data_tree_write(self.population_b, 'a') # archive current population as foundation for next generation if self.display != 's': print ('\n Copy gp.population_b to gp.population_a\n') return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Train and Test a Tree |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_fitness_gym(self, population):
''' Part 1 evaluates each expression against the data, line for line. This is the most time consuming and computationally expensive part of genetic programming. When GPUs are available, the performance can increase by many orders of magnitude for datasets measured in millions of data. Part 2 evaluates every Tree in each generation to determine which have the best, overall fitness score. This could be the highest or lowest depending upon if the fitness function is maximising (higher is better) or minimising (lower is better). The total fitness score is then saved with each Tree in the external .csv file. Part 3 compares the fitness of each Tree to the prior best fit in order to track those that improve with each comparison. For matching functions, all the Trees will have the same fitness score, but they may present more than one solution. For minimisation and maximisation functions, the final Tree should present the best overall fitness for that generation. It is important to note that Part 3 does *not* in any way influence the Tournament Selection which is a stand-alone process. Called by: fx_karoo_gp, fx_eval_generations Arguments required: population ''' fitness_best = 0 self.fittest_dict = {} time_sum = 0 for tree_id in range(1, len(population)): ### PART 1 - GENERATE MULTIVARIATE EXPRESSION FOR EACH TREE ### self.fx_eval_poly(population[tree_id]) # extract the expression if self.display not in ('s'): print ('\t\033[36mTree', population[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m') ### PART 2 - EVALUATE FITNESS FOR EACH TREE AGAINST TRAINING DATA ### fitness = 0 expr = str(self.algo_sym) # get sympified expression and process it with TF - tested 2017 02/02 result = self.fx_fitness_eval(expr, self.data_train) fitness = result['fitness'] # extract fitness score if self.display == 'i': print ('\t \033[36m with fitness sum:\033[1m', fitness, '\033[0;0m\n') self.fx_fitness_store(population[tree_id], fitness) # store Fitness with each Tree ### PART 3 - COMPARE FITNESS OF ALL TREES IN CURRENT GENERATION ### if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel if fitness >= fitness_best: # find the Tree with Maximum fitness score fitness_best = fitness # set best fitness score self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness >= prior elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel if fitness_best == 0: fitness_best = fitness # set the baseline first time through if fitness <= fitness_best: # find the Tree with Minimum fitness score fitness_best = fitness # set best fitness score self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness <= prior elif self.kernel == 'm': # display best fit Trees for the MATCH kernel if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows fitness_best = fitness # set best fitness score self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if all rows match # elif self.kernel == '[other]': # use others as a template print ('\n\033[36m ', len(list(self.fittest_dict.keys())), 'trees\033[1m', np.sort(list(self.fittest_dict.keys())), '\033[0;0m\033[36moffer the highest fitness scores.\033[0;0m') if self.display == 'g': self.fx_karoo_pause_refer() # 2019 06/07 return
def fx_fitness_eval(self, expr, data, get_pred_labels = False):
''' Computes tree expression using TensorFlow (TF) returning results and fitness scores. This method orchestrates most of the TF routines by parsing input string 'expression' and converting it into a TF operation graph which is then processed in an isolated TF session to compute the results and corresponding fitness values. 'self.tf_device' - controls which device will be used for computations (CPU or GPU). 'self.tf_device_log' - controls device placement logging (debug only). Args: 'expr' - a string containing math expression to be computed on the data. Variable names should match corresponding terminal names in 'self.terminals'. 'data' - an 'n by m' matrix of the data points containing n observations and m features per observation. Variable order should match corresponding order of terminals in 'self.terminals'. 'get_pred_labels' - a boolean flag which controls whether the predicted labels should be extracted from the evolved results. This applies only to the CLASSIFY kernel and defaults to 'False'. Returns: A dict mapping keys to the following outputs: 'result' - an array of the results of applying given expression to the data 'pred_labels' - an array of the predicted labels extracted from the results; defined only for CLASSIFY kernel, else None 'solution' - an array of the solution values extracted from the data (variable 's' in the dataset) 'pairwise_fitness' - an array of the element-wise results of applying corresponding fitness kernel function 'fitness' - aggregated scalar fitness score Called by: fx_karoo_pause, fx_data_params_write, fx_fitness_gym Arguments required: expr, data ''' # Initialize TensorFlow session tf.compat.v1.reset_default_graph() # Reset TF internal state and cache (after previous processing) config = tf.compat.v1.ConfigProto(log_device_placement=self.tf_device_log, allow_soft_placement=True) config.gpu_options.allow_growth = True with tf.compat.v1.Session(config=config) as sess: with sess.graph.device(self.tf_device): # 1 - Load data into TF vectors tensors = {} for i in range(len(self.terminals)): var = self.terminals[i] tensors[var] = tf.constant(data[:, i], dtype=tf.float32) # converts data into vectors # 2- Transform string expression into TF operation graph result = self.fx_fitness_expr_parse(expr, tensors) pred_labels = tf.no_op() # a placeholder, applies only to CLASSIFY kernel solution = tensors['s'] # solution value is assumed to be stored in 's' terminal # 3- Add fitness computation into TF graph if self.kernel == 'c': # CLASSIFY kernel ''' Creates element-wise fitness computation TensorFlow (TF) sub-graph for CLASSIFY kernel. This method uses the 'sympified' (SymPy) expression ('algo_sym') created in fx_eval_poly() and the data set loaded at run-time to evaluate the fitness of the selected kernel. This multiclass classifer compares each row of a given Tree to the known solution, comparing predicted labels generated by Karoo GP against the true classs labels. This method is able to work with any number of class labels, from 2 to n. The left-most bin includes -inf. The right-most bin includes +inf. Those inbetween are by default confined to the spacing of 1.0 each, as defined by: (solution - 1) < result <= solution The skew adjusts the boundaries of the bins such that they fall on both the negative and positive sides of the origin. At the time of this writing, an odd number of class labels will generate an extra bin on the positive side of origin as it has not yet been determined the effect of enabling the middle bin to include both a negative and positive result. ''' # was breaking with upgrade from Tensorflow 1.1 to 1.3; fixed by Iurii by replacing [] with () as of 20171026 # if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = [tf.int32, tf.string], swap_memory = True) if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = (tf.int32, tf.string), swap_memory = True) skew = (self.class_labels / 2) - 1 rule11 = tf.equal(solution, 0) rule12 = tf.less_equal(result, 0 - skew) rule13 = tf.logical_and(rule11, rule12) rule21 = tf.equal(solution, self.class_labels - 1) rule22 = tf.greater(result, solution - 1 - skew) rule23 = tf.logical_and(rule21, rule22) rule31 = tf.less(solution - 1 - skew, result) rule32 = tf.less_equal(result, solution - skew) rule33 = tf.logical_and(rule31, rule32) pairwise_fitness = tf.cast(tf.logical_or(tf.logical_or(rule13, rule23), rule33), tf.int32) elif self.kernel == 'r': # REGRESSION kernel ''' A very, very basic REGRESSION kernel which is not designed to perform well in the real world. It requires that you raise the minimum node count to keep it from converging on the value of '1'. Consider writing or integrating a more sophisticated kernel. ''' pairwise_fitness = tf.abs(solution - result) elif self.kernel == 'm': # MATCH kernel ''' This is used for demonstration purposes only. ''' # pairwise_fitness = tf.cast(tf.equal(solution, result), tf.int32) # breaks due to floating points RTOL, ATOL = 1e-05, 1e-08 # fixes above issue by checking if a float value lies within a range of values pairwise_fitness = tf.cast(tf.less_equal(tf.abs(solution - result), ATOL + RTOL * tf.abs(result)), tf.int32) # elif self.kernel == '[other]': # use others as a template else: raise Exception('Kernel type is wrong or missing. You entered {}'.format(self.kernel)) fitness = tf.reduce_sum(pairwise_fitness) # Process TF graph and collect the results result, pred_labels, solution, fitness, pairwise_fitness = sess.run([result, pred_labels, solution, fitness, pairwise_fitness]) return {'result': result, 'pred_labels': pred_labels, 'solution': solution, 'fitness': float(fitness), 'pairwise_fitness': pairwise_fitness}
def fx_fitness_expr_parse(self, expr, tensors):
''' Extract expression tree from the string algo_sym and transform into TensorFlow (TF) graph. Called by: fx_fitness_eval Arguments required: expr, tensors ''' tree = ast.parse(expr, mode='eval').body return self.fx_fitness_node_parse(tree, tensors)
def fx_fitness_chain_bool(self, values, operation, tensors):
''' Chains a sequence of boolean operations (e.g. 'a and b and c') into a single TensorFlow (TF) sub graph. Called by: fx_fitness_node_parse Arguments required: values, operation, tensors ''' x = tf.cast(self.fx_fitness_node_parse(values[0], tensors), tf.bool) if len(values) > 1: return operation(x, self.fx_fitness_chain_bool(values[1:], operation, tensors)) else: return x
def fx_fitness_chain_compare(self, comparators, ops, tensors):
''' Chains a sequence of comparison operations (e.g. 'a > b < c') into a single TensorFlow (TF) sub graph. Called by: fx_fitness_node_parse Arguments required: comparators, ops, tensors ''' x = self.fx_fitness_node_parse(comparators[0], tensors) y = self.fx_fitness_node_parse(comparators[1], tensors) if len(comparators) > 2: return tf.logical_and(operators[type(ops[0])](x, y), self.fx_fitness_chain_compare(comparators[1:], ops[1:], tensors)) else: return operators[type(ops[0])](x, y)
def fx_fitness_node_parse(self, node, tensors):
''' Recursively transforms parsed expression tree into TensorFlow (TF) graph. Called by: fx_fitness_expr_parse, fx_fitness_chain_bool, fx_fitness_chain_compare Arguments required: node, tensors ''' if isinstance(node, ast.Name): # <tensor_name> return tensors[node.id] elif isinstance(node, ast.Num): # <number> #shape = tensors[tensors.keys()[0]].get_shape() # Python 2.7 shape = tensors[list(tensors.keys())[0]].get_shape() return tf.constant(node.n, shape=shape, dtype=tf.float32) elif isinstance(node, ast.BinOp): # <left> <operator> <right>, e.g., x + y return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors)) elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1 return operators[type(node.op)](self.fx_fitness_node_parse(node.operand, tensors)) elif isinstance(node, ast.Call): # <function>(<arguments>) e.g., sin(x) return operators[node.func.id](*[self.fx_fitness_node_parse(arg, tensors) for arg in node.args]) elif isinstance(node, ast.BoolOp): # <left> <bool_operator> <right> e.g. x or y return self.fx_fitness_chain_bool(node.values, operators[type(node.op)], tensors) elif isinstance(node, ast.Compare): # <left> <compare> <right> e.g., a > z return self.fx_fitness_chain_compare([node.left] + node.comparators, node.ops, tensors) else: raise TypeError(node)
def fx_fitness_labels_map(self, result):
''' For the CLASSIFY kernel, creat
I sent you the file in the previous email. Did you receive it ?
Sent from my iPhone
On Oct 13, 2019, at 5:31 PM, Kai Staats notifications@github.com wrote:
Yes, understood. If you will be so kind as to send to, I will test them against the current build. Will save me the time of duplicating your update. Thanks!
On 10/13/19 11:15 AM, ajawfi wrote:
The changes I made are in the "karoo_gp_base_class.py" file. I am really not sure if the changes will break the support of the older versions of TF.
On Sun, Oct 13, 2019 at 11:04 AM Kai Staats notifications@github.com wrote:
Ah! Great! Thank you!
If you don't mind sharing those changes with me directly (kai at over the sun dot com), or forking Karoo, then I will review and merge with credit to your effort. Do you know if the changes break support for the prior versions of TF?
kai
On 10/13/19 9:54 AM, ajawfi wrote:
Thanks, I actually already fixed for me. I just wanted to let you know. As you said, you just need to change couple of the TF attributes (I think I changed around 5 attributes to get it working). Great package by the way. Keep up the good work !
Sent from my iPhone
On Oct 13, 2019, at 10:21 AM, Kai Staats notifications@github.com wrote:
Ok. Looks like the means by which we define our maths functions are no longer supported, or have changed. Should be easy to fix. We will look into this ASAP ... stay tuned.
On 10/13/19 9:13 AM, ajawfi wrote:
I’m using the latest version of TF, version 2.0.
On Oct 13, 2019, at 9:08 AM, Kai Staats notifications@github.com wrote:
Which version of TF are you using?
On 10/12/19 7:22 PM, ajawfi wrote:
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a) AttributeError: module 'tensorflow' has no attribute 'log' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kstaats/karoo_gp/issues/17?email_source=notifications&email_token=AMQMZW25GG52S33WQVL7TMTQONILJA5CNFSM4JAFELLKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBC2TRY#issuecomment-541436359, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMQMZWYCIPYXP3NRRMCSNA3QONILJANCNFSM4JAFELLA .
Karoo GP Base Class
Define the methods and global variables used by Karoo GP
by Kai Staats, MSc with TensorFlow support provided by Iurii Milovanov; see LICENSE.md
version 2.3 for Python 3.6
''' A NOTE TO THE NEWBIE, EXPERT, AND BRAVE Even if you are highly experienced in Genetic Programming, it is recommended that you review the 'Karoo User Guide' before running this application. While your computer will not burst into flames nor will the sun collapse into a black hole if you do not, you will likely find more enjoyment of this particular flavour of GP with a little understanding of its intent and design. '''
import sys import os import csv import time
import numpy as np import sklearn.metrics as skm
import sklearn.cross_validation as skcv # Python 2.7
import sklearn.model_selection as skcv
from sympy import sympify from datetime import datetime from collections import OrderedDict
import karoo_gp_pause as menu
np.random.seed(1000) # for reproducibility
TensorFlow Imports and Definitions
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
import tensorflow as tf import ast import operator as op
operators = {ast.Add: tf.add, # e.g., a + b ast.Sub: tf.subtract, # e.g., a - b ast.Mult: tf.multiply, # e.g., a * b ast.Div: tf.divide, # e.g., a / b ast.Pow: tf.pow, # e.g., a ** 2 ast.USub: tf.negative, # e.g., -a ast.And: tf.logical_and, # e.g., a and b ast.Or: tf.logical_or, # e.g., a or b ast.Not: tf.logical_not, # e.g., not a ast.Eq: tf.equal, # e.g., a == b ast.NotEq: tf.not_equal, # e.g., a != b ast.Lt: tf.less, # e.g., a < b ast.LtE: tf.less_equal, # e.g., a <= b ast.Gt: tf.greater, # e.g., a > b ast.GtE: tf.greater_equal, # e.g., a >= 1 'abs': tf.abs, # e.g., abs(a) 'sign': tf.sign, # e.g., sign(a) 'square': tf.square, # e.g., square(a) 'sqrt': tf.sqrt, # e.g., sqrt(a) 'pow': tf.pow, # e.g., pow(a, b) 'log': tf.math.log, # e.g., log(a) 'log1p': tf.math.log1p, # e.g., log1p(a) 'cos': tf.cos, # e.g., cos(a) 'sin': tf.sin, # e.g., sin(a) 'tan': tf.tan, # e.g., tan(a) 'acos': tf.acos, # e.g., acos(a) 'asin': tf.asin, # e.g., asin(a) 'atan': tf.atan, # e.g., atan(a) }
np.set_printoptions(linewidth = 320) # set the terminal to print 320 characters before line-wrapping in order to view Trees
class Base_GP(object):
''' This BaseBP class contains all methods for Karoo GP. Method names are differentiated from global variable names (defined below) by the prefix 'fx' followed by an object and action, as in fx_display_tree(), with a few expections, such as fx_fitness_gene_pool().
The method categories (denoted by +++ banners +++) are as follows: fxkaroo Methods to Run Karoo GP fxdata Methods to Load and Archive Data fxinit Methods to Construct the 1st Generation fxeval Methods to Evaluate a Tree fxfitness Methods to Train and Test a Tree for Fitness fxnextgen Methods to Construct the next Generation fxevolve Methods to Evolve a Population fxdisplay Methods to Visualize a Tree
Error checks are quickly located by searching for 'ERROR!' '''
def init(self):
'''
Global variables used for data management
self.data_train store train data for processing in TF self.data_test store test data for processing in TF self.tf_device set TF computation backend device (CPU or GPU) self.tf_device_log employed for TensorFlow debugging
self.data_train_cols number of cols in the TRAINING data - see fx_data_load() self.data_train_rows number of rows in the TRAINING data - see fx_data_load() self.data_test_cols number of cols in the TEST data - see fx_data_load() self.data_test_rows number of rows in the TEST data - see fx_data_load()
self.functions user defined functions (operators) from the associated files/[functions].csv self.terminals user defined variables (operands) from the top row of the associated [data].csv self.coeff user defined coefficients (NOT YET IN USE) self.fitness_type fitness type self.datetime date-time stamp of when the unique directory is created self.path full path to the unique directory created with each run self.dataset local path and dataset filename
Global variables used for evolutionary management
self.population_a the root generation from which Trees are chosen for mutation and reproduction self.population_b the generation constructed from gp.population_a (recyled) self.gene_pool once-per-generation assessment of trees that meet min and max boundary conditions self.gen_id simple n + 1 increment self.fitness_type set in fx_dataload() as either a minimising or maximising function self.tree axis-1, 13 element Numpy array that defines each Tree, stored in 'gp.population' self.pop* 13 variables that define each Tree - see fx_init_tree_initialise() '''
self.algo_raw = [] # the raw expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL self.algo_sym = [] # the expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL self.fittest_dict = {} # all Trees which share the best fitness score self.gene_pool = [] # store all Tree IDs for use by Tournament self.class_labels = 0 # the number of true class labels (data_y)
return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Run Karoo GP |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_karoo_gp(self, kernel, tree_type, tree_depth_base, tree_depth_max, tree_depth_min, tree_pop_max, gen_max, tourn_size, filename, evolve_repro, evolve_point, evolve_branch, evolve_cross, display, precision, swim, mode):
''' This method enables the engagement of the entire Karoo GP application. Instead of returning the user to the pause menu, this script terminates at the command-line, providing support for bash and chron job execution.
Calld by: user script karoo_gp.py
Arguments required: (see below) '''
PART 1 - set global variables to those local values passed from the user script
self.kernel = kernel # fitness function
tree_type is passed between methods to construct specific trees
tree_depth_base is passed between methods to construct specific trees
self.tree_depth_max = tree_depth_max # maximum Tree depth for the entire run; limits bloat self.tree_depth_min = tree_depth_min # minimum number of nodes self.tree_pop_max = tree_pop_max # maximum number of Trees per generation self.gen_max = gen_max # maximum number of generations self.tourn_size = tourn_size # number of Trees selected for each tournament
filename is passed between methods to work with specific populations
self.evolve_repro = evolve_repro # quantity of a population generated through Reproduction self.evolve_point = evolve_point # quantity of a population generated through Point Mutation self.evolve_branch = evolve_branch # quantity of a population generated through Branch Mutation self.evolve_cross = evolve_cross # quantity of a population generated through Crossover self.display = display # display mode is set to (s)ilent # level of on-screen feedback self.precision = precision # the number of floating points for the round function in 'fx_fitness_eval' self.swim = swim # pass along the gene_pool restriction methodology
mode is engaged at the end of the run, below
PART 2 - construct first generation of Trees
self.fx_data_load(filename) self.gen_id = 1 # set initial generation ID self.population_a = ['Karoo GP by Kai Staats, Generation ' + str(self.gen_id)] # initialise population_a to host the first generation self.population_b = ['placeholder'] # initialise population_b to satisfy fx_karoo_pause() self.fx_init_construct(tree_type, tree_depth_base) # construct the first population of Trees
if self.kernel == 'p': # terminate here for Play mode self.fx_display_tree(self.tree) # print the current Tree self.fx_data_tree_write(self.population_a, 'a') # save this one Tree to disk sys.exit()
elif self.gen_max == 1: # terminate here if constructing just one generation self.fx_data_tree_write(self.population_a, 'a') # save this single population to disk print ('\n We have constructed a single, stochastic population of', self.tree_pop_max,'Trees, and saved to disk') sys.exit()
else: print ('\n We have constructed the first, stochastic population of', self.tree_pop_max,'Trees')
PART 3 - evaluate first generation of Trees
print ('\n Evaluate the first generation of Trees ...') self.fx_fitness_gym(self.population_a) # generate expression, evaluate fitness, compare fitness self.fx_data_tree_write(self.population_a, 'a') # save the first generation of Trees to disk
PART 4 - evolve multiple generations of Trees
menu = 1 while menu != 0: # this allows the user to add generations mid-run and not get buried in nested iterations for self.gen_id in range(self.gen_id + 1, self.gen_max + 1): # evolve additional generations of Trees
print ('\n Evolve a population of Trees for Generation', self.gen_id, '...') self.population_b = ['Karoo GP by Kai Staats - Evolving Generation'] # initialise population_b to host the next generation self.fx_fitness_gene_pool() # generate the viable gene pool (compares against gp.tree_depth_min) self.fx_nextgen_reproduce() # method 1 - Reproduction self.fx_nextgen_point_mutate() # method 2 - Point Mutation self.fx_nextgen_branch_mutate() # method 3 - Branch Mutation self.fx_nextgen_crossover() # method 4 - Crossover self.fx_eval_generation() # evaluate all Trees in a single generation self.population_a = self.fx_evolve_pop_copy(self.population_b, ['Karoo GP by Kai Staats - Generation ' + str(self.gen_id)])
if mode == 's': menu = 0 # (s)erver mode - termination with completiont of prescribed run else: # (d)esktop mode - user is given an option to quit, review, and/or modify parameters; 'add' generations continues the run print ('\n\t\033[32m Enter \033[1m?\033[0;0m\033[32m to review your options or \033[1mq\033[0;0m\033[32muit\033[0;0m') menu = self.fx_karoo_pause()
self.fx_karoo_terminate() # archive populations and return to karoo_gp.py for a clean exit
return
def fx_karoo_pause_refer(self):
''' Enables (g)eneration, (i)nteractive, and (d)e(b)ug display modes to offer the (pause) menu at each prompt.
See fx_karoo_pause() for an explanation of the value being passed.
Called by: the functions called by PART 4 of fx_karoo_gp()
Arguments required: none '''
menu = 1 while menu == 1: menu = self.fx_karoo_pause()
return
def fx_karoo_pause(self):
''' Pause the program execution and engage the user, providing a number of options.
Called by: fx_karoo_pause_refer
Arguments required: [0,1,2] where (0) refers to an end-of-run; (1) refers to any use of the (pause) menu from within the run, and anticipates ENTER as an escape from the menu to continue the run; and (2) refers to an 'ERROR!' for which the user may want to archive data before terminating. At this point in time, (2) is associated with each error but does not provide any special options). '''
PART 1 - reset and pack values to send to menu.pause
menu_dict = {'input_a':'', 'input_b':0, 'display':self.display, 'tree_depth_max':self.tree_depth_max, 'tree_depth_min':self.tree_depth_min, 'tree_pop_max':self.tree_pop_max, 'gen_id':self.gen_id, 'gen_max':self.gen_max, 'tourn_size':self.tourn_size, 'evolve_repro':self.evolve_repro, 'evolve_point':self.evolve_point, 'evolve_branch':self.evolve_branch, 'evolve_cross':self.evolve_cross, 'fittest_dict':self.fittest_dict, 'pop_a_len':len(self.population_a), 'pop_b_len':len(self.population_b), 'path':self.path}
menu_dict = menu.pause(menu_dict) # call the external function menu.pause
PART 2 - unpack values returned from menu.pause
input_a = menu_dict['input_a'] input_b = menu_dict['input_b'] self.display = menu_dict['display'] self.tree_depth_min = menu_dict['tree_depth_min'] self.gen_max = menu_dict['gen_max'] self.tourn_size = menu_dict['tourn_size'] self.evolve_repro = menu_dict['evolve_repro'] self.evolve_point = menu_dict['evolve_point'] self.evolve_branch = menu_dict['evolve_branch'] self.evolve_cross = menu_dict['evolve_cross']
PART 3 - execute the user queries returned from menu.pause
if input_a == 'esc': return 2 # breaks out of the fx_karoo_gp() or fx_karoo_pause_refer() loop
elif input_a == 'eval': # evaluate a Tree against the TEST data self.fx_eval_poly(self.population_b[input_b]) # generate the raw and sympified expression for the given Tree using SymPy
print ('\n\t\033[36mTree', input_b, 'yields (raw):', self.algo_raw, '\033[0;0m') # print the raw expression
print ('\n\t\033[36mTree', input_b, 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m') # print the sympified expression
result = self.fx_fitness_eval(str(self.algo_sym), self.data_test, get_pred_labels = True) # might change to algo_raw evaluation if self.kernel == 'c': self.fx_fitness_test_classify(result) # TF tested 2017 02/02 elif self.kernel == 'r': self.fx_fitness_test_regress(result) elif self.kernel == 'm': self.fx_fitness_test_match(result)
elif self.kernel == '[other]': # use others as a template
elif input_a == 'print_a': # print a Tree from population_a self.fx_display_tree(self.population_a[input_b])
elif input_a == 'print_b': # print a Tree from population_b self.fx_display_tree(self.population_b[input_b])
elif input_a == 'pop_a': # list all Trees in population_a print ('') for tree_id in range(1, len(self.population_a)): self.fx_eval_poly(self.population_a[tree_id]) # extract the expression print ('\t\033[36m Tree', self.population_a[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
elif input_a == 'pop_b': # list all Trees in population_b print ('') for tree_id in range(1, len(self.population_b)): self.fx_eval_poly(self.population_b[tree_id]) # extract the expression print ('\t\033[36m Tree', self.population_b[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
elif input_a == 'load': # load population_s to replace population_a self.fx_data_recover(self.filename['s']) # NEED TO replace 's' with a user defined filename
elif input_a == 'write': # write the evolving population_b to disk self.fx_data_tree_write(self.population_b, 'b') print ('\n\t All current members of the evolving population_b saved to karoo_gp/runs/[date-time]/population_b.csv')
elif input_a == 'add': # check for added generations, then exit fx_karoo_pause and continue the run self.gen_max = self.gen_max + input_b # if input_b > 0: self.gen_max = self.gen_max + input_b - REMOVED 2019 06/05
elif input_a == 'quit': self.fx_karoo_terminate() # archive populations and exit
return 1
def fx_karoo_terminate(self): ''' Terminates the evolutionary run (if yet in progress), saves parameters and data to disk, and cleanly returns the user to karoo_gp.py and the command line.
Called by: fx_karoo_gp() and fx_karoo_pause_refer()
Arguments required: none '''
self.fx_data_params_write() target = open(self.filename['f'], 'w'); target.close() # initialize the .csv file for the final population self.fx_data_tree_write(self.population_b, 'f') # save the final generation of Trees to disk print ('\n\t\033[32m Your Trees and runtime parameters are archived in karoo_gp/runs/[date-time]/\033[0;0m')
print ('\n\033[3m "It is not the strongest of the species that survive, nor the most intelligent,\033[0;0m') print ('\033[3m but the one most responsive to change."\033[0;0m --Charles Darwin\n') print ('\033[3m Congrats!\033[0;0m Your Karoo GP run is complete.\n') sys.exit()
return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Load and Archive Data |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_data_load(self, filename):
''' The data and function .csv files are loaded according to the fitness function kernel selected by the user. An alternative dataset may be loaded at launch, by appending a command line argument. The data is then split into both TRAINING and TEST segments in order to validate the success of the GP training run. Datasets less than 10 rows will not be split, rather copied in full to both TRAINING and TEST as it is assumed you are conducting a system validation run, as with the built-in MATCH kernel and associated dataset.
Called by: fx_karoo_gp
Arguments required: filename (of the dataset) '''
PART 1 - load the associated data set, operators, operands, fitness type, and coefficients
full_path = os.path.realpath(file); cwd = os.path.dirname(full_path) # for user Marco Cavaglia
cwd = os.getcwd()
data_dict = {'c':cwd + '/files/data_CLASSIFY.csv', 'r':cwd + '/files/data_REGRESS.csv', 'm':cwd + '/files/data_MATCH.csv', 'p':cwd + '/files/data_PLAY.csv'}
if len(sys.argv) == 1: # load data from the default karoo_gp/files/ directory data_x = np.loadtxt(data_dict[self.kernel], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column data_y = np.loadtxt(data_dict[self.kernel], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels) header = open(data_dict[self.kernel],'r') # open file to be read (below) self.dataset = data_dict[self.kernel] # copy the name only
elif len(sys.argv) == 2: # load an external data file data_x = np.loadtxt(sys.argv[1], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column data_y = np.loadtxt(sys.argv[1], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels) header = open(sys.argv[1],'r') # open file to be read (below) self.dataset = sys.argv[1] # copy the name only
elif len(sys.argv) > 2: # receive filename and additional arguments from karoo_gp.py via argparse data_x = np.loadtxt(filename, skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column data_y = np.loadtxt(filename, skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels) header = open(filename,'r') # open file to be read (below) self.dataset = filename # copy the name only
fitt_dict = {'c':'max', 'r':'min', 'm':'max', 'p':''} self.fitness_type = fitt_dict[self.kernel] # load fitness type
func_dict = {'c':cwd + '/files/operators_CLASSIFY.csv', 'r':cwd + '/files/operators_REGRESS.csv', 'm':cwd + '/files/operators_MATCH.csv', 'p':cwd + '/files/operators_PLAY.csv'} self.functions = np.loadtxt(func_dict[self.kernel], delimiter=',', skiprows=1, dtype = str) # load the user defined functions (operators) self.terminals = header.readline().split(','); self.terminals[-1] = self.terminals[-1].replace('\n','') # load the user defined terminals (operands) self.class_labels = len(np.unique(data_y)) # load the user defined true labels for classification or solutions for regression
self.coeff = np.loadtxt(cwd + '/files/coefficients.csv', delimiter=',', skiprows=1, dtype = str) # load the user defined coefficients - NOT USED YET
PART 2 - from the dataset, extract TRAINING and TEST data
if len(data_x) < 11: # for small datasets we will not split them into TRAINING and TEST components datatrain = np.c[data_x, data_y] datatest = np.c[data_x, data_y]
else: # if larger than 10, we run the data through the SciKit Learn's 'random split' function x_train, x_test, y_train, y_test = skcv.train_test_split(data_x, data_y, test_size = 0.2) # 80/20 TRAIN/TEST split data_x, data_y = [], [] # clear from memory
datatrain = np.c[x_train, y_train] # recombine each row of data with its associated class label (right column) x_train, y_train = [], [] # clear from memory
datatest = np.c[x_test, y_test] # recombine each row of data with its associated class label (right column) x_test, y_test = [], [] # clear from memory
self.data_train_cols = len(data_train[0,:]) # qty count self.data_train_rows = len(data_train[:,0]) # qty count self.data_test_cols = len(data_test[0,:]) # qty count self.data_test_rows = len(data_test[:,0]) # qty count
PART 3 - load TRAINING and TEST data for TensorFlow processing - tested 2017 02/02
self.data_train = data_train # Store train data for processing in TF self.data_test = data_test # Store test data for processing in TF self.tf_device = "/gpu:0" # Set TF computation backend device (CPU or GPU); gpu:n = 1st, 2nd, or ... GPU device self.tf_device_log = False # TF device usage logging (for debugging)
PART 4 - create a unique directory and initialise all .csv files
self.datetime = datetime.now().strftime('%Y-%m-%d%H-%M-%S') self.path = os.path.join(cwd, 'runs/', filename.split('.')[0] + '' + self.datetime + '/') # generate a unique directory name if not os.path.isdir(self.path): os.makedirs(self.path) # make a unique directory
self.filename = {} # a dictionary to hold .csv filenames
self.filename.update( {'a':self.path + 'population_a.csv'} ) target = open(self.filename['a'], 'w'); target.close() # initialise a .csv file for population 'a' (foundation)
self.filename.update( {'b':self.path + 'population_b.csv'} ) target = open(self.filename['b'], 'w'); target.close() # initialise a .csv file for population 'b' (evolving)
self.filename.update( {'f':self.path + 'population_f.csv'} ) target = open(self.filename['f'], 'w'); target.close() # initialise a .csv file for the final population (test)
self.filename.update( {'s':self.path + 'population_s.csv'} ) target = open(self.filename['s'], 'w'); target.close() # initialise a .csv file to manually load (seed)
return
def fx_data_recover(self, population):
''' This method is used to load a saved population of Trees, as invoked through the (pause) menu where population_r replaces population_a in the karoo_gp/runs/[date-time]/ directory.
Called by: fx_karoo_pause
Arguments required: population (filename['s']) '''
with open(population, 'rb') as csv_file: target = csv.reader(csv_file, delimiter=',') n = 0 # track row count
for row in target: print ('row', row)
n = n + 1 if n == 1: pass # skip first empty row
elif n == 2: self.population_a = [row] # write header to population_a
else: if row == []: self.tree = np.array([[]]) # initialise Tree array
else: if self.tree.shape[1] == 0: self.tree = np.append(self.tree, [row], axis = 1) # append first row to Tree
else: self.tree = np.append(self.tree, [row], axis = 0) # append subsequent rows to Tree
if self.tree.shape[0] == 13: self.population_a.append(self.tree) # append complete Tree to population list
print ('\n', self.population_a)
return
def fx_data_tree_clean(self, tree):
''' This method aesthetically cleans the Tree array, removing redundant data.
Called by: fx_data_tree_append, fx_evolve_branch_copy
Arguments required: tree '''
tree[0][2:] = '' # A little clean-up to make things look pretty :) tree[1][2:] = '' # Ignore the man behind the curtain! tree[2][2:] = '' # Yes, I am a bit OCD ... but you know you appreciate clean arrays.
return tree
def fx_data_tree_append(self, tree):
''' Append Tree array to the foundation Population.
Called by: fx_init_construct
Arguments required: tree '''
self.fx_data_tree_clean(tree) # clean 'tree' prior to storing self.population_a.append(tree) # append 'tree' to population list
return
def fx_data_tree_write(self, population, key):
''' Save population_* to disk.
Called by: fx_karoo_gp, fx_eval_generation
Arguments required: population, key '''
with open(self.filename[key], 'a') as csv_file: target = csv.writer(csv_file, delimiter=',') if self.gen_id != 1: target.writerows(['']) # empty row before each generation target.writerows([['Karoo GP by Kai Staats', 'Generation:', str(self.gen_id)]])
for tree in range(1, len(population)): target.writerows(['']) # empty row before each Tree for row in range(0, 13): # increment through each row in the array Tree target.writerows([population[tree][row]])
return
def fx_data_params_write(self): # tested 2017 02/13; argument 'app' removed to simplify termination 2019 06/08
''' Save run-time configuration parameters to disk.
Called by: fx_karoo_gp, fx_karoo_pause
Arguments required: app '''
file = open(self.path + 'log_config.txt', 'w') file.write('Karoo GP') file.write('\n launched: ' + str(self.datetime)) file.write('\n dataset: ' + str(self.dataset)) file.write('\n') file.write('\n kernel: ' + str(self.kernel)) file.write('\n precision: ' + str(self.precision)) file.write('\n')
file.write('tree type: ' + tree_type)
file.write('tree depth base: ' + str(tree_depth_base))
file.write('\n tree depth max: ' + str(self.tree_depth_max)) file.write('\n min node count: ' + str(self.tree_depth_min)) file.write('\n') file.write('\n genetic operator Reproduction: ' + str(self.evolve_repro)) file.write('\n genetic operator Point Mutation: ' + str(self.evolve_point)) file.write('\n genetic operator Branch Mutation: ' + str(self.evolve_branch)) file.write('\n genetic operator Crossover: ' + str(self.evolve_cross)) file.write('\n') file.write('\n tournament size: ' + str(self.tourn_size)) file.write('\n population: ' + str(self.tree_pop_max)) file.write('\n number of generations: ' + str(self.gen_id)) file.write('\n\n') file.close()
file = open(self.path + 'log_test.txt', 'w') file.write('Karoo GP') file.write('\n launched: ' + str(self.datetime)) file.write('\n dataset: ' + str(self.dataset)) file.write('\n')
if len(self.fittest_dict) > 0:
fitness_best = 0 fittest_tree = 0
revised method, re-evaluating all Trees from stored fitness score
for tree_id in range(1, len(self.population_b)):
fitness = float(self.population_b[tree_id][12][1])
if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel if fitness >= fitness_best: # find the Tree with Maximum fitness score fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel if fitness_best == 0: fitness_best = fitness # set the baseline first time through if fitness <= fitness_best: # find the Tree with Minimum fitness score fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
elif self.kernel == 'm': # display best fit Trees for the MATCH kernel if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
elif self.kernel == '[other]': # use others as a template
print ('fitness_best:', fitness_best, 'fittest_tree:', fittest_tree)
test the most fit Tree and write to the .txt log
self.fx_eval_poly(self.population_b[int(fittest_tree)]) # generate the raw and sympified expression for the given Tree using SymPy expr = str(self.algo_sym) # get simplified expression and process it by TF - tested 2017 02/02 result = self.fx_fitness_eval(expr, self.data_test, get_pred_labels = True)
file.write('\n\n Tree ' + str(fittest_tree) + ' is the most fit, with expression:') file.write('\n\n ' + str(self.algo_sym))
if self.kernel == 'c': file.write('\n\n Classification fitness score: {}'.format(result['fitness'])) file.write('\n\n Precision-Recall report:\n {}'.format(skm.classification_report(result['solution'], result['pred_labels'][0]))) file.write('\n Confusion matrix:\n {}'.format(skm.confusion_matrix(result['solution'], result['pred_labels'][0])))
elif self.kernel == 'r': MSE, fitness = skm.mean_squared_error(result['result'], result['solution']), result['fitness'] file.write('\n\n Regression fitness score: {}'.format(fitness)) file.write('\n Mean Squared Error: {}'.format(MSE))
elif self.kernel == 'm': file.write('\n\n Matching fitness score: {}'.format(result['fitness']))
elif self.kernel == '[other]': # use others as a template
else: file.write('\n\n There were no evolved solutions generated in this run... your species has gone extinct!')
file.write('\n\n') file.close()
return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Construct the 1st Generation |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_init_construct(self, tree_type, tree_depth_base):
''' This method constructs the initial population of Tree type 'tree_type' and of the size tree_depth_base. The Tree can be Full, Grow, or "Ramped Half/Half" as defined by John Koza.
Called by: fx_karoo_gp
Arguments required: tree_type, tree_depth_base '''
if self.display == 'i': print ('\n\t\033[32m Press \033[36m\033[1m?\033[0;0m\033[32m at any \033[36m\033[1m(pause)\033[0;0m\033[32m, or \033[36m\033[1mENTER\033[0;0m \033[32mto continue the run\033[0;0m'); self.fx_karoo_pause_refer()
if tree_type == 'r': # Ramped 50/50
TREE_ID = 1 for n in range(1, int((self.tree_pop_max / 2) / tree_depth_base) + 1): # split the population into equal parts for depth in range(1, tree_depth_base + 1): # build 2 Trees at each depth self.fx_init_tree_build(TREE_ID, 'f', depth) # build a Full Tree self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a' TREE_ID = TREE_ID + 1
self.fx_init_tree_build(TREE_ID, 'g', depth) # build a Grow Tree self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a' TREE_ID = TREE_ID + 1
if TREE_ID < self.tree_pop_max: # eg: split 100 by 2*3 and it will produce only 96 Trees ... for n in range(self.tree_pop_max - TREE_ID + 1): # ... so we complete the run self.fx_init_tree_build(TREE_ID, 'g', tree_depth_base) self.fx_data_tree_append(self.tree) TREE_ID = TREE_ID + 1
else: pass
else: # Full or Grow for TREE_ID in range(1, self.tree_pop_max + 1): self.fx_init_tree_build(TREE_ID, tree_type, tree_depth_base) # build the 1st generation of Trees self.fx_data_tree_append(self.tree)
return
def fx_init_tree_build(self, TREE_ID, tree_type, tree_depth_base):
''' This method combines 4 sub-methods into a single method for ease of deployment. It is designed to executed within a loop such that an entire population is built. However, it may also be run from the command line, passing a single TREE_ID to the method.
'tree_type' is either (f)ull or (g)row. Note, however, that when the user selects 'ramped 50/50' at launch, it is still (f) or (g) which are passed to this method.
Called by: fx_init_construct, fx_evolve_crossover, fx_evolve_grow_mutate
Arguments required: TREE_ID, tree_type, tree_depth_base '''
self.fx_init_tree_initialise(TREE_ID, tree_type, tree_depth_base) # initialise a new Tree self.fx_init_root_build() # build the Root node self.fx_init_function_build() # build the Function nodes self.fx_init_terminal_build() # build the Terminal nodes
return # each Tree is written to 'gp.tree'
def fx_init_tree_initialise(self, TREE_ID, tree_type, tree_depth_base):
''' Assign 13 global variables to the array 'tree'.
Build the array 'tree' with 13 rows and initally, just 1 column of labels. This array will grow horizontally as each new node is appended. The values of this array are stored as string characters, numbers forced to integers at the point of execution.
Use of the debug (db) interface mode enables the user to watch the genetic operations as they work on the Trees.
Called by: fx_init_tree_build
Arguments required: TREE_ID, tree_type, tree_depth_base '''
self.pop_TREE_ID = TREE_ID # pos 0: a unique identifier for each tree self.pop_tree_type = tree_type # pos 1: a global constant based upon the initial user setting self.pop_tree_depth_base = tree_depth_base # pos 2: a global variable which conveys 'tree_depth_base' as unique to each new Tree self.pop_NODE_ID = 1 # pos 3: unique identifier for each node; this is the INDEX KEY to this array self.pop_node_depth = 0 # pos 4: depth of each node when committed to the array self.pop_node_type = '' # pos 5: root, function, or terminal self.pop_node_label = '' # pos 6: operator [+, -, *, ...] or terminal [a, b, c, ...] self.pop_node_parent = '' # pos 7: parent node self.pop_node_arity = '' # pos 8: number of nodes attached to each non-terminal node self.pop_node_c1 = '' # pos 9: child node 1 self.pop_node_c2 = '' # pos 10: child node 2 self.pop_node_c3 = '' # pos 11: child node 3 (assumed max of 3 with boolean operator 'if') self.pop_fitness = '' # pos 12: fitness score following Tree evaluation
self.tree = np.array([ ['TREE_ID'],['tree_type'],['tree_depth_base'],['NODE_ID'],['node_depth'],['node_type'],['node_label'],['node_parent'],['node_arity'],['node_c1'],['node_c2'],['node_c3'],['fitness'] ])
return
Root Node
def fx_init_root_build(self):
''' Build the Root node for the initial population.
Called by: fx_init_tree_build
Arguments required: none '''
self.fx_init_function_select() # select the operator for root
if self.pop_node_arity == 1: # 1 child self.pop_node_c1 = 2 self.pop_node_c2 = '' self.pop_node_c3 = ''
elif self.pop_node_arity == 2: # 2 children self.pop_node_c1 = 2 self.pop_node_c2 = 3 self.pop_node_c3 = ''
elif self.pop_node_arity == 3: # 3 children self.pop_node_c1 = 2 self.pop_node_c2 = 3 self.pop_node_c3 = 4
else: print ('\n\t\033[31m ERROR! In fx_init_root_build: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
self.pop_node_type = 'root'
self.fx_init_node_commit()
return
Function Nodes
def fx_init_function_build(self):
''' Build the Function nodes for the intial population.
Called by: fx_init_tree_build
Arguments required: none '''
for i in range(1, self.pop_tree_depth_base): # increment depth, from 1 through 'tree_depth_base' - 1
self.pop_node_depth = i # increment 'node_depth'
parent_arity_sum = 0 prior_sibling_arity = 0 # reset for 'c_buffer' in 'children_link' prior_siblings = 0 # reset for 'c_buffer' in 'children_link'
for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth parent_arity_sum = parent_arity_sum + int(self.tree[8][j]) # sum arities of all parent nodes at the prior depth
(do not merge these 2 "j" loops or it gets all kinds of messed up)
for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth
for k in range(1, int(self.tree[8][j]) + 1): # increment through each degree of arity for each parent node self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID' ... prior_sibling_arity = self.fx_init_function_gen(parent_arity_sum, prior_sibling_arity, prior_siblings) # ... generate a Function ndoe prior_siblings = prior_siblings + 1 # sum sibling nodes (current depth) who will spawn their own children (cousins? :)
return
def fx_init_function_gen(self, parent_arity_sum, prior_sibling_arity, prior_siblings):
''' Generate a single Function node for the initial population.
Called by fx_init_function_build
Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings '''
if self.pop_tree_type == 'f': # user defined as (f)ull self.fx_init_function_select() # retrieve a function self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children
elif self.pop_tree_type == 'g': # user defined as (g)row rnd = np.random.randint(2)
if rnd == 0: # randomly selected as Function self.fx_init_function_select() # retrieve a function self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children
elif rnd == 1: # randomly selected as Terminal self.fx_init_terminal_select() # retrieve a terminal self.pop_node_c1 = '' self.pop_node_c2 = '' self.pop_node_c3 = ''
self.fx_init_node_commit() # commit new node to array prior_sibling_arity = prior_sibling_arity + self.pop_node_arity # sum the arity of prior siblings
return prior_sibling_arity
def fx_init_function_select(self):
''' Define a single Function (operator extracted from the associated functions.csv) for the initial population.
Called by: fx_init_function_gen, fx_init_root_build
Arguments required: none '''
self.pop_node_type = 'func' rnd = np.random.randint(0, len(self.functions[:,0])) # call the previously loaded .csv which contains all operators self.pop_node_label = self.functions[rnd][0] self.pop_node_arity = int(self.functions[rnd][1])
return
Terminal Nodes
def fx_init_terminal_build(self):
''' Build the Terminal nodes for the intial population.
Called by: fx_init_tree_build
Arguments required: none '''
self.pop_node_depth = self.pop_tree_depth_base # set the final node_depth (same as 'gp.pop_node_depth' + 1)
for j in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth
for k in range(1,(int(self.tree[8][j]) + 1)): # increment through each degree of arity for each parent node self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID' ... self.fx_init_terminal_gen() # ... generate a Terminal node
return
def fx_init_terminal_gen(self):
''' Generate a single Terminal node for the initial population.
Called by: fx_init_terminal_build
Arguments required: none '''
self.fx_init_terminal_select() # retrieve a terminal self.pop_node_c1 = '' self.pop_node_c2 = '' self.pop_node_c3 = ''
self.fx_init_node_commit() # commit new node to array
return
def fx_init_terminal_select(self):
''' Define a single Terminal (variable extracted from the top row of the associated TRAINING data)
Called by: fx_init_terminal_gen, fx_init_function_gen
Arguments required: none '''
self.pop_node_type = 'term' rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals self.pop_node_label = self.terminals[rnd] self.pop_node_arity = 0
return
The Lovely Children
def fx_init_child_link(self, parent_arity_sum, prior_sibling_arity, prior_siblings):
''' Link each parent node to its children in the intial population.
Called by: fx_init_function_gen
Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings '''
c_buffer = 0
for n in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree'
if int(self.tree[4][n]) == self.pop_node_depth - 1: # find all nodes that reside at the prior (parent) 'node_depth'
c_buffer = self.pop_NODE_ID + (parent_arity_sum + prior_sibling_arity - prior_siblings) # One algo to rule the world!
if self.pop_node_arity == 0: # terminal in a Grow Tree self.pop_node_c1 = '' self.pop_node_c2 = '' self.pop_node_c3 = ''
elif self.pop_node_arity == 1: # 1 child self.pop_node_c1 = c_buffer self.pop_node_c2 = '' self.pop_node_c3 = ''
elif self.pop_node_arity == 2: # 2 children self.pop_node_c1 = c_buffer self.pop_node_c2 = c_buffer + 1 self.pop_node_c3 = ''
elif self.pop_node_arity == 3: # 3 children self.pop_node_c1 = c_buffer self.pop_node_c2 = c_buffer + 1 self.pop_node_c3 = c_buffer + 2
else: print ('\n\t\033[31m ERROR! In fx_init_child_link: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
return
def fx_init_node_commit(self):
''' Commit the values of a new node (root, function, or terminal) to the array 'tree'.
Called by: fx_init_root_build, fx_init_function_gen, fx_init_terminal_gen
Arguments required: none '''
self.tree = np.append(self.tree, [ [self.pop_TREE_ID],[self.pop_tree_type],[self.pop_tree_depth_base],[self.pop_NODE_ID],[self.pop_node_depth],[self.pop_node_type],[self.pop_node_label],[self.pop_node_parent],[self.pop_node_arity],[self.pop_node_c1],[self.pop_node_c2],[self.pop_node_c3],[self.pop_fitness] ], 1)
self.pop_NODE_ID = self.pop_NODE_ID + 1
return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Evaluate a Tree |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_eval_poly(self, tree):
''' Evaluate a Tree and generate its multivariate expression (both raw and Sympified).
We need to extract the variables from the expression. However, these variables are no longer correlated to the original variables listed across the top of each column of data.csv. Therefore, we must re-assign the respective values for each subsequent row in the data .csv, for each Tree's unique expression.
Called by: fx_karoo_pause, fx_data_params_write, fx_eval_label, fx_fitness_gym, fx_fitness_gene_pool, fx_display_tree
Arguments required: tree '''
self.algo_raw = self.fx_eval_label(tree, 1) # pass the root 'node_id', then flatten the Tree to a string self.algo_sym = sympify(self.algo_raw) # convert string to a functional expression (the coolest line in Karoo! :)
return
def fx_eval_label(self, tree, node_id):
''' Evaluate all or part of a Tree (starting at node_id) and return a raw mutivariate expression ('algo_raw').
This method is called once per Tree, but may be called at any time to prepare an expression for any full or partial (branch) Tree contained in 'population'. Pass the starting node for recursion via the local variable 'node_id' where the local variable 'tree' is a copy of the Tree you desire to evaluate.
Called by: fx_eval_poly, fx_eval_label (recursively)
Arguments required: tree, node_id '''
if tree[6, node_id] == 'not': tree[6, node_id] = ', not' # temp until this can be fixed at data_load
node_id = int(node_id)
if tree[8, node_id] == '0': # arity of 0 for the pattern '[term]' return '(' + tree[6, node_id] + ')' # 'node_label' (function or terminal)
else: if tree[8, node_id] == '1': # arity of 1 for the explicit pattern 'not [term]' return self.fx_eval_label(tree, tree[9, node_id]) + tree[6, node_id]
elif tree[8, node_id] == '2': # arity of 2 for the pattern '[func] [term] [func]' return self.fx_eval_label(tree, tree[9, node_id]) + tree[6, node_id] + self.fx_eval_label(tree, tree[10, node_id])
elif tree[8, node_id] == '3': # arity of 3 for the explicit pattern 'if [term] then [term] else [term]' return tree[6, node_id] + self.fx_eval_label(tree, tree[9, node_id]) + ' then ' + self.fx_eval_label(tree, tree[10, node_id]) + ' else ' + self.fx_eval_label(tree, tree[11, node_id])
def fx_eval_id(self, tree, node_id):
''' Evaluate all or part of a Tree and return a list of all 'NODE_ID's.
This method generates a list of all 'NODE_ID's from the given Node and below. It is used primarily to generate 'branch' for the multi-generational mutation of Trees.
Pass the starting node for recursion via the local variable 'node_id' where the local variable 'tree' is a copy of the Tree you desire to evaluate.
Called by: fx_eval_id (recursively), fx_evolve_branch_select
Arguments required: tree, node_id
'''node_id = int(node_id)
if tree[8, node_id] == '0': # arity of 0 for the pattern '[NODE_ID]' return tree[3, node_id] # 'NODE_ID'
else: if tree[8, node_id] == '1': # arity of 1 for the pattern '[NODE_ID], [NODE_ID]' return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id])
elif tree[8, node_id] == '2': # arity of 2 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID]' return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id])
elif tree[8, node_id] == '3': # arity of 3 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID], [NODE_ID]' return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id]) + ', ' + self.fx_eval_id(tree, tree[11, node_id])
def fx_eval_generation(self):
''' This method invokes the evaluation of an entire generation of Trees. It automatically evaluates population_b before invoking the copy of _b to _a.
Called by: fx_karoo_gp
Arguments required: none '''
if self.display != 's': if self.display == 'i': print ('') print ('\n Evaluate all Trees in Generation', self.gen_id) if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07
for tree_id in range(1, len(self.population_b)): # renumber all Trees in given population - merged fx_evolve_tree_renum 2018 04/12 self.population_b[tree_id][0][1] = tree_id
self.fx_fitness_gym(self.population_b) # run fx_eval(), fx_fitness(), fx_fitness_store(), and fitness record self.fx_data_tree_write(self.population_b, 'a') # archive current population as foundation for next generation
if self.display != 's': print ('\n Copy gp.population_b to gp.population_a\n')
return
+++++++++++++++++++++++++++++++++++++++++++++
Methods to Train and Test a Tree |
+++++++++++++++++++++++++++++++++++++++++++++
def fx_fitness_gym(self, population):
''' Part 1 evaluates each expression against the data, line for line. This is the most time consuming and computationally expensive part of genetic programming. When GPUs are available, the performance can increase by many orders of magnitude for datasets measured in millions of data.
Part 2 evaluates every Tree in each generation to determine which have the best, overall fitness score. This could be the highest or lowest depending upon if the fitness function is maximising (higher is better) or minimising (lower is better). The total fitness score is then saved with each Tree in the external .csv file.
Part 3 compares the fitness of each Tree to the prior best fit in order to track those that improve with each comparison. For matching functions, all the Trees will have the same fitness score, but they may present more than one solution. For minimisation and maximisation functions, the final Tree should present the best overall fitness for that generation. It is important to note that Part 3 does not in any way influence the Tournament Selection which is a stand-alone process.
Called by: fx_karoo_gp, fx_eval_generations
Arguments required: population '''
fitness_best = 0 self.fittest_dict = {} time_sum = 0
for tree_id in range(1, len(population)):
PART 1 - GENERATE MULTIVARIATE EXPRESSION FOR EACH TREE
self.fx_eval_poly(population[tree_id]) # extract the expression if self.display not in ('s'): print ('\t\033[36mTree', population[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
PART 2 - EVALUATE FITNESS FOR EACH TREE AGAINST TRAINING DATA
fitness = 0
expr = str(self.algo_sym) # get sympified expression and process it with TF - tested 2017 02/02 result = self.fx_fitness_eval(expr, self.data_train) fitness = result['fitness'] # extract fitness score
if self.display == 'i': print ('\t \033[36m with fitness sum:\033[1m', fitness, '\033[0;0m\n')
self.fx_fitness_store(population[tree_id], fitness) # store Fitness with each Tree
PART 3 - COMPARE FITNESS OF ALL TREES IN CURRENT GENERATION
if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel if fitness >= fitness_best: # find the Tree with Maximum fitness score fitness_best = fitness # set best fitness score self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness >= prior
elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel if fitness_best == 0: fitness_best = fitness # set the baseline first time through if fitness <= fitness_best: # find the Tree with Minimum fitness score fitness_best = fitness # set best fitness score self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness <= prior
elif self.kernel == 'm': # display best fit Trees for the MATCH kernel if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows fitness_best = fitness # set best fitness score self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if all rows match
elif self.kernel == '[other]': # use others as a template
print ('\n\033[36m ', len(list(self.fittest_dict.keys())), 'trees\033[1m', np.sort(list(self.fittest_dict.keys())), '\033[0;0m\033[36moffer the highest fitness scores.\033[0;0m') if self.display == 'g': self.fx_karoo_pause_refer() # 2019 06/07
return
def fx_fitness_eval(self, expr, data, get_pred_labels = False):
''' Computes tree expression using TensorFlow (TF) returning results and fitness scores.
This method orchestrates most of the TF routines by parsing input string 'expression' and converting it into a TF operation graph which is then processed in an isolated TF session to compute the results and corresponding fitness values.
'self.tf_device' - controls which device will be used for computations (CPU or GPU). 'self.tf_device_log' - controls device placement logging (debug only).
Args: 'expr' - a string containing math expression to be computed on the data. Variable names should match corresponding terminal names in 'self.terminals'.
'data' - an 'n by m' matrix of the data points containing n observations and m features per observation. Variable order should match corresponding order of terminals in 'self.terminals'.
'get_pred_labels' - a boolean flag which controls whether the predicted labels should be extracted from the evolved results. This applies only to the CLASSIFY kernel and defaults to 'False'.
Returns: A dict mapping keys to the following outputs: 'result' - an array of the results of applying given expression to the data 'pred_labels' - an array of the predicted labels extracted from the results; defined only for CLASSIFY kernel, else None 'solution' - an array of the solution values extracted from the data (variable 's' in the dataset) 'pairwise_fitness' - an array of the element-wise results of applying corresponding fitness kernel function 'fitness' - aggregated scalar fitness score
Called by: fx_karoo_pause, fx_data_params_write, fx_fitness_gym
Arguments required: expr, data '''
Initialize TensorFlow session
tf.compat.v1.reset_default_graph() # Reset TF internal state and cache (after previous processing) config = tf.compat.v1.ConfigProto(log_device_placement=self.tf_device_log, allow_soft_placement=True) config.gpu_options.allow_growth = True
with tf.compat.v1.Session(config=config) as sess: with sess.graph.device(self.tf_device):
1 - Load data into TF vectors
tensors = {} for i in range(len(self.terminals)): var = self.terminals[i] tensors[var] = tf.constant(data[:, i], dtype=tf.float32) # converts data into vectors
2- Transform string expression into TF operation graph
result = self.fx_fitness_expr_parse(expr, tensors) pred_labels = tf.no_op() # a placeholder, applies only to CLASSIFY kernel solution = tensors['s'] # solution value is assumed to be stored in 's' terminal
3- Add fitness computation into TF graph
if self.kernel == 'c': # CLASSIFY kernel
''' Creates element-wise fitness computation TensorFlow (TF) sub-graph for CLASSIFY kernel.
This method uses the 'sympified' (SymPy) expression ('algo_sym') created in fx_eval_poly() and the data set loaded at run-time to evaluate the fitness of the selected kernel.
This multiclass classifer compares each row of a given Tree to the known solution, comparing predicted labels generated by Karoo GP against the true classs labels. This method is able to work with any number of class labels, from 2 to n. The left-most bin includes -inf. The right-most bin includes +inf. Those inbetween are by default confined to the spacing of 1.0 each, as defined by:
(solution - 1) < result <= solution
The skew adjusts the boundaries of the bins such that they fall on both the negative and positive sides of the origin. At the time of this writing, an odd number of class labels will generate an extra bin on the positive side of origin as it has not yet been determined the effect of enabling the middle bin to include both a negative and positive result. '''
was breaking with upgrade from Tensorflow 1.1 to 1.3; fixed by Iurii by replacing [] with () as of 20171026
if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = [tf.int32, tf.string], swap_memory = True)
if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = (tf.int32, tf.string), swap_memory = True)
skew = (self.class_labels / 2) - 1
rule11 = tf.equal(solution, 0) rule12 = tf.less_equal(result, 0 - skew) rule13 = tf.logical_and(rule11, rule12)
rule21 = tf.equal(solution, self.class_labels - 1) rule22 = tf.greater(result, solution - 1 - skew) rule23 = tf.logical_and(rule21, rule22)
rule31 = tf.less(solution - 1 - skew, result) rule32 = tf.less_equal(result, solution - skew) rule33 = tf.logical_and(rule31, rule32)
pairwise_fitness = tf.cast(tf.logical_or(tf.logical_or(rule13, rule23), rule33), tf.int32)
elif self.kernel == 'r': # REGRESSION kernel
''' A very, very basic REGRESSION kernel which is not designed to perform well in the real world. It requires that you raise the minimum node count to keep it from converging on the value of '1'. Consider writing or integrating a more sophisticated kernel. '''
pairwise_fitness = tf.abs(solution - result)
elif self.kernel == 'm': # MATCH kernel
''' This is used for demonstration purposes only. '''
pairwise_fitness = tf.cast(tf.equal(solution, result), tf.int32) # breaks due to floating points
RTOL, ATOL = 1e-05, 1e-08 # fixes above issue by checking if a float value lies within a range of values pairwise_fitness = tf.cast(tf.less_equal(tf.abs(solution - result), ATOL + RTOL * tf.abs(result)), tf.int32)
elif self.kernel == '[other]': # use others as a template
else: raise Exception('Kernel type is wrong or missing. You entered {}'.format(self.kernel))
fitness = tf.reduce_sum(pairwise_fitness)
Process TF graph and collect the results
result, pred_labels, solution, fitness, pairwise_fitness = sess.run([result, pred_labels, solution, fitness, pairwise_fitness])
return {'result': result, 'pred_labels': pred_labels, 'solution': solution, 'fitness': float(fitness), 'pairwise_fitness': pairwise_fitness}
def fx_fitness_expr_parse(self, expr, tensors):
''' Extract expression tree from the string algo_sym and transform into TensorFlow (TF) graph.
Called by: fx_fitness_eval
Arguments required: expr, tensors '''
tree = ast.parse(expr, mode='eval').body
return self.fx_fitness_node_parse(tree, tensors)
def fx_fitness_chain_bool(self, values, operation, tensors):
''' Chains a sequence of boolean operations (e.g. 'a and b and c') into a single TensorFlow (TF) sub graph.
Called by: fx_fitness_node_parse
Arguments required: values, operation, tensors '''
x = tf.cast(self.fx_fitness_node_parse(values[0], tensors), tf.bool) if len(values) > 1: return operation(x, self.fx_fitness_chain_bool(values[1:], operation, tensors)) else: return x
def fx_fitness_chain_compare(self, comparators, ops, tensors):
''' Chains a sequence of comparison operations (e.g. 'a > b < c') into a single TensorFlow (TF) sub graph.
Called by: fx_fitness_node_parse
Arguments required: comparators, ops, tensors '''
x = self.fx_fitness_node_parse(comparators[0], tensors) y = self.fx_fitness_node_parse(comparators[1], tensors) if len(comparators) > 2: return tf.logical_and(operators[type(ops[0])](x, y), self.fx_fitness_chain_compare(comparators[1:], ops[1:], tensors)) else: return operators[type(ops[0])](x, y)
def fx_fitness_node_parse(self, node, tensors):
''' Recursively transforms parsed expression tree into TensorFlow (TF) graph.
Called by: fx_fitness_expr_parse, fx_fitness_chain_bool, fx_fitness_chain_compare
Arguments required: node, tensors '''
if isinstance(node, ast.Name): #
return tensors[node.id] elif isinstance(node, ast.Num): #
shape = tensors[tensors.keys()[0]].get_shape() # Python 2.7
shape = tensors[list(tensors.keys())[0]].get_shape() return tf.constant(node.n, shape=shape, dtype=tf.float32)
elif isinstance(node, ast.BinOp): #
, e.g., x + y return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors)) elif isinstance(node, ast.UnaryOp): #
e.g., -1 return operators[type(node.op)](self.fx_fitness_node_parse(node.operand, tensors)) elif isinstance(node, ast.Call): #
( ) e.g., sin(x) return operators[node.func.id](*[self.fx_fitness_node_parse(arg, tensors) for arg in node.args]) elif isinstance(node, ast.BoolOp): #
e.g. x or y return self.fx_fitness_chain_bool(node.values, operators[type(node.op)], tensors) elif isinstance(node, ast.Compare): #
e.g., a > z return self.fx_fitness_chain_compare([node.left] + node.comparators, node.ops, tensors) else: raise TypeError(node)
def fx_fitness_labels_map(self, result):
''' For the CLASSIFY kernel, creat — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
On 10/13/19 4:47 PM, ajawfi wrote:> I sent you the file in the previous email. Did you receive it ?
Now I see that you included it in the body, not as an attachment. Yes, it is received. I will test the updates. Thank you.
Karoo GP Base Class
Define the methods and global variables used by Karoo GP
by Kai Staats, MSc with TensorFlow support provided by Iurii
Milovanov; see LICENSE.md
version 2.3 for Python 3.6
''' A NOTE TO THE NEWBIE, EXPERT, AND BRAVE Even if you are highly experienced in Genetic Programming, it is recommended that you review the 'Karoo User Guide' before running this application. While your computer will not burst into flames nor will the sun collapse into a black hole if you do not, you will likely find more enjoyment of this particular flavour of GP with a little understanding of its intent and design.
[snip]
After fixing math.log and math.log1p, I now get
Traceback (most recent call last):
File "karoo_gp.py", line 251, in
Something is not configured properly on your end. Please confirm OS, Python version, and installed libraries, then also which dataset you are employing. Please, do not spend time trying to fix any code as it will just work if all libraries are match.
On 5/22/20 7:28 PM, dmi-000 wrote:
After fixing math.log and math.log1p, I now get Traceback (most recent call last): File "karoo_gp.py", line 251, in
gp.fx_karoo_gp(kernel, tree_type, tree_depth_base, tree_depth_max, tree_depth_min, tree_pop_max, gen_max, tourn_size, filename, evolve_repro, evolve_point, evolve_branch, evolve_cross, display, precision, swim, mode) File "modules/karoo_gp_base_class.py", line 188, in fx_karoo_gp self.fx_fitness_gym(self.population_a) # generate expression, evaluate fitness, compare fitness File "modules/karoo_gp_base_class.py", line 1162, in fx_fitness_gym result = self.fx_fitness_eval(expr, self.data_train) File "modules/karoo_gp_base_class.py", line 1231, in fx_fitness_eval tf.reset_default_graph() # Reset TF internal state and cache (after previous processing) AttributeError: module 'tensorflow' has no attribute 'reset_default_graph'
I will be updating Karoo with the most recent libraries by the close of 2020. This will include the latest build of TF. I encourage you to download this new release, when complete. Stay tuned!
I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:
File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a)
AttributeError: module 'tensorflow' has no attribute 'log'