granawkins opened 2 years ago
Thank you for your technical contribution:
I have the following observations:
Calculating fitness (evaluating trees) in Karoo GP is the responsibility of tensorflow. This is not the performance problem I am facing, as it is done pretty quickly. My problem is crossover:
1.1
(Observation from original version of Karoo GP)
An arithmetic function like
`a+(b**c)-sin(d)+ ....`, however large, runs hundreds of times faster in the crossover section than:
`IfLargerthan(ifsmallerthanequal(a,b), iflargerthan(e,f)) .....`, however large.
I got logic to work easily in Karoo GP as described in the equation above; however, crossover takes very long to complete even with a population of just 100.
Since Tensorflow evaluates trees, I did not find a difference in speed between Cpu and Gpu regarding the two types of equations.
1.2
A huge, notable improvement in crossover time was observed in the new, under-development version of Karoo GP, which is very promising; however, population generation is much slower. Even so, the total runtime of the new version is much shorter.
1.3
In the original version, I was easily able to modify the code to make it run under tensorflow 2. It was only slightly faster.
1.4
For the logic part I had to use iflargerthan to avoid sympy evaluating the equations, or parts of them, even though evaluate was set to False. This is because in sympy:
`a and b and c` evaluates to `c` (!), a problem sympy users have reported.
To make sympy perform its job properly, I had to make sure it was fed strings it cannot identify.
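As a minimal sketch of the collapse (my illustration, not Karoo code): Python's `and` short-circuits on truthiness before sympy ever sees the expression, so a chain of Symbols reduces to its last operand.

```python
import sympy

# Illustration of the collapse (not Karoo code): sympy Symbols are
# truthy Python objects, so Python's `and` short-circuits through the
# chain and returns the last operand instead of a logical expression.
a, b, c = sympy.symbols('a b c')
result = a and b and c
print(result)  # prints: c
```

This is why feeding sympy operator names it cannot identify, like `iflargerthan`, keeps the expression intact.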
1.5
I will share the parts of the code that made the logic work, as well as the ones that made sympy work. However, I believe that if the new version could generate a population as fast as the original version, Karoo would be ready to incorporate logic, which, in my experience, is the most important part of GP.
I will get back to you shortly with the code.
aymen
In Original Karoo version, the following mods make logical operators work properly:
**This avoids sympy evaluation** (entries added to the operators dictionary):

```python
'ifgreaterthan': tf.math.greater,
'ifgreaterthanequal': tf.math.greater_equal,
'iflessthan': tf.math.less,
'iflessthanequal': tf.math.less_equal,
```
**Labels used for logical operators:**

```python
if tree[6, node_id] in ('ifgreaterthan', 'ifgreaterthanequal', 'iflessthan', 'iflessthanequal'):
    return (tree[6, node_id] + '(' + self.fx_eval_label(tree, tree[9, node_id]) + ','
            + self.fx_eval_label(tree, tree[10, node_id]) + ')')
```
**Tensorflow 2 compatibility:**

```python
def fx_fitness_eval(self, expr, data, get_pred_labels=False):

    # 1 - Load data into TF vectors
    tensors = {}
    for i in range(len(self.terminals)):
        var = self.terminals[i]
        # convert data into vectors
        tensors[var] = tf.constant(data[:, i], dtype=tf.float32)

    # 2 - Transform string expression into TF operation graph
    result = tf.cast(self.fx_fitness_expr_parse(expr, tensors), dtype=tf.float32)
    pred_labels = tf.no_op()  # a placeholder; applies only to the CLASSIFY kernel
    solution = tensors['s']   # solution value is assumed to be stored in the 's' terminal

    @tf.function
    def sessrun(result, pred_labels, solution):  # custom function replacing the TF1 session

        # 3 - Add fitness computation into TF graph
        if self.kernel == 'c':  # CLASSIFY kernel
            if get_pred_labels:
                pred_labels = tf.map_fn(self.fx_fitness_labels_map, result,
                                        fn_output_signature=(tf.int32, tf.string),
                                        swap_memory=True)
            skew = (self.class_labels / 2) - 1
            rule11 = tf.equal(solution, 0)
            rule12 = tf.less_equal(result, 0 - skew)
            rule13 = tf.logical_and(rule11, rule12)
            rule21 = tf.equal(solution, self.class_labels - 1)
            rule22 = tf.greater(result, solution - 1 - skew)
            rule23 = tf.logical_and(rule21, rule22)
            rule31 = tf.less(solution - 1 - skew, result)
            rule32 = tf.less_equal(result, solution - skew)
            rule33 = tf.logical_and(rule31, rule32)
            pairwise_fitness = tf.cast(tf.logical_or(tf.logical_or(rule13, rule23), rule33), tf.int32)

        elif self.kernel == 'r':  # REGRESSION kernel
            pairwise_fitness = tf.squared_difference(solution, result)

        elif self.kernel == 'm':  # MATCH kernel
            RTOL, ATOL = 1e-05, 1e-08
            pairwise_fitness = tf.cast(tf.less_equal(tf.abs(solution - result),
                                                     ATOL + RTOL * tf.abs(result)), tf.int32)

        else:
            raise Exception('Kernel type is wrong or missing. You entered {}'.format(self.kernel))

        fitness = tf.reduce_sum(pairwise_fitness)
        return result, pred_labels, solution, fitness, pairwise_fitness

    # Process TF graph and collect the results: sessrun()
    result, pred_labels, solution, fitness, pairwise_fitness = sessrun(result, pred_labels, solution)

    return {'result': result, 'pred_labels': pred_labels, 'solution': solution,
            'fitness': fitness, 'pairwise_fitness': pairwise_fitness}
```
**Logical operator results converted into float; other operators unaffected:**

```python
def fx_fitness_node_parse(self, node, tensors):
    …
    elif isinstance(node, ast.Call):
        # e.g. ifgreaterthan(a, b) -> 0 or 1 as float (same result whether bool or float)
        return tf.cast(operators[node.func.id](*[self.fx_fitness_node_parse(arg, tensors)
                                                 for arg in node.args]), tf.float32)
```
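The cast-to-float idea can be sketched in plain numpy (an illustration of the semantics, not Karoo code; the `ifgreaterthan` helper here is hypothetical):

```python
import numpy as np

# Hypothetical numpy stand-in for the TF operator: a comparison yields
# booleans, and casting to float32 lets downstream arithmetic operators
# consume the result unchanged (0.0 or 1.0 per element).
def ifgreaterthan(x, y):
    return np.greater(x, y).astype(np.float32)

a = np.array([1.0, 3.0, 5.0])
b = np.array([2.0, 2.0, 2.0])
print(ifgreaterthan(a, b))        # [0. 1. 1.]
print(ifgreaterthan(a, b) * 2.0)  # arithmetic composes with the float result
```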
Example run result using logical operators:
iflessthanequal(iflessthanequal(iflessthanequal(iflessthanequal(iflessthanequal(A18x, ifgreaterthan(iflessthanequal(ifgreaterthan(iflessthanequal(xmin, A14x), iflessthanequal(A7, A9x)), A30x), iflessthanequal(iflessthanequal(A22x, A26x), ifgreaterthan(iflessthanequal(A30x, A11x), A3x)))), ifgreaterthan(iflessthanequal(A10x, A26x), iflessthanequal(A10x, A17x))), iflessthanequal(iflessthanequal(ifgreaterthan(A5x, A31x), iflessthanequal(iflessthanequal(ifgreaterthan(ifgreaterthan(A11x, xmin), iflessthanequal(A2x, A9x)), ifgreaterthan(ifgreaterthan(A12x, A16x), iflessthanequal(A20x, A18x))), ifgreaterthan(ifgreaterthan(iflessthanequal(A31x, A17x), iflessthanequal(A20x, A9x)), iflessthanequal(iflessthanequal(A4x, A25x), A30x)))), ifgreaterthan(iflessthanequal(ifgreaterthan(iflessthanequal(ifgreaterthan(A11x, A29x), iflessthanequal(A30x, A6x)), A25x), iflessthanequal(A19x, A30x)), iflessthanequal(A19x, ifgreaterthan(A25x, A19x))))), iflessthanequal(ifgreaterthan(A10x, xrange), iflessthanequal(iflessthanequal(iflessthanequal(xmin, A6x), iflessthanequal(iflessthanequal(ifgreaterthan(iflessthanequal(A27x, A9x), iflessthanequal(A9x, A24x)), ifgreaterthan(A29x, ifgreaterthan(xmax, A14x))), iflessthanequal(iflessthanequal(A19x, iflessthanequal(xrange, A15x)), A11x))), A31x))), ifgreaterthan(iflessthanequal(iflessthanequal(iflessthanequal(iflessthanequal(iflessthanequal(iflessthanequal(A14x, iflessthanequal(A12x, A22x)), iflessthanequal(ifgreaterthan(A14x, A9x), A11x)), iflessthanequal(iflessthanequal(ifgreaterthan(A12x, A16x), ifgreaterthan(A31x, A8x)), iflessthanequal(ifgreaterthan(A17x, A27x), ifgreaterthan(A14x, A6x)))), iflessthanequal(iflessthanequal(ifgreaterthan(iflessthanequal(A11x, A22x), iflessthanequal(A15x, A6x)), A10x), A19x)), ifgreaterthan(ifgreaterthan(iflessthanequal(iflessthanequal(A30x, iflessthanequal(A22x, A26x)), iflessthanequal(ifgreaterthan(A13x, A17x), ifgreaterthan(A17x, A27x))), iflessthanequal(iflessthanequal(iflessthanequal(A8x, A6x), 
ifgreaterthan(A24x, A11x)), ifgreaterthan(iflessthanequal(A4x, A6x), ifgreaterthan(A22x, xmax)))), iflessthanequal(ifgreaterthan(A31x, iflessthanequal(A27x, A20x)), ifgreaterthan(iflessthanequal(A29x, iflessthanequal(A22x, A15x)), iflessthanequal(ifgreaterthan(A18x, A16x), A1x))))), iflessthanequal(iflessthanequal(iflessthanequal(xmin, A6x), ifgreaterthan(ifgreaterthan(ifgreaterthan(iflessthanequal(A9x, xmin), xmin), iflessthanequal(iflessthanequal(A2x, A12x), A31x)), A31x)), ifgreaterthan(ifgreaterthan(ifgreaterthan(A14x, iflessthanequal(xmin, A22x)), iflessthanequal(A24x, A19x)), iflessthanequal(ifgreaterthan(A11x, iflessthanequal(iflessthanequal(A18x, A10x), A17x)), ifgreaterthan(ifgreaterthan(iflessthanequal(A8x, A6x), xmax), iflessthanequal(A23x, iflessthanequal(A8x, A25x))))))), ifgreaterthan(iflessthanequal(ifgreaterthan(A30x, iflessthanequal(iflessthanequal(ifgreaterthan(A20x, xrange), iflessthanequal(A10x, A7)), ifgreaterthan(A12x, A15x))), ifgreaterthan(iflessthanequal(A11x, A25x), iflessthanequal(ifgreaterthan(iflessthanequal(ifgreaterthan(A26x, A15x), iflessthanequal(A3x, A5x)), iflessthanequal(A7, A9x)), ifgreaterthan(A20x, iflessthanequal(ifgreaterthan(A16x, xrange), iflessthanequal(xrange, xmin)))))), A30x)))
The final result is confirmed to be 1 or 0.
Pardon me for not using GitHub the way it's supposed to be used.
Note that I tested my logical operators on the Iris classification and got 100% by the 6th generation.
Summary:

Original Karoo:
- Fast generation creation
- Very slow crossover

New Under Development Karoo:
- Slow generation creation
- Fast crossover
Thank you for your tedious testing of the revisions to Karoo. We are pleased to have your support. Stay tuned, as there are yet many changes to come; your observations are noted and archived.
> On 6/20/22 00:34, asksak wrote:
>
> Summary:
>
> Original Karoo:
> - Fast generation creation
> - Very slow crossover
>
> New Under Development Karoo:
> - Slow generation creation
> - Fast crossover
As part of the `engine-api` PR, there is an option to choose `NumpyEngine` or `TensorflowEngine`. This has led to some discussion about what to do wrt tensorflow. There were several ongoing conversations, so I combine them here.

Outstanding issues:
- We're using tensorflow 1, which is outdated and partially deprecated, so we should update to tensorflow 2.
- `tf` is imported lazily by the `TensorflowEngine` because (at least for me) it takes about 2s extra to load. So if you use the `NumpyEngine` instead, you can save those 2 seconds by avoiding importing `tf` at all. The way I've done it is by copying a `LazyLoader` class from tensorflow themselves. This seems inelegant, and also may be sensitive to licensing. Should look for a better solution, like maybe importing `tf` in `TensorflowEngine.__init__()`.
- @asksak has asked a few questions about tensorflow:
  - KarooGP on Mac with AMD GPU #39: Utilizing AMD GPU. Sounds like you got it to work, anything we should update?
  - Tensorflow v2 behaviour #66: Pointed out the limitations of tensorflow v1; noted.
  - tf.map_fn in population.py problem #72: Found a problem with `tf.map_fn()`; this is removed in `engine-api` and shouldn't be an issue going forward.

Finally, it's not obvious to me that tensorflow will ever be faster than numpy for what we're doing. It seems that tensorflow is fast when:
- Working with matrices (2d), while we work with arrays (1d)
- Doing multiplication or dot-products specifically, while we do many different operations

Anyway we should continue to support it, but monitor the performance and make sure users are getting the optimal performance.
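A `LazyLoader` replacement need not be copied from tensorflow; a minimal shim over `importlib` does the same job. This is a sketch, not the Karoo implementation: `LazyModule` is a hypothetical name, demonstrated with a stdlib module standing in for `tf`.

```python
import importlib

class LazyModule:
    """Defer a module import until the first attribute access."""
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:
            # The real import cost is paid here, on first use, not at startup.
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# Demo with a stdlib module standing in for tensorflow:
mod = LazyModule('json')
print(mod.dumps({'ok': 1}))  # import happens here, not at LazyModule creation
```

Importing `tf` inside `TensorflowEngine.__init__()` achieves the same deferral with less machinery, at the cost of a slower first engine construction.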
Greetings,
Would you please supply me with the version numbers of Python, Numpy, and Tensorflow you are using in development? I need to figure out a speed problem.
Thank you
Thank you Aymen. Grant or Ezio will respond soon with the version numbers, as requested. As for Numpy vs TF, that will be proven when we return to testing against much larger datasets.
Hi @asksak - thanks for all the feedback! Apologies for the slow response, I've been deep in Karoo and needed to come up for air :)
As you know, currently we're still using `tensorflow` v1, just for continuity, and plan to update to v2 soon.
We're using the latest version of Python 3 (though any Python 3 should work) and the latest `numpy` (1.23).
I hope to implement logical operators in the next few days, and your comments above are really helpful for that, so stay tuned.
I did some exploration of how/when `tensorflow` beats `numpy`, expecting nothing, but I found something! Here's the complete notebook, and a summary is below. All of this was run on an Nvidia Tesla P100 GPU running on Google Cloud.
**First off: Eager Execution**

To use Tensorflow 1, you have to open a session, compile a graph of functions, and then execute some data on that graph. In Tensorflow 2, you just put your data in tensors (no session needed) and call functions on those tensors as you would numpy arrays. This was the headline feature of TF2.

From a big-picture Karoo perspective, this means we don't need to collapse Trees into strings and re-build them in a tensorflow graph; we can execute in-place in the `Node` class. This is a huge reduction in complexity.
So for the demo I wrote a stripped-down version of our `Node` component with that in-place execution, and the experiments below show that it takes proper advantage of the GPU. I'll implement this properly in Karoo in a future PR.
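A hedged sketch of what that in-place execution might look like (a hypothetical `Node` class, not the actual Karoo implementation; the same `evaluate` works whether the data columns are numpy arrays or eager TF2 tensors, since both support `+`, `-`, `*`, `/` directly):

```python
import operator
import numpy as np

class Node:
    """Hypothetical sketch of in-place tree evaluation (not Karoo's Node).
    No string round-trip: each node evaluates its subtree directly."""
    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

    def __init__(self, label, children=()):
        self.label = label
        self.children = children

    def evaluate(self, data):
        if not self.children:  # terminal: look up its data column
            return data[self.label]
        left, right = (child.evaluate(data) for child in self.children)
        return self.OPS[self.label](left, right)

# (a + b) * a, evaluated eagerly over the whole dataset at once:
tree = Node('*', (Node('+', (Node('a'), Node('b'))), Node('a')))
data = {'a': np.array([1.0, 2.0]), 'b': np.array([3.0, 4.0])}
print(tree.evaluate(data).tolist())  # [4.0, 12.0]
```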
**Baseline Comparison**

I evaluate a basic arithmetic expression, (a+a)/(a*a), repeated N times, on two different-sized datasets: (100, 100) and (10,000, 100). Numpy is faster for the smaller set; Tensorflow is faster for the larger set.
**Karoo Comparison**

I generate a population of 10 trees, depth=3, and execute the sample data gens=2 times. I test a range of sample sizes, in binary and decimal. It looks like Tensorflow is generally faster for >30,000 samples (at those other settings).
**Conclusion**

- Tensorflow and Numpy shine in different settings, so let's include both.
- The `Node` in-place execution works for Numpy and Tensorflow2, and is a huge reduction in complexity to boot. So I'll implement that in an upcoming PR. That will incidentally be our update from TF1 to TF2.

Very well done Grant. Excellent! I had heard about the new functions of TF being even easier to implement, but didn't realize how much they had integrated high-level functions. Incredible.
As for the dataset size, this makes perfect sense. While TF may be easier to call, the reality is that at the hardware level, GPUs must (by the very nature of the hardware) have every register filled with each logical execution, including null values if not used. This process of taking any given mathematical expression and breaking it down, register by register, was originally (2010s) done by hand in C. Very taxing. Caffe, Torch, and TF came along and enabled non-uber-geeks to work one level higher. Keras turned TF into a proper Python library. And now, it seems, it is even simpler.
The dataset sizes you describe make sense. This is what Marco and I found a few years ago with my first paper on Karoo, that GPUs suffer a hit in spin-up (what I described above) and prep, while Numpy just gets to work. It will be fun to compare numbers, to see where the cross-over used to be versus now, with this new version.
Yes, TF will be faster than Numpy on very large datasets. The original research and paper demonstrated this.
With the revised code, which replaces arrays with objects, it is likely that TF will not be faster until we reach larger datasets than before. A revised research project would need to be conducted to discover the tipping point.
To remove TF without this research would be premature.