kstaats / karoo_gp

A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

generation of expression that can't be evaluated #11

Closed minkymorgan closed 5 years ago

minkymorgan commented 6 years ago

I have discovered a small issue while running a classification task using the desktop tool. After running large jobs for 15 hours plus without issue, I decided to expand the set of operators I'm using. Since then, I have been getting an odd error. For some reason, the mutation function created a symbolic function operator called "zoo", which then failed to evaluate and threw an error.

I managed to reproduce the error and capture the output. I think all the details needed to review the bug are below, but I am happy to provide the training file too if needed. A dump of the error comes first, followed by some things I spotted while digging into the code:

 9 trees [ 1  2  8 10 16 28 39 52 97] offer the highest fitness scores.

     (pause) l

     The leading Trees and their associated expressions are:
      1 : T2*T21 + T26*T9
      2 : T2*T21 + T26*T9
      8 : log(T8) + 0.5
      10 : Ez26 + log(Ez16)
      16 : log(Ez11) + log(Ez20) + sign(Ez26) + 0.5
      28 : log(Ez11) + log(Ez20) + sign(Ez7) + sign(T18) + 0.5
      39 : log(Ez11) + log(Ez20) + cos(Ez26)*sign(Ez35)*sign(T7) + sign(T23) + 0.5
      52 : Ez31 + log(Ez20) + sin(Ez11)
      97 : sqrt(Ez31) + log(Ez20) + sin(Ez11) + 0.1

     (pause) 

 Copy gp.population_b to gp.population_a

 Evolve a population of Trees for Generation 4 ...
  Perform 10 Reproductions ...
  Perform 0 Point Mutations ...
  Perform 20 Full or Grow Mutations ...
  Perform 70 Crossovers ...

 Evaluate all Trees in Generation 4
    Tree 1 yields (sym): sqrt(Ez31) + log(Ez20) + sin(Ez11) + 0.1 
    Tree 2 yields (sym): Ez5 + log(Ez25) 
    Tree 3 yields (sym): log(Ez11) + 2*log(Ez20) + sign(Ez11) + sign(Ez7) + 0.5 
    Tree 4 yields (sym): log(Ez20) + 0.5 
    Tree 5 yields (sym): log(Ez11) + log(Ez20) + cos(Ez26)*sign(Ez35)*sign(T7) + sign(T23) + 0.5 
    Tree 6 yields (sym): T9*sign(T26) + log(Ez11) + log(Ez20) + sign(T23) + 0.5 
    Tree 7 yields (sym): log(Ez20) + log(T9) + sign(Ez7) + sign(T23) + 0.5 
    Tree 8 yields (sym): Ez4 + sign(Ez7) + sign(T23) + sin(4)/Ez8 
    Tree 9 yields (sym): Ez31 + log(Ez20) + sin(Ez11) 
    Tree 10 yields (sym): log(Ez20) + log(T9) + sign(Ez7) + sign(T23) + 0.5 
    Tree 11 yields (sym): -Ez17 + 2*log(Ez20) + sign(Ez11) + sign(Ez28) + sign(Ez7) + 0.5 + 10.0*log(T26)/sqrt(Ez10) 
    Tree 12 yields (sym): log(Ez21) + sin(Ez23) + 0.5 
    Tree 13 yields (sym): Ez5 + log(Ez11) + log(Ez20) + sin(T33) + sign(Ez7) + sign(T23) 
    Tree 14 yields (sym): log(Ez11) + log(T10) + cos(Ez26)*sign(Ez35)*sign(T7) + sign(T23) + 0.5 
    Tree 15 yields (sym): Ez5*sign(Ez22) + T16 + log(Ez4)/sign(Ez13) 
    Tree 16 yields (sym): log(Ez11) + log(Ez20) + sign(T14) + 0.5 
    Tree 17 yields (sym): log(Ez11) + log(Ez20) + sign(Ez51) + 0.5 
    Tree 18 yields (sym): Ez24 + log(Ez16) 
    Tree 19 yields (sym): T19 + abs(T7)*sign(Ez7) + log(Ez20) + sign(T23) 
    Tree 20 yields (sym): log(Ez11) + log(Ez20) + cos(Ez26)*sign(T35)*sign(T7) + sign(T23) + 0.5 
    Tree 21 yields (sym): Ez54*log(Ez48)*tan(T30)/sqrt(Ez9) + log(Ez20) + sign(T5) + 0.5 
    Tree 22 yields (sym): abs(Ez5) + zoo*log(Ez12) + log(Ez4) + 0.5 
                                      ^^^^^^^^^^
Traceback (most recent call last):
  File "karoo_gp_main.py", line 251, in <module>
    gp.fx_eval_generation() # evaluate all Trees in a single generation
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1291, in fx_eval_generation
    self.fx_fitness_gym(self.population_b) # run 'fx_eval', 'fx_fitness', 'fx_fitness_store', and fitness record
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1339, in fx_fitness_gym
    result = self.fx_fitness_eval(expr, self.data_train)
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1424, in fx_fitness_eval
    result = self.fx_fitness_expr_parse(expr, tensors)
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1517, in fx_fitness_expr_parse
    return self.fx_fitness_node_parse(tree, tensors)
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1560, in fx_fitness_node_parse
    return tensors[node.id]
KeyError: 'zoo'
[andrew@srv02 karoo_gp]$ head -1  files/Karoo_dp_30_train_all_stocks_coef.csv 
T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,T23,T24,T25,T26,T27,T28,T29,T30,T31,T32,T33,T34,T35,Ez1,Ez2,Ez3,Ez4,Ez5,Ez6,Ez7,Ez8,Ez9,Ez10,Ez11,Ez12,Ez13,Ez14,Ez15,Ez16,Ez17,Ez18,Ez19,Ez20,Ez21,Ez22,Ez23,Ez24,Ez25,Ez26,Ez27,Ez28,Ez29,Ez30,Ez31,Ez32,Ez33,Ez34,Ez35,Ez36,Ez37,Ez38,Ez39,Ez40,Ez41,Ez42,Ez43,Ez44,Ez45,Ez46,Ez47,Ez48,Ez49,Ez50,Ez51,Ez52,Ez53,Ez54,0.1,0.2,0.3,0.4,0.5,1,2,3,4,5,s
[andrew@srv02 karoo_gp]$ ls files/
coefficients.csv   data_MATCH.csv  data_REGRESS.csv  Karoo_dp_30_train_all_stocks_coef.csv  operators_MATCH.csv  operators_REGRESS.csv
data_CLASSIFY.csv  data_PLAY.csv   Iris_dataset      operators_CLASSIFY.csv                 operators_PLAY.csv   templates
[andrew@srv02 karoo_gp]$ cat files/operators_CLASSIFY.csv
operator, arity
+,2
-,2
*,2
/,2
+,2
-,2
*,2
/,2
-,2
*,2
/,2
+,2
-,2
*,2
/,2
+ abs,2
- abs,2
* abs,2
/ abs,2
+ log,2
- log,2
* log,2
/ log,2
+ sign,2
- sign,2
* sign,2
/ sign,2
+ sqrt,2
- sqrt,2
* sqrt,2
/ sqrt,2
+ sin,2
- sin,2
* sin,2
/ sin,2
+ tan,2
- tan,2
* tan,2
/ tan,2
+ cos,2
- cos,2
* cos,2
/ cos,2
[andrew@srv02 karoo_gp]$ python karoo_gp_main.py files/Karoo_dp_30_train_a^C
[andrew@srv02 karoo_gp]$ cat karoo_gp_main.py | grep zoo
[andrew@srv02 karoo_gp]$ # 

What I think may be going on is that the translation of the raw tree of operations into a simpler symbolic representation (which happens in SymPy, from what I can gather) evaluates a randomly generated expression to "complex infinity", which SymPy names with the symbol "zoo".

There are some notes in the sympy docs about zoo here: http://docs.sympy.org/latest/modules/core.html

scroll down till you see:

class sympy.core.numbers.ComplexInfinity
Complex infinity.

In complex analysis the symbol ∞̃, called “complex infinity”, represents a quantity with infinite magnitude, but undetermined complex phase.

ComplexInfinity is a singleton, and can be accessed by S.ComplexInfinity, or can be imported as zoo.

See also Infinity

Perhaps simply killing off mutations that are non-viable prior to evaluation could be a quick fix. It would mean filtering out symbolic trees that have strings found in a quarantine list, then running fitness evaluations on the filtered list only. Open to other suggestions too.
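The quarantine idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `QUARANTINE`, `is_viable`, and `filter_viable` are hypothetical and not part of Karoo GP, and the list assumes the SymPy-style printed names "zoo", "oo", and "nan" are the tokens worth rejecting.

```python
import re

# Hypothetical quarantine list (an assumption, not Karoo GP code):
# SymPy prints complex infinity as "zoo", infinity as "oo", not-a-number as "nan".
QUARANTINE = ("zoo", "oo", "nan")

# Match quarantined names only as whole tokens, so a column named "zoom" passes.
_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(q) for q in QUARANTINE) + r")\b")


def is_viable(expr):
    """Return True if the expression string contains no quarantined token."""
    return _PATTERN.search(expr) is None


def filter_viable(expressions):
    """Split a {tree_id: expression} dict into (viable, quarantined) dicts."""
    viable, quarantined = {}, {}
    for tree_id, expr in expressions.items():
        target = viable if is_viable(expr) else quarantined
        target[tree_id] = expr
    return viable, quarantined


if __name__ == "__main__":
    trees = {
        21: "Ez54*log(Ez48)*tan(T30)/sqrt(Ez9) + log(Ez20) + sign(T5) + 0.5",
        22: "abs(Ez5) + zoo*log(Ez12) + log(Ez4) + 0.5",
    }
    ok, bad = filter_viable(trees)
    print("evaluate:", sorted(ok))
    print("skip:", sorted(bad))
```

Fitness evaluation would then run only on the viable trees; what to do with the quarantined ones (drop them, or give them a worst-case score) is left open here.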

Andrew

minkymorgan commented 6 years ago

Thinking about it, another fix could be to remove the division-based operators from the configuration where they could resolve to division by zero. That may not be the only source of "zoo" conditions, though, so it might not be a general solution in more complex cases. As an example, I could remove:

/ cos, 2
/ sin, 2
/ tan, 2

This might limit the generation of zoo-producing functions enough to proceed. I will test this and see if it helps.
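A minimal sketch of that workaround, assuming the operators file keeps the "operator,arity" layout shown in the dump above. The helper name `drop_risky_division` and the `RISKY_DIVISORS` set are hypothetical, and the plain "/" operator is deliberately left in place, so this does not rule out every division by zero.

```python
# Functions assumed risky as divisors; this set is an illustration only.
RISKY_DIVISORS = {"sin", "cos", "tan"}


def drop_risky_division(text):
    """Return the operators CSV text without '/ sin', '/ cos', '/ tan' rows."""
    kept = []
    for line in text.splitlines():
        # The operator is the first comma-separated field, e.g. "/ sin".
        parts = line.split(",")[0].split()
        if len(parts) == 2 and parts[0] == "/" and parts[1] in RISKY_DIVISORS:
            continue  # skip division by a function in the risky set
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "operator, arity\n+,2\n/,2\n/ sin,2\n* sin,2\n/ cos,2\n/ log,2\n"
    print(drop_risky_division(sample), end="")
```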

minkymorgan commented 6 years ago

I ran a 500 generation test that completed when removing the division based operators for sin/cos/tan etc. Still think it's an edge case that might need some thought.

kstaats commented 6 years ago

Andrew,

On 03/02/2018 11:08 AM, Andrew wrote:

I ran a 500 generation test that completed when removing the division based operators for sin/cos/tan etc. Still think it's an edge case that might need some thought.

Thank you for the continued feedback.

Strange. When Karoo was using SymPy, this happened often with basic arithmetic operators, so I had to manually catch and bypass 'zoo' errors. But I have not seen it occur since we switched Karoo to the TensorFlow maths library. I can reintroduce code that catches this or, as you have done, remove the divide-by sin, cos, and tan operators. I need to give some more thought as to why this is happening at all ...
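One way such a catch might look, sketched under assumptions: the traceback above ends in a `tensors[node.id]` lookup, so an expression could be checked for names that are not data columns before it reaches the evaluator. The function `unknown_names` is hypothetical and is not taken from the Karoo GP code base.

```python
import ast


def unknown_names(expr, known):
    """Return the variable names in expr that are not in the known set."""
    tree = ast.parse(expr, mode="eval")
    # Names used as the callee of a call (log, sin, abs, ...) are functions,
    # not data columns, so they are excluded from the check.
    callees = {id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)}
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
        and id(node) not in callees
        and node.id not in known
    }


if __name__ == "__main__":
    columns = {"Ez4", "Ez5", "Ez12"}
    expr = "abs(Ez5) + zoo*log(Ez12) + log(Ez4) + 0.5"
    missing = unknown_names(expr, columns)
    if missing:
        print("skip tree, unknown names:", sorted(missing))
```

A tree flagged this way could be skipped or given a worst-case fitness instead of raising a KeyError; which of the two is preferable is a design choice this sketch does not settle.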

Per your first email on this subject, a 15-hour run is substantial! I am eager to learn about the problem you are tackling.

Cheers, kai