sisl / ExprOptimization.jl

Algorithms for optimization of Julia expressions
Other
44 stars 11 forks source link

Genetic Programming Addition Vs Genetic Algorithm #9

Closed flare9x closed 6 years ago

flare9x commented 6 years ago

Be interesting to see a genetic program which have variable length versus the algorithm which has a fixed length. During evolution, genetic programs can increase in complexity. However this would be very interesting and the SO post does a decent job of explaining the differences:

https://stackoverflow.com/questions/3819977/what-are-the-differences-between-genetic-algorithms-and-genetic-programming

https://stackoverflow.com/questions/5732917/code-generation-by-genetic-algorithms

and

Chapter 8 A Genetic Programming Tutorial John R. Koza1 and Riccardo Poli2 1 Stanford University, Stanford, California

http://www.genetic-programming.com/jkpdf/burke2003tutorial.pdf

Would make an great addition!

rcnlee commented 6 years ago

Hi @flare9x,

ExprOptimization.jl is focused on evolving programs from a grammar. Since grammars induce trees, the search algorithm has to search over trees. GP is specifically designed to evolve trees (crossover and mutations are defined on trees).

Genetic algorithms evolve linear encodings, so it doesn't work naturally with the expression trees from the grammar. However, there are ways to adapt it to make it work. Grammatical Evolution actually uses Genetic Algorithm as its backend by defining a linear encoding that converts a tree into a string. See:

O’Neil, Michael, and Conor Ryan. "Grammatical evolution." Grammatical Evolution. Springer, Boston, MA, 2003. 33-47.

Trees are variable-length though, so there are tricks to convert the fixed-length strings of GA into variable-length trees.

flare9x commented 6 years ago

Ok awesome I think as a tree node grows in complexity as let's say the length is 2 at the beginning where 2 represents two different input variables. Half way through evolution a tree node is 15 length where now its 15 input variables.

A user defined complexity penalty would be interesting. Or in general if the same fitness obtained with a length of 2 or 2 inputs and matches a result of 15 length. Then it would favor less complex models over the complex.

Or theres a program maximum length setting so that can control the complexity during evolution.

I feel a control measure might be useful do that evolutions do not make overly complex models. Option to turn off though as perhaps same problems require this feature. I'd like to be humble about potential solutions!

rcnlee commented 6 years ago

Yes, it's definitely a good idea to regulate the size of the expressions. After all, shorter expressions are preferable to longer expressions given the same performance.

The best way to implement this is in the fitness function. You can add a term to the fitness function that penalizes the size of the expression tree. See the approx_pi.ipnyb example: https://github.com/sisl/ExprOptimization.jl/blob/master/examples/approx_pi.ipynb

Notice that in the fitness function, there is an additional term that penalizes the size of the expression tree. Actually, this is the reason behind passing the tree into the fitness function instead of just the compiled expression -- because that way you can penalize how you want. You can penalize the number of nodes in the tree, or just the number of leaves, or number of occurrences of a certain symbol, etc.

flare9x commented 6 years ago

Ok this makes sense. What type of tweaking can be done to convert to variable-length trees?

rcnlee commented 6 years ago

I'm not sure what you mean. The expression trees can vary in size. You can use Base.length to get the number of nodes.

flare9x commented 6 years ago

K understood. Initially I was looking at evolving the solution and the initial input structure.

Let's say it was 2 expressions

(One 2) + (two 2)

That's how it begins and during evolution this structure is growing and shrinking.

Perhaps it becomes or 4 expressions

(One 2) + (two 2) / sin(One) * (two - one)

It may grow to 15. So essentially the formulas as above grow and shrink during the evolutionary process.

rcnlee commented 6 years ago

That's roughly how it works. Different algorithms have different approaches. Most of the algorithms in the package are population-based. So a set of expressions are evolved and the best ones are evolved further. See:

https://dces.essex.ac.uk/staff/rpoli/gp-field-guide/A_Field_Guide_to_Genetic_Programming.pdf