trevorstephens / gplearn

Genetic Programming in Python, with a scikit-learn inspired API
http://gplearn.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Crossover chance is higher for non-terminals #126

Closed: hwulfmeyer closed this issue 5 years ago

hwulfmeyer commented 5 years ago

https://github.com/trevorstephens/gplearn/blob/07b41a150dbf7e16645268b88405514b7f23590a/gplearn/_program.py#L501-L504

Is there a particular reason the crossover point is chosen this way? I experimented a bit and noticed that giving terminals and non-terminals the same chance of being the crossover point results in worse programs. Perhaps this is related to #123, because each crossover at a terminal loses a large chunk of genetic material from the gene pool.

I am simply curious whether there is a particular reason for doing it this way, and whether there might be another way of achieving a similar goal.
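
To make the comparison above concrete, here is a minimal sketch of how often a terminal would be picked under equal weights versus weights that favour non-terminals. The flattened program, the `FUNCTIONS` set, the helper `terminal_share`, and the per-node weights of 0.9 and 0.1 are all assumptions made for illustration; they are not taken from the linked gplearn lines.

```python
# Hypothetical flattened (prefix-order) program: add(mul(X0, X1), sub(X2, 0.5))
FUNCTIONS = {"add", "sub", "mul"}  # assumed function set; everything else is a terminal
program = ["add", "mul", "X0", "X1", "sub", "X2", "0.5"]


def terminal_share(program, function_weight, terminal_weight):
    """Probability that one weighted random pick lands on a terminal node."""
    weights = [
        function_weight if node in FUNCTIONS else terminal_weight
        for node in program
    ]
    terminal_total = sum(
        w for node, w in zip(program, weights) if node not in FUNCTIONS
    )
    return terminal_total / sum(weights)


uniform = terminal_share(program, 1.0, 1.0)   # every node equally likely
weighted = terminal_share(program, 0.9, 0.1)  # assumed per-node weights

print(f"uniform:  {uniform:.3f}")   # 4 of 7 nodes are terminals -> 0.571
print(f"weighted: {weighted:.3f}")  # 0.4 / (0.4 + 2.7) -> 0.129
```

In this small tree more than half of the nodes are terminals, so equal weights would swap a single leaf in most crossovers, while the skewed weights make a whole subtree the usual choice.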

trevorstephens commented 5 years ago

Wow, that's some magic code right there @wulfihm :-D I'll have to dig through my source texts from years ago to find out why I did that! It certainly deserves a comment in the code.

I am guessing this is a bloat prevention measure, but leave it with me and I'll try to justify the decision.

jmmcd commented 5 years ago

"GP researchers and practitioners almost universally use a 90%-function/10%-terminal crossover-point selection policy." -- Dignum & Poli, GECCO 2007, "Generalisation of the Limiting Distribution of Program Sizes in Tree-based Genetic Programming and Analysis of its Effects on Bloat".
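
As a rough illustration of such a policy, the sketch below picks a crossover point from a flattened prefix-order program using per-node weights and then extracts the subtree rooted there. The `ARITY` table and the helpers `pick_crossover_point` and `subtree_end` are hypothetical names, not gplearn's API. Note that normalised per-node weights of 0.9 and 0.1 only approximate the quoted policy; a stricter reading would first choose the node class with probability 0.9 or 0.1 and then pick uniformly within that class.

```python
import random

# Hypothetical arity table; any node not listed here is treated as a terminal.
ARITY = {"add": 2, "sub": 2, "mul": 2}


def pick_crossover_point(program, rng, function_weight=0.9, terminal_weight=0.1):
    """Pick a node index, weighting function nodes more heavily than terminals."""
    weights = [
        function_weight if node in ARITY else terminal_weight
        for node in program
    ]
    return rng.choices(range(len(program)), weights=weights, k=1)[0]


def subtree_end(program, start):
    """Return the index one past the subtree rooted at `start` (prefix order)."""
    pending = 1  # nodes still needed to complete the subtree
    end = start
    while pending > 0:
        # Each node fills one slot and opens as many new slots as its arity.
        pending += ARITY.get(program[end], 0) - 1
        end += 1
    return end


rng = random.Random(0)
program = ["add", "mul", "X0", "X1", "sub", "X2", "0.5"]

start = pick_crossover_point(program, rng)
end = subtree_end(program, start)
subtree = program[start:end]  # the genetic material that crossover would swap
print(start, subtree)
```

Because the program is stored in prefix order, the subtree is always a contiguous slice, so swapping it with a donor subtree is a simple list splice.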

trevorstephens commented 5 years ago

Thanks for digging that up @jmmcd !

It helped me find another reference, this one from the field guide: https://cswww.essex.ac.uk/staff/rpoli/gp-field-guide/24RecombinationandMutation.html#7_4

trevorstephens commented 5 years ago

I should add a comment for this in the code :-)