kstaats / karoo_gp

A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

Square Root Problem #31

Open asksak opened 3 years ago

asksak commented 3 years ago

Hello,

While working on and testing Karoo, I observed this:

Including SQUARE ROOT (sqrt) as an operator makes the runs very slow: the more iterations are run, the slower Karoo becomes, until I get a memory error. I googled the issue, and it seems to be a SymPy issue with sqrt.

I used a population of 50 and (having modified the code) 2000 iterations. Without sqrt, the program ran very fast and completed. With sqrt, it was slow, then became unresponsive, then gave a memory error.
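One way to confirm the slowdown described above is to record wall time and traced memory after every generation. The following is only a minimal sketch using the Python standard library; `profile_generations` and the `step` callback are hypothetical names, not part of Karoo, and `step` stands in for whatever code advances the run by one generation.

```python
import time
import tracemalloc


def profile_generations(step, generations):
    """Call step(gen) once per generation and record timing and memory.

    `step` is a hypothetical callback that advances the run by one
    generation. Returns a list of (generation, seconds, traced_bytes).
    """
    records = []
    tracemalloc.start()
    try:
        for gen in range(1, generations + 1):
            start = time.perf_counter()
            step(gen)
            elapsed = time.perf_counter() - start
            # Memory currently held by Python allocations made since start().
            current, _peak = tracemalloc.get_traced_memory()
            records.append((gen, elapsed, current))
    finally:
        tracemalloc.stop()
    return records
```

If the seconds and bytes columns keep climbing with sqrt enabled but stay flat without it, that supports the observation that the operator, rather than the generation count alone, drives the slowdown.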

FYI,

Aymen

kstaats commented 3 years ago

Thank you Aymen. I was not aware of this problem. Much appreciated.

By "2000 iterations" do you mean generations?

kai

On 6/20/21 11:26 PM, asksak wrote:

asksak commented 3 years ago

Yes, iterations means generations. I modified the code to accept 10K generations.

kstaats commented 3 years ago

Wow. I have never needed to go beyond 50 generations. I am very curious as to the kind of data you are processing, and how your outcome varies after 50, 100, 1000, etc. Have you plotted the fitness function vs the generations? Are you overfitting?

On 7/11/21 1:11 AM, asksak wrote:

asksak commented 3 years ago

I'll try to make a function to write generation vs. fitness. We'll find out.
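A function like the one mentioned here could be as small as the sketch below, which appends one row per generation to a CSV file that can then be plotted. `log_fitness` and the column names are assumptions for illustration; nothing here is taken from Karoo's code.

```python
import csv
from pathlib import Path


def log_fitness(path, generation, best_fitness):
    """Append one (generation, best_fitness) row to a CSV file.

    Hypothetical helper: writes a header row first if the file is
    missing or empty, so the file can be opened directly in a
    spreadsheet or plotting tool.
    """
    path = Path(path)
    needs_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(["generation", "best_fitness"])
        writer.writerow([generation, best_fitness])
```

Calling it once at the end of each generation keeps the log usable even if a long run is interrupted.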

asksak commented 3 years ago

Hello Kai,

I tried 50, 100, 200 and the results were:

for 50 gens: best Classification fitness score: 264
for 100 gens: best Classification fitness score: 284.0
for 200 gens: best Classification fitness score: 379.0

To verify, I tested 4 sets of data other than the one used for the run. I used Excel and inserted the equations there:

for 50 gens: prediction accuracy: >50% and less than 60%
for 100 gens: prediction accuracy: >50% and less than 65%
for 200 gens: prediction accuracy: >70% and less than 80%

I understand the possibility of overfitting. However, GP depends on RANDOM mutations, and I modified the probability of mutation in the code to also be random within a certain range for each generation. This dynamic mutation method is much more effective than static mutation values.
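The dynamic mutation idea can be sketched as follows: draw a fresh mutation probability from a fixed range at the start of each generation, then apply it to every individual. This is only an illustration of the technique as described. The function names, the `mutate` callback, and the default range of 0.05 to 0.30 are all assumptions, since the actual range and the modified Karoo code are not shown in this thread.

```python
import random


def mutation_rate_for_generation(low, high, rng=random):
    """Draw this generation's mutation probability uniformly from [low, high]."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("need 0 <= low <= high <= 1")
    return rng.uniform(low, high)


def mutate_population(population, mutate, low=0.05, high=0.30, rng=random):
    """Mutate each individual with a probability chosen anew per generation.

    `mutate` is a hypothetical function returning a mutated copy of one
    individual. Returns the new population and the rate that was drawn.
    """
    rate = mutation_rate_for_generation(low, high, rng)
    new_population = [
        mutate(individual) if rng.random() < rate else individual
        for individual in population
    ]
    return new_population, rate
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing dynamic against static mutation values.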

But YES, the best trees from each run must all be taken into consideration, as some freak (high-fitness) trees just classify everything as 1 or 0 when tested with the other sets in Excel (that confused me, as I am sure I ported the tree equations to Excel correctly).
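Trees that classify everything as 1 or 0 can be caught automatically before they are trusted. The sketch below flags a tree whose predictions on a held-out set collapse to a single class. Both helpers are hypothetical and assume the predictions have already been computed as a list of class labels.

```python
from collections import Counter


def is_degenerate(predictions, threshold=1.0):
    """Return True if one class makes up at least `threshold` of predictions.

    With the default threshold of 1.0, only trees that output a single
    class for every sample are flagged.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    most_common_count = Counter(predictions).most_common(1)[0][1]
    return most_common_count / len(predictions) >= threshold


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

A constant classifier can still score well when one class dominates the data, so checking `is_degenerate` alongside `accuracy` on each test set helps separate those trees from ones that have learned something.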

Best Regards,

Aymen

kstaats commented 2 years ago

Aymen,

On 8/2/21 12:46 AM, asksak wrote:

Wow. I have never needed to go beyond 50 generations. I am very curious as to the kind of data you are processing, and how your outcome varies after 50, 100, 1000, etc. Have you plotted the fitness function vs the generations? Are you overfitting?

I apologize once again for my delayed response to your good questions. I always appreciate your using Karoo, and hope it is serving you well.

Clearly, your numbers show a continued increase in performance with additional generations. This is exciting, as it shows the evolutionary process to be truly working. Yes, I was concerned about overfitting, but as the one running this program, you will know once you take that final multivariate equation and apply it to real-world data. If it works, great!

FYI, Karoo will soon undergo a massive rebuild with much higher performance and revised functions. Stay tuned!

Cheers, kai

granawkins commented 2 years ago

@asksak Have you tried sqrt in the newest release? Let me know if it's working, and I'll close this issue.