Open asksak opened 3 years ago
Thank you Aymen. I was not aware of this problem. Much appreciated.
By "2000 iterations" do you mean generations?
kai
On 6/20/21 11:26 PM, asksak wrote:
Hello,
While testing Karoo, I observed the following:
Including SQUARE ROOT makes the runs very slow. The more iterations are run, the slower Karoo becomes, until I get a memory error. I googled the issue, and it seems to be a SymPy issue with sqrt.
I used a population of 50 and (as I modified the code) 2000 iterations. Without sqrt, the program ran very fast and completed. With sqrt, it was slow, then became unresponsive, then gave a memory error.
FYI,
Aymen
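One way to confirm that the slowdown and memory growth track the generation count is to time each generation and record traced memory as the run proceeds. The following is a minimal sketch using only the standard library. `profile_generations`, the `step` callable, and `fake_generation` are hypothetical names and not part of Karoo GP; the stand-in workload only mimics expressions that keep growing.

```python
import time
import tracemalloc


def profile_generations(step, generations):
    """Call step(gen) once per generation and record timing and memory.

    Returns a list of (generation, seconds, traced_bytes) tuples.
    `step` is a hypothetical callable standing in for one GP generation.
    """
    records = []
    tracemalloc.start()
    try:
        for gen in range(1, generations + 1):
            start = time.perf_counter()
            step(gen)
            elapsed = time.perf_counter() - start
            # Bytes currently allocated by Python code since tracing began.
            current, _peak = tracemalloc.get_traced_memory()
            records.append((gen, elapsed, current))
    finally:
        tracemalloc.stop()
    return records


# Stand-in workload (an assumption, not Karoo code): keep ever-larger
# nested "sqrt(...)" strings alive to mimic expressions that grow.
retained = []


def fake_generation(gen):
    depth = gen * 1000
    retained.append("sqrt(" * depth + "x" + ")" * depth)


records = profile_generations(fake_generation, 5)
for gen, seconds, traced in records:
    print(f"gen {gen}: {seconds:.6f}s, {traced} bytes traced")
```

Running the same profile once with sqrt enabled and once without would show whether the per-generation cost grows only in the sqrt case.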
Yes, iterations are generations. I modified the code to accept 10K generations.
Wow. I have never needed to go beyond 50 generations. I am very curious as to the kind of data you are processing, and how your outcome varies after 50, 100, 1000, etc. Have you plotted the fitness function vs the generations? Are you overfitting?
I'll try to write a function that logs fitness against generation. We'll find out.
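A logging function of this kind could be as small as the sketch below, which writes generation and best-fitness pairs to a CSV file that can then be plotted. The names `write_fitness_log` and `read_fitness_log` and the file name are hypothetical and not part of Karoo GP; the sample values are the three scores reported later in this thread.

```python
import csv
import os
import tempfile


def write_fitness_log(path, history):
    """Write (generation, best_fitness) pairs to a CSV file at `path`."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["generation", "best_fitness"])
        for generation, best_fitness in history:
            writer.writerow([generation, best_fitness])


def read_fitness_log(path):
    """Read the CSV back as a list of (generation, best_fitness) tuples."""
    with open(path, newline="") as handle:
        return [
            (int(row["generation"]), float(row["best_fitness"]))
            for row in csv.DictReader(handle)
        ]


# Scores reported later in this thread for 50, 100 and 200 generations.
history = [(50, 264.0), (100, 284.0), (200, 379.0)]

log_path = os.path.join(tempfile.mkdtemp(), "fitness_by_generation.csv")
write_fitness_log(log_path, history)
loaded = read_fitness_log(log_path)
print(loaded)
```

In a real run the function would be called with one row per generation, so that a plateau or a late jump in fitness becomes visible.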
Hello Kai,
I tried 50, 100, 200 and the results were:
for 50 gens: best Classification fitness score: 264
for 100 gens: best Classification fitness score: 284.0
for 200 gens: best Classification fitness score: 379.0
To verify, I entered the resulting equations into Excel and tested them against four data sets other than the one used for the run:
for 50 gens: prediction accuracy: between 50% and 60%
for 100 gens: prediction accuracy: between 50% and 65%
for 200 gens: prediction accuracy: between 70% and 80%
I understand the possibility of overfitting. However, GP depends on random mutations, and I modified the code so that the probability of mutation is also random, within a certain range, for each generation. This dynamic mutation method is much more effective than static mutation values.
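One possible reading of this dynamic mutation scheme is to draw a fresh mutation probability from a fixed interval at the start of every generation. The sketch below is an assumption about how that could look, not the actual modification: `draw_mutation_probability` is a hypothetical name, and the bounds 0.05 and 0.30 are placeholders, since the post does not state the range.

```python
import random


def draw_mutation_probability(rng, low, high):
    """Return a mutation probability drawn uniformly from [low, high]."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("need 0 <= low <= high <= 1")
    return rng.uniform(low, high)


rng = random.Random(42)  # fixed seed so the schedule is reproducible
LOW, HIGH = 0.05, 0.30   # assumed bounds; the post does not state the range

# One probability per generation, here for a 200-generation run.
schedule = [draw_mutation_probability(rng, LOW, HIGH) for _ in range(200)]
print(min(schedule), max(schedule))
```

Passing the generator in explicitly keeps runs repeatable, which helps when comparing a dynamic schedule against a static mutation value.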
But yes, when listing the best trees from each run, all of them must be taken into consideration, because some freak (high-fitness) trees just classify everything as 1 or 0 when tested with other sets in Excel. (That confused me, as I am sure I ported the tree equations to Excel correctly.)
Best Regards,
Aymen
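Trees that classify everything as 1 or 0 can be flagged automatically rather than spotted by eye in a spreadsheet. The sketch below compares a tree's held-out accuracy with the accuracy of always predicting the most common label, and checks whether the predictions are constant. All function names are hypothetical, and the labels are made up purely for illustration.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def is_constant(predicted):
    """True when every prediction is the same class."""
    return len(set(predicted)) <= 1


def majority_baseline(actual):
    """Accuracy obtained by always predicting the most common label."""
    return Counter(actual).most_common(1)[0][1] / len(actual)


# Made-up held-out labels for illustration only (6 ones, 2 zeros).
actual = [1, 1, 1, 0, 1, 1, 0, 1]
all_ones = [1] * len(actual)  # a "freak" tree that always answers 1

print(accuracy(all_ones, actual))   # matches the majority baseline
print(majority_baseline(actual))
print(is_constant(all_ones))
```

A constant tree can never beat the majority baseline, so on imbalanced data a high score on one set may say little about how the tree behaves on the others.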
Aymen,
I apologize once again for my delayed response to your good questions. I always appreciate your using Karoo, and hope it is serving you well.
Clearly, your numbers show a continued increase in performance with additional generations. This is exciting, as it shows the evolutionary process to be truly working. Yes, I am/was concerned about overfitting, but as the manager of this program you will know once you take that final multivariate equation and apply it to real-world data. If it works, great!
FYI, Karoo will soon undergo a massive rebuild with much higher performance and revised functions. Stay tuned!
Cheers, kai
@asksak Have you tried `sqrt` in the newest release? Let me know if it's working, and I'll close this issue.