kstaats / karoo_gp

A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

Functions that produce nan/inf values #84

Closed granawkins closed 2 years ago

granawkins commented 2 years ago

Some of the operators we support will produce unusable values (nan or inf) in the course of normal use:

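| Operator | X == 0 | X > 1e3 | X < 0 | X > 1 |
|----------|--------|---------|-------|-------|
| /        | nan*   |         |       |       |
| **       |        | inf     |       |       |
| sqrt     | nan*   |         | nan   |       |
| log      | -inf   |         | nan   |       |
| log1p    |        |         | nan   |       |
| arcsin   |        |         |       | nan   |
| arccos   |        |         |       | nan   |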

*We currently use helper functions for division and square root which ignore 0s.
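To make the table concrete, here is a minimal sketch that counts non-finite outputs per operator. It assumes NumPy as a stand-in for the TensorFlow ops and uses float32, where 2 ** 1000 overflows; `count_nonfinite` and the sample inputs are illustrative, not part of karoo_gp.

```python
import numpy as np


def count_nonfinite(values):
    """Return how many entries are nan or +/-inf."""
    return int(np.sum(~np.isfinite(values)))


# Illustrative inputs: a zero, a negative value, and a value above 1.
x = np.array([0.0, -0.5, 2.0], dtype=np.float32)

with np.errstate(all="ignore"):  # silence RuntimeWarnings for the demo
    results = {
        "x / x": x / x,            # nan at x == 0
        "sqrt": np.sqrt(x),        # nan for x < 0
        "log": np.log(x),          # -inf at x == 0, nan for x < 0
        "log1p": np.log1p(x),      # log(1 + x): finite for every x > -1
        "arcsin": np.arcsin(x),    # nan for |x| > 1
        # 2 ** (1 / .001) overflows float32 and becomes inf
        "**": np.power(np.array([2.0], dtype=np.float32),
                       np.array([1000.0], dtype=np.float32)),
    }

for name, values in results.items():
    print(name, count_nonfinite(values))
```

Note that `np.log1p` only returns nan below -1, since it computes log(1 + x).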

What to do?

Here are 3 ideas:

  1. Deal with them case-by-case.

    • / and sqrt seem ok for now.
    • log1p is a built-in function that computes log(1 + x), so it is finite at 0. We could add a helper which does sign(x) * log1p(abs(x)).
    • arccos and arcsin are maybe rare enough that we could add a check in karoo.fit() when using them that -1 <= X <= 1, else raise a ValueError.
    • That leaves **. X > 1e3 happens frequently with small numbers too when combined with other operators, e.g. 2 ** (1 / .001). Replacing with 0 is the simplest option, but it's a big nonlinearity (as X increases, outputs get exponentially larger and then drop to 0).
  2. Accept a kwarg with a replacement value (e.g. 0) in the case that a nan and/or inf is produced. Basically like we do in the *'s above, for everything.

  3. If and when a tree produces a nan or inf, just remove it from the gene pool and don't bother scoring it. This is basically the method used by swim, i.e. eliminate trees with less than the minimum number of nodes.

I lean toward 3.
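For comparison, ideas 1 and 2 could be sketched roughly as follows. This assumes NumPy rather than the TensorFlow ops karoo_gp uses, and the names `signed_log1p`, `check_unit_domain` and `replace_nonfinite` are hypothetical, not existing karoo_gp functions.

```python
import numpy as np


def signed_log1p(x):
    """Idea 1: sign(x) * log1p(|x|), finite for every finite x and 0 at x == 0."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))


def check_unit_domain(x, name="X"):
    """Idea 1: raise ValueError if any value lies outside [-1, 1]."""
    x = np.asarray(x, dtype=float)
    if np.any(np.abs(x) > 1):
        raise ValueError(f"{name} must lie in [-1, 1] to use arcsin/arccos")


def replace_nonfinite(values, replacement=0.0):
    """Idea 2: swap every nan/inf for a caller-supplied replacement value."""
    values = np.asarray(values, dtype=float)
    return np.where(np.isfinite(values), values, replacement)


logs = signed_log1p([-3.0, 0.0, 3.0])
cleaned = replace_nonfinite([1.0, np.nan, np.inf, -np.inf])

try:
    check_unit_domain([0.5, 1.5])
    domain_ok = True
except ValueError:
    domain_ok = False
```

The replacement-value variant keeps every tree in play, at the cost of the discontinuity described for ** above: outputs grow and then snap to the replacement value.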

asksak commented 2 years ago

If idea 3 is used consistently in all cases, then I think it would be the best available resolution.

granawkins commented 2 years ago

The best approach seems to be:

  • Keep the helper functions for / and sqrt, and add one for log: sign(x) * log1p(abs(x))
  • Add an unfit=False attribute to Trees. After predicting each tree, if the output contains nan or inf, set unfit=True.
  • Skip unfit trees when scoring.
  • Remove unfit trees from gene_pool.

kstaats commented 2 years ago

OK. So we remove the tree altogether, but I assume we add a new tree to replace each one that is removed, so that the population remains at max.

granawkins commented 2 years ago

Every generation starts with tree_pop_max trees, e.g. 100. If 10 are unfit, then the remaining 90 are used to generate the next population of 100. It's the same approach we use to handle tree_depth_min.
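A rough sketch of that flow, using only the standard library: flag trees whose output contains nan or inf, skip them when scoring, and build the next full-size population from the fit trees alone. The `Tree` class, the function names, and the copy-a-parent step standing in for crossover and mutation are all assumptions for illustration, not the code merged for this issue.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tree:
    """Hypothetical stand-in for a karoo_gp tree; only the fields used here."""
    id: int
    predictions: List[float]
    unfit: bool = False
    score: Optional[float] = None


def mark_unfit(trees):
    """Flag every tree whose output contains nan or inf."""
    for tree in trees:
        tree.unfit = not all(math.isfinite(p) for p in tree.predictions)


def score_fit_trees(trees, targets):
    """Score fit trees by mean absolute error; unfit trees are skipped."""
    for tree in trees:
        if tree.unfit:
            continue
        errors = [abs(p - t) for p, t in zip(tree.predictions, targets)]
        tree.score = sum(errors) / len(errors)


def next_generation(trees, tree_pop_max, rng):
    """Build a full-size population using only fit trees as parents."""
    gene_pool = [t for t in trees if not t.unfit]
    if not gene_pool:
        raise ValueError("no fit trees left in the gene pool")
    # Placeholder for crossover/mutation: each child copies a random fit parent.
    return [Tree(id=i, predictions=list(rng.choice(gene_pool).predictions))
            for i in range(tree_pop_max)]


trees = [
    Tree(0, [1.0, 2.0]),
    Tree(1, [float("nan"), 2.0]),
    Tree(2, [float("inf"), 0.0]),
    Tree(3, [0.5, 2.5]),
]
mark_unfit(trees)
score_fit_trees(trees, targets=[1.0, 2.0])
children = next_generation(trees, tree_pop_max=4, rng=random.Random(0))
```

Because parents are drawn only from the fit trees, the population returns to tree_pop_max each generation without generating replacement trees separately.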


kstaats commented 2 years ago

Let's discuss, as there is another method ...


granawkins commented 2 years ago

This was implemented in #85

granawkins commented 1 year ago

Can you explain briefly so I can get moving?
