Solution to avoid dividing by zero when subtracting two Feature Names

trevorstephens / gplearn

Genetic Programming in Python, with a scikit-learn inspired API

http://gplearn.readthedocs.io/

BSD 3-Clause "New" or "Revised" License

Solution to avoid dividing by zero when subtracting two Feature Names #278

Closed Konparginos closed 1 year ago

Konparginos commented 1 year ago

Hi Trevor, and thanks again for the incredible work on the gplearn package. I ran into an issue that I think I've solved, but I'm writing it up here since it might be worth integrating into a future update of the package:

Let's say the gp output for a Symbolic Regression with 3 variables is X0/(X1-X2); in that case there is no error. But what if the output is X0/(X1-X1) instead? In the current version there is no way to prevent this from happening, even though the output is a prohibited division by 0.

Thus I'm proposing to add the following function:

import numpy as np

def _protected_sub(x1, x2):
    # Intended behavior: subtract unless both inputs are the same feature
    if x1.all() != x2.all():
        return np.subtract(x1, x2)
    else:
        return 0

Let me know if there is a more efficient way to fix this.
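As a quick check of the condition used above, here is a minimal sketch (not part of gplearn). `x1.all()` reduces an array to a single boolean, so `x1.all() != x2.all()` compares two booleans rather than the arrays element by element. The names `proposed_sub` and `sub_zero_if_identical` are hypothetical and used only for illustration; the second one assumes that "same feature" means the two arrays are element-wise equal.

```python
import numpy as np

def proposed_sub(x1, x2):
    # Same condition as in the proposal above
    if x1.all() != x2.all():
        return np.subtract(x1, x2)
    else:
        return 0

def sub_zero_if_identical(x1, x2):
    # Hypothetical variant: compare the arrays element-wise instead
    if np.array_equal(x1, x2):
        return np.zeros_like(x1)
    return np.subtract(x1, x2)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# a.all() and b.all() are both True, so the proposal returns 0
# even though a and b are different arrays.
print(proposed_sub(a, b))

# The element-wise variant only returns zeros when the inputs match.
print(sub_zero_if_identical(a, b))
print(sub_zero_if_identical(a, a))
```

With these inputs the first call prints `0`, while the element-wise variant returns the actual difference for `(a, b)` and zeros for `(a, a)`.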

trevorstephens commented 1 year ago

The base package's division operator is already protected against division by zero, so there is nothing that needs correcting in that regard. x1 - x1 isn't an issue in itself unless it meets another operator that does have trouble with zeros, in which case it is that operator that needs protection, not the subtraction.

Konparginos commented 1 year ago

Unfortunately, I've encountered X0/(X1-X1) as the final result multiple times, so I'm not sure whether the division operator treats (X1-X1) as 0 or as a non-zero combination of variables. I'm also trying to understand where in the source code the operators receive the input nodes X1, X2, and whether there is a way to simplify trees, so that instead of add(X0,add(X0,sub(X0,X0))) I could get the simpler tree mul(2,X0).

Thanks!

trevorstephens commented 1 year ago

The division protection doesn't care how a zero got in there; it evaluates all entries in the denominator vector and protects accordingly: https://github.com/trevorstephens/gplearn/blob/main/gplearn/functions.py#L124
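To illustrate the point, here is a minimal sketch of an element-wise protected division, modeled on what the linked function is described as doing. The name `protected_division`, the threshold `0.001` and the fallback value `1.0` are assumptions for this example; check the linked source for the actual values.

```python
import numpy as np

def protected_division(x1, x2):
    # Assumed threshold (0.001) and fallback (1.0), for illustration only.
    # Every denominator entry is checked on its own, whatever produced it.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(np.abs(x2) > 0.001, np.divide(x1, x2), 1.0)

X0 = np.array([2.0, 4.0, 6.0])
X1 = np.array([1.0, 5.0, 9.0])
X2 = np.array([0.0, 5.0, 7.0])

# X0/(X1-X1): the denominator is all zeros, so every entry is protected.
zero_denominator = protected_division(X0, np.subtract(X1, X1))

# X0/(X1-X2): only the middle entry of the denominator is zero.
mixed_denominator = protected_division(X0, np.subtract(X1, X2))

print(zero_denominator)
print(mixed_denominator)
```

Under these assumptions, X0/(X1-X1) evaluates to the fallback value for every sample, and X0/(X1-X2) falls back only where X1 and X2 happen to be equal, so the subtraction itself needs no special handling.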