Currently, any NaN value in the input numpy array causes the following error when building an RCTree:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-8f8be9d6cf46> in <module>
----> 1 tree = rrcf.RCTree(data_anom.sample(1000, random_state=111).to_numpy())
~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in __init__(self, X, index_labels, precision, random_state)
104 # Create RRC Tree
105 S = np.ones(n, dtype=np.bool)
--> 106 self._mktree(X, S, N, I, parent=self)
107 # Remove parent of root
108 self.root.u = None
~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
170 depth += 1
171 # Create a cut according to definition 1
--> 172 S1, S2, branch = self._cut(X, S, parent=parent, side=side)
173 # If S1 does not contain an isolated point...
174 if S1.sum() > 1:
~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in _cut(self, X, S, parent, side)
152 l /= l.sum()
153 # Determine dimension to cut
--> 154 q = self.rng.choice(self.ndim, p=l)
155 # Determine value for split
156 p = self.rng.uniform(xmin[q], xmax[q])
mtrand.pyx in numpy.random.mtrand.RandomState.choice()
ValueError: probabilities contain NaN
Filling NaNs with the column mean or median is probably the best way to handle this, so a built-in option would be helpful. Maybe it could be an optional parameter of the RCTree constructor, defaulting to None so that the current behavior is unchanged unless a fill strategy is requested?
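In the meantime, a workaround is to impute the NaNs before passing the array to RCTree. Below is a minimal sketch of that idea using only numpy. The helper name `fill_nan_columns` and its `strategy` argument are hypothetical and not part of rrcf; they only illustrate what such an option might do.

```python
import numpy as np

def fill_nan_columns(X, strategy="median"):
    """Return a float copy of X with NaNs replaced by a per-column statistic.

    Hypothetical helper for illustration; not part of the rrcf API.
    """
    X = np.array(X, dtype=float)  # np.array copies, so the input is left untouched
    if strategy == "mean":
        fill = np.nanmean(X, axis=0)    # column means, ignoring NaNs
    elif strategy == "median":
        fill = np.nanmedian(X, axis=0)  # column medians, ignoring NaNs
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    if np.isnan(fill).any():
        # A column with no usable values has no statistic to fill with
        raise ValueError("at least one column contains only NaN values")
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill[cols]  # each NaN takes the statistic of its own column
    return X

# Intended usage with rrcf (assumes rrcf is installed and imported):
# tree = rrcf.RCTree(fill_nan_columns(data, strategy="median"))
```

Whether mean or median is the better choice depends on the data; the median is less sensitive to the outliers that an anomaly detector is usually trying to find.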