singlecheeze closed 1 year ago
It seems CuPy arrays work fine (maybe this needs to live in the cuDF bug tracker?):
import numpy as np
import cupy as cp
from cuml.tsa.arima import ARIMA

for n in [np.NaN, np.nan, np.NAN, cp.nan]:
    try:
        array = cp.array([n, 1, 2, 3, 4])
        model = ARIMA(
            array,
            order=(1, 1, 1),
            # simple_differencing=False
        )
        print(f"{type(n)}{n} worked!")
    except:
        print(f"{type(n)}{n} didn't work")
Output:
[W] [23:30:36.795767] Missing observations detected. Forcing simple_differencing=False
<class 'float'>nan worked!
[W] [23:30:36.796889] Missing observations detected. Forcing simple_differencing=False
<class 'float'>nan worked!
[W] [23:30:36.797940] Missing observations detected. Forcing simple_differencing=False
<class 'float'>nan worked!
[W] [23:30:36.798966] Missing observations detected. Forcing simple_differencing=False
<class 'float'>nan worked!
The underlying issue you're hitting is that nulls are not NaNs. In cuDF, missing values are "null" by default (like in the new pandas nullable dtypes). This is common in columnar data representations, but less common in array representations. By default, you will only get a NaN in cuDF if the data genuinely contains one (for example, from taking the square root of a negative number).
When we call .values under the hood, we're converting from cuDF to a CuPy array. CuPy doesn't understand nulls, so we prohibit the conversion. Depending on how you're creating your data, you can force the desired behavior with something like:
s = cudf.Series([np.nan, 1, 2, 3, 4], nan_as_null=False)
This parameter is also available in cudf.from_pandas.
Thank you for the timely response @beckernick !
I'll close this if it will let me and leave this link that might be helpful for others: https://docs.rapids.ai/api/cudf/stable/user_guide/missing-data.html
Docs here state NaN is allowed for input array: https://docs.rapids.ai/api/cuml/stable/api.html#arima
Output:
Traceback in each case: