Closed. orthosku closed this 2 years ago
Hi @orthosku. It seems you already have missing values in your training data.
Can you check the following on your input TimeSeries (where ts is any of your TimeSeries)?
ts.pd_dataframe().isna().any()
Ah yes, that returned True. I traced it back, and it seems to come from when the ts is created.
nan_in_df = SPY.isnull().sum().sum()
print('Number of NaN values present: ' + str(nan_in_df))
Number of NaN values present: 0
print(SPY.head(n=50))
close
timestamp
1999-11-19 142.500000
1999-11-22 142.468704
1999-11-23 141.218704
1999-11-24 141.968704
1999-11-26 141.437500
1999-11-29 140.937500
1999-11-30 139.281204
1999-12-01 140.406204
1999-12-02 141.250000
1999-12-03 143.843704
1999-12-06 142.781204
1999-12-07 141.625000
1999-12-08 140.718704
1999-12-09 141.406204
1999-12-10 141.875000
1999-12-13 142.125000
1999-12-14 140.750000
1999-12-15 141.500000
1999-12-16 142.125000
1999-12-17 142.687500
1999-12-20 141.656204
1999-12-21 143.812500
1999-12-22 144.187500
1999-12-23 146.484299
1999-12-27 146.281204
1999-12-28 146.187500
1999-12-29 146.812500
1999-12-30 146.640594
1999-12-31 146.875000
2000-01-03 145.437500
2000-01-04 139.750000
2000-01-05 140.000000
2000-01-06 137.750000
2000-01-07 145.750000
2000-01-10 146.250000
2000-01-11 144.500000
2000-01-12 143.062500
2000-01-13 145.000000
2000-01-14 146.968704
2000-01-18 145.812500
2000-01-19 147.000000
2000-01-20 144.750000
2000-01-21 144.437500
2000-01-24 140.343704
2000-01-25 141.937500
2000-01-26 140.812500
2000-01-27 140.250000
2000-01-28 135.875000
2000-01-31 139.562500
2000-02-01 140.937500
series = TimeSeries.from_dataframe(SPY, freq='D', fill_missing_dates=False)
print(series.pd_dataframe().isna().any())
component
close True
dtype: bool
series = TimeSeries.from_dataframe(SPY, freq='B', fill_missing_dates=False)
print(series.pd_dataframe().isna().any())
component
close True
dtype: bool
series = TimeSeries.from_dataframe(SPY, freq='B', fill_missing_dates=True)
print(series.pd_dataframe().isna().any())
component
close True
dtype: bool
Above, I tried changing the frequency and both filling and not filling the missing dates. None of these combinations solves the issue. Any ideas? I saw the prior thread that mentioned using freq='B' when dealing with business day data.
That is good, and yes, you should use freq='B' for business day frequency.
The fill_missing_dates parameter will only insert the missing business days (the dates) as rows with NaN values into your TimeSeries object.
Now, to fill the missing values, you can take a look at our MissingValuesFiller (https://unit8co.github.io/darts/generated_api/darts.dataprocessing.transformers.missing_values_filler.html)
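To illustrate why the DataFrame reports zero NaN values while the business-day series does not, here is a minimal sketch in plain pandas. It is only an analogy for what happens when the data is placed on a business-day index, not the darts implementation. The rows are copied from the SPY printout above (Thanksgiving week 1999); the variable names spy, expected, missing and on_bdays are made up for this example.

```python
import pandas as pd

# A few rows copied from the SPY printout above (Thanksgiving week 1999).
spy = pd.DataFrame(
    {"close": [142.468704, 141.218704, 141.968704, 141.437500, 140.937500]},
    index=pd.to_datetime(
        ["1999-11-22", "1999-11-23", "1999-11-24", "1999-11-26", "1999-11-29"]
    ),
)
spy.index.name = "timestamp"

# The DataFrame itself contains no NaN values...
print("NaN in DataFrame:", int(spy["close"].isna().sum()))

# ...but some business days between the first and last row are absent.
expected = pd.bdate_range(spy.index.min(), spy.index.max())
missing = expected.difference(spy.index)
print("Missing business days:", list(missing.strftime("%Y-%m-%d")))

# Placing the data on the full business-day index creates a NaN row
# for every absent date (here, the market holiday).
on_bdays = spy.reindex(expected)
print("NaN after reindexing:", int(on_bdays["close"].isna().sum()))
```

With freq='D' the same effect would be larger, since every weekend day would also become a NaN row.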
I may have found a potential solution:
from darts.utils.missing_values import fill_missing_values
series = TimeSeries.from_dataframe(SPY, freq='B', fill_missing_dates=False)
series = fill_missing_values(series, fill="auto")
print(series.pd_dataframe().isna().any())
component
close False
dtype: bool
I'm struggling to understand one thing, though: if fill_missing_dates=False, how could there have been any missing values to fill? Wouldn't this function only fill values if an index entry was present with a NaN in the close column?
To quote @hrzn:
Let's say you have daily data with a missing date:
Mon --> 1
Tue --> 2
Thu --> 4
Fri --> 5
Then fill_missing_dates=True will insert the date with NaN values (in the columns):
Mon --> 1
Tue --> 2
Wed --> NaN
Thu --> 4
Fri --> 5
Finally by filling the missing values:
Mon --> 1
Tue --> 2
Wed --> 3
Thu --> 4
Fri --> 5
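The same three stages can be reproduced with plain pandas. This is a rough sketch of the idea rather than what darts does internally: the dates are an arbitrary Monday-to-Friday week chosen for illustration, and linear interpolation is assumed as the fill method.

```python
import pandas as pd

# Hypothetical daily data with Wednesday missing (values mirror the example above).
idx = pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-06", "2000-01-07"])
s = pd.Series([1.0, 2.0, 4.0, 5.0], index=idx)

# Stage 2: insert the missing date; its value becomes NaN.
with_dates = s.asfreq("D")
print(with_dates)

# Stage 3: fill the NaN, here by linear interpolation between neighbours.
filled = with_dates.interpolate(method="linear")
print(filled)
```

Inserting the date and filling its value are two separate steps, which is why a series can contain NaN values even though the source DataFrame had none.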
Makes sense, thank you! So in this use case, weekday holidays would be marked as NaN by the ts object, and these values would then be interpolated.
Hello orthosku, I've also been working with darts for a few days to predict stock prices. Would you like to exchange experience? My Discord name is MRV1N#1905
Hi there,
I'm running into trouble with predictions that contain NaN values. I initially thought this could come from using a weekday time series (I'm working with stock data). I saw the post about setting freq='B' for a business day time index. Even after doing this, the prediction array still has NaN values. I tried this data with N-BEATS as well as an RNN model, and both yielded the same results. Would love some help!
Below is the data I'm using as well as the prediction array readout.