catherinening closed this issue 4 months ago
Hi @catherinening, thank you for raising this issue with a detailed description. Could you please include a minimal toy/synthetic dataset that triggers the same issue, so I can reproduce and debug this? Thank you.
Hi @ourownstory , here is a small synthetic dataset that triggers the error I described.
In the meantime, are there other ways I can calculate prediction error?
@catherinening Please excuse the late follow-up. Did you find a solution to this? I suspect that some of your series still had no or insufficient observations in the training data after the split and thus got omitted by the model. They may, however, be present in the test dataframe, leading to an issue. It should, however, fail with a clear message. Do you have a full trace of the calls leading to this pandas error?
In the meantime, you can screen your training dataframe and remove all series with insufficient samples from there and from the test dataframe. If that does not resolve it, you could call test() iteratively for each series until you catch the error, then you know which one to further investigate.
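A minimal sketch of both suggestions, using only pandas. It assumes the series identifier lives in a column named "ID"; the helper names `drop_short_series` and `find_failing_series`, the `min_samples` default, and the `test_fn` parameter are hypothetical and not part of NeuralProphet.

```python
import pandas as pd


def drop_short_series(df_train, df_test, min_samples=5, id_col="ID"):
    """Keep only series that have at least `min_samples` rows in df_train.

    The same set of series IDs is kept in both dataframes, so the test
    dataframe never contains a series that is missing from training.
    """
    counts = df_train.groupby(id_col).size()
    keep = counts[counts >= min_samples].index
    train_kept = df_train[df_train[id_col].isin(keep)].reset_index(drop=True)
    test_kept = df_test[df_test[id_col].isin(keep)].reset_index(drop=True)
    return train_kept, test_kept


def find_failing_series(df_test, test_fn, id_col="ID"):
    """Call `test_fn` once per series and collect the ones that raise.

    `test_fn` is any callable taking a single-series dataframe,
    for example the `test` method of a fitted model (assumption).
    """
    failures = {}
    for series_id, df_one in df_test.groupby(id_col):
        try:
            test_fn(df_one)
        except Exception as exc:  # record the error instead of stopping
            failures[series_id] = repr(exc)
    return failures
```

With a fitted model `m`, the second helper could be called as `find_failing_series(df_test, m.test)`; the returned dictionary maps each failing series ID to the error it raised.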
@MaiBe-ctrl Do you mind checking if you get the same error with the dummy dataset?
This happens when the test dataframe is too small: the inferred frequency is then set to NaT. Increasing the split quota from 0.33 to 0.4 solves the problem. @ourownstory we can solve this issue by raising an exception when the test dataframe is too small to infer the frequency. What do you think?
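The proposed check could look roughly like the sketch below. It is not NeuralProphet's implementation: the function name `check_enough_rows_for_freq`, the "ID" column name, and the threshold are assumptions. The threshold of 3 mirrors `pd.infer_freq`, which itself refuses to infer a frequency from fewer than 3 dates; the library's own inference may need a different minimum.

```python
import pandas as pd

# Assumed threshold: pd.infer_freq raises ValueError with fewer than 3 dates.
MIN_ROWS_FOR_FREQ = 3


def check_enough_rows_for_freq(df, id_col="ID", min_rows=MIN_ROWS_FOR_FREQ):
    """Raise a descriptive error if any series is too short to infer a frequency."""
    counts = df.groupby(id_col).size()
    too_short = counts[counts < min_rows]
    if not too_short.empty:
        raise ValueError(
            f"Cannot infer frequency: {len(too_short)} series have fewer than "
            f"{min_rows} rows: {list(too_short.index)}"
        )
```

Calling this on the test dataframe before evaluation would turn the opaque pandas error into a message that names the affected series.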
Prerequisites
Describe the bug
I am trying to build a global or global/local model with monthly time series data, based on this tutorial, with dates at the start of each month ranging up to three years. There are ~200 series (of subscriber cohorts), and each series ranges from 1 to 36 observations, but the most recent observation is the same date across all series.
I am repeatedly getting a
ValueError: Invalid frequency: NaT
when running NeuralProphet().test
on the test data set, obtained after running the NeuralProphet.split_df()
method.
Note: I had run into this error earlier, when splitting the dataset with
NeuralProphet().split_df(df, freq='MS', local_split=True)
I was able to resolve that earlier failure by NOT converting the 'ds' column of my DataFrame to a pandas datetime before passing it into split_df(), and by also removing series with very few (<5) samples, so that the number of training samples is guaranteed to be > 1.
To Reproduce
Steps to reproduce the behavior:
Start with a df containing ~200 series, each corresponding to subscribers that signed up in the same month. Observations are collected monthly at the start of each month, ranging from July 2020 to June 2023. The response variable is the number of remaining subscribers ('y') at each date ('ds'). Some time series will have all 36 observations, but some may have as few as one, reflecting new groups of subscribers. However, as noted above, I removed any series with five or fewer observations.
I initialized a NeuralProphet instance, split the data into training and test sets, fit the model, and made predictions. When I then evaluate on the test set using m.test(), I get the error.
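For anyone trying to reproduce this without the original data, a synthetic panel with the same shape can be sketched as follows. The values are random placeholders, not the real subscriber counts, and the function name `make_cohort_panel`, the cohort naming, and the "ID" column are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def make_cohort_panel(n_series=200, max_len=36, last_date="2023-06-01", seed=0):
    """Build monthly series of varying length that all end on `last_date`."""
    rng = np.random.default_rng(seed)
    frames = []
    for i in range(n_series):
        # Each cohort has between 1 and max_len month-start observations.
        n_obs = int(rng.integers(1, max_len + 1))
        ds = pd.date_range(end=last_date, periods=n_obs, freq="MS")
        # Placeholder "remaining subscribers": random, sorted to be non-increasing.
        y = np.sort(rng.integers(100, 1000, size=n_obs))[::-1]
        frames.append(pd.DataFrame({"ds": ds, "y": y, "ID": f"cohort_{i:03d}"}))
    return pd.concat(frames, ignore_index=True)


df = make_cohort_panel()
# Mirror the filtering described above: drop series with five or fewer rows.
df = df.groupby("ID").filter(lambda g: len(g) > 5)
```

Every series ends on the same date while start dates differ, so a proportional split can leave the shortest series with only one or two test rows.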
After running
test_metrics_local = m.test(df_test)
, this is the full error message I get:
5 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/arrays/datetimes.py in _generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, inclusive, unit)
    419                 "and freq, exactly three must be specified"
    420             )
--> 421         freq = to_offset(freq)
    422
    423         if start is not None:
offsets.pyx in pandas._libs.tslibs.offsets.to_offset()
offsets.pyx in pandas._libs.tslibs.offsets.to_offset()
ValueError: Invalid frequency: NaT
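The last frames of the trace can be isolated in plain pandas. The sketch below assumes, without having confirmed it in NeuralProphet's code, that the frequency is derived from differences between consecutive timestamps; with a single row there is no difference to take, so the value is NaT, and passing it on as `freq` makes pandas raise a ValueError. The helper name `try_range_with_freq` is hypothetical.

```python
import pandas as pd


def try_range_with_freq(freq):
    """Return the ValueError message from pd.date_range, or None if it succeeds."""
    try:
        pd.date_range(start="2023-06-01", periods=3, freq=freq)
    except ValueError as exc:
        return str(exc)
    return None


# A one-row series has no consecutive timestamps, so its first difference is NaT.
single_row = pd.Series(pd.to_datetime(["2023-06-01"]))
gap = single_row.diff().iloc[0]

print(pd.isna(gap))              # the "frequency" is missing
print(try_range_with_freq(gap))  # error message raised by pandas
print(try_range_with_freq("MS")) # a valid month-start frequency works
```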
if "google.colab" in str(get_ipython()):
    # uninstall preinstalled packages from Colab to avoid conflicts
    !pip install -U kaleido