dennisbader closed this issue 1 year ago.
Not sure if this is the right place for this, but it seems like a UserWarning is raised even when time_col is specified, in cases where the input df is sorted by time (df[time_col].is_monotonic_increasing == True), even if all series have overlapping time indices. However, when df is not sorted by time, the UserWarning is not raised. Shouldn't the UserWarning only be raised when time_col is not specified, regardless of how df is sorted?
For example, running the sample code from https://unit8co.github.io/darts/examples/15-static-covariates.html with vs. without sorting by time leads to a warning vs. no warning:
import numpy as np
import pandas as pd
from darts import TimeSeries

df = pd.DataFrame(
    data={
        "dates": [
            "2020-01-01",
            "2020-01-02",
            "2020-01-03",
            "2020-01-01",
            "2020-01-02",
            "2020-01-03",
        ],
        "comp1": np.random.random((6,)),
        "comp2": np.random.random((6,)),
        "comp3": np.random.random((6,)),
        "ID": ["SERIES1", "SERIES1", "SERIES1", "SERIES2", "SERIES2", "SERIES2"],
        "var1": [0.5, 0.5, 0.5, 0.75, 0.75, 0.75],
    }
)
print("Input DataFrame")
print(df)
df = df.sort_values(["dates", "ID"])  # <==== Sorting df by time causes the UserWarning to be raised
series_multi = TimeSeries.from_group_dataframe(
    df,
    time_col="dates",
    group_cols="ID",  # individual time series are extracted by grouping `df` by `group_cols`
    static_cols=["var1"],  # also extract these additional columns as static covariates (without grouping)
    value_cols=["comp1", "comp2", "comp3"],  # optionally, specify the time-varying columns
)
Sorting by time leads to:
UserWarning: The (time) index from df is monotonically increasing. This results in time series groups with non-overlapping (time) index. You can ignore this warning if the index represents the actual index of each individual time series group.
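A minimal pandas-only sketch of why sorting toggles the warning, assuming the check inspects the monotonicity of the time values in the order the rows appear. The frame is a trimmed-down copy of the example above, keeping only the dates and ID columns:

```python
import pandas as pd

dates = ["2020-01-01", "2020-01-02", "2020-01-03"] * 2
ids = ["SERIES1"] * 3 + ["SERIES2"] * 3
df = pd.DataFrame({"dates": pd.to_datetime(dates), "ID": ids})

# Original row order: SERIES1's dates, then SERIES2's dates, so time jumps back at row 3
unsorted_flag = df["dates"].is_monotonic_increasing

# Sorted by time (then ID): 01, 01, 02, 02, 03, 03, which never decreases
df_sorted = df.sort_values(["dates", "ID"])
sorted_flag = df_sorted["dates"].is_monotonic_increasing

print(unsorted_flag, sorted_flag)  # False True
```

Repeated dates do not break (non-strict) monotonicity, which is why the sorted frame passes the property even though both series share the same three dates.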
Hi @halstonblim, we do the check regardless of whether you pass time_col explicitly or not.
It's a sanity check meant to keep users from running into pitfalls down the line.
We check whether the index is monotonically increasing (i.e. each index value must be greater than or equal to the previous one, assuming a sorted index) because pandas has a built-in property for this. Better would be to check whether it's strictly monotonically increasing (i.e. each value must be strictly greater than the previous one), but pandas doesn't have a property for that, and I don't think it's necessary to add this logic for the sanity check.
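pandas indeed has no single property for strict monotonicity, but one can be composed from two existing ones: values that are non-decreasing and contain no repeats are strictly increasing. A minimal sketch; the helper name is_strictly_increasing is hypothetical and not part of darts or pandas:

```python
import pandas as pd


def is_strictly_increasing(values: pd.Series) -> bool:
    """Hypothetical helper: non-decreasing AND no repeated values means strictly increasing."""
    return bool(values.is_monotonic_increasing and values.is_unique)


# The example's dates after sorting: each date appears once per series
dates_sorted = pd.Series(pd.to_datetime(
    ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"]
))
# A running position 0..5, like a default RangeIndex spanning all groups
default_index = pd.Series(range(6))

print(dates_sorted.is_monotonic_increasing, is_strictly_increasing(dates_sorted))    # True False
print(default_index.is_monotonic_increasing, is_strictly_increasing(default_index))  # True True
```

Under this assumption, only the running-position case (where each group would get its own disjoint slice of the index) would trigger a strict check.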
Your index is monotonically increasing but not strictly, so you have a valid index. We mention in the warning message that you can ignore it if it's actually a valid index.
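Whether the groups actually overlap in time can also be checked directly. A sketch with a hypothetical helper groups_share_timestamps (not a darts function), comparing the example's repeated dates against a running integer position like a default RangeIndex:

```python
import pandas as pd


def groups_share_timestamps(df: pd.DataFrame, time_col: str, group_col: str) -> bool:
    """Hypothetical helper: True if at least one time stamp occurs in more than one group."""
    # For each time stamp, count how many distinct groups contain it
    groups_per_time = df.groupby(time_col)[group_col].nunique()
    return bool((groups_per_time > 1).any())


ids = ["SERIES1"] * 3 + ["SERIES2"] * 3
# Real dates repeated per series: the groups overlap in time
df_dates = pd.DataFrame({
    "dates": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
    "ID": ids,
})
# A running position used as time: each group gets its own, non-overlapping slice
df_range = pd.DataFrame({"pos": range(6), "ID": ids})

print(groups_share_timestamps(df_dates, "dates", "ID"))  # True
print(groups_share_timestamps(df_range, "pos", "ID"))    # False
```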
I think it's fine to raise this warning, but maybe we could improve the message to clarify some things (or add an ignore_warnings flag to the method). WDYT?
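A minimal sketch of how an opt-out flag could gate such a sanity check. The function check_time_index and its ignore_warnings parameter are hypothetical illustrations, not darts' API, and the sketch uses Python's warnings module, which may differ from how darts actually emits this message:

```python
import warnings

import pandas as pd


def check_time_index(df: pd.DataFrame, time_col: str, ignore_warnings: bool = False) -> None:
    """Hypothetical sanity check with an opt-out flag (not darts' actual API)."""
    if ignore_warnings:
        return  # caller has confirmed the time index is valid; skip the check
    if df[time_col].is_monotonic_increasing:
        warnings.warn(
            f"The (time) index from df['{time_col}'] is monotonically increasing; "
            "groups may end up with non-overlapping time indices.",
            UserWarning,
        )


df = pd.DataFrame({"dates": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"])})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of deduplicating
    check_time_index(df, "dates")                        # warns
    check_time_index(df, "dates", ignore_warnings=True)  # silent

print(len(caught))  # 1
```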
Thanks for clarifying @dennisbader. I think the ignore_warnings flag would be helpful!
Agreed, checking for strictly monotonically increasing is a bit better. Probably not worth implementing, though, and I think there would still be the corner case of a single group, where you would expect a (strictly) monotonic time index.
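The single-group corner case can be reproduced with a strict check: one perfectly valid series with unique dates is strictly increasing, so the check would still fire. A sketch, assuming a hypothetical guard that additionally requires more than one group; the function and parameter names are illustrative only:

```python
import pandas as pd


def strict_check_would_warn(
    df: pd.DataFrame, time_col: str, group_col: str, require_multiple_groups: bool
) -> bool:
    """Hypothetical check: True if a strict-monotonicity warning would be raised."""
    col = df[time_col]
    strictly_increasing = bool(col.is_monotonic_increasing and col.is_unique)
    if require_multiple_groups:
        # With a single group there is nothing to overlap with, so skip the warning
        return strictly_increasing and df[group_col].nunique() > 1
    return strictly_increasing


single = pd.DataFrame({
    "dates": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "ID": ["SERIES1"] * 3,
})

print(strict_check_would_warn(single, "dates", "ID", require_multiple_groups=False))  # True
print(strict_check_would_warn(single, "dates", "ID", require_multiple_groups=True))   # False
```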
See #1606.
In TimeSeries.from_group_dataframe():
time_col=None to avoid downstream issues
time_col=None, to make user aware