uber / orbit

A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
https://orbit-ml.readthedocs.io/en/stable/

fit method fail for Timezone aware timeseries #871

Closed hdattada closed 1 month ago

hdattada commented 2 months ago

Describe the bug We currently use the ETS and DLT forecasters for our time series forecasting. When we pass a dataframe whose datetime column is timezone-aware, the fit method fails. The root cause is this line in orbit/utils/general.py (is_ordered_datetime): np.all(np.diff(array).astype(float) > 0)

For a naive datetime series, numpy.diff returns a numeric timedelta64 array, whereas for a timezone-aware series it returns an object array of pandas Timedelta objects. Hence the cast to float succeeds for the former and fails for the latter.
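The difference described above can be seen by inspecting the dtype of the diff directly. This is a minimal sketch, assuming a recent pandas where a timezone-aware Series converts to an object array of Timestamp values when handed to NumPy; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same dates, once naive and once timezone-aware.
naive = pd.Series(pd.date_range("2021-01-01", periods=5, freq="D"))
aware = pd.Series(pd.date_range("2021-01-01", periods=5, freq="D", tz="UTC"))

naive_diff = np.diff(naive)  # native NumPy timedelta64 array
aware_diff = np.diff(aware)  # object array holding pandas Timedelta objects

print(naive_diff.dtype.kind)  # "m" is NumPy's kind code for timedelta64
print(aware_diff.dtype)       # object

# The naive diff casts to float without trouble.
print(np.all(naive_diff.astype(float) > 0))
```

Calling .astype(float) on aware_diff is the step that fails, because NumPy has to convert each Timedelta object individually and does not know how.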

Any workaround to get over this issue is appreciated. Thank you!
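One possible workaround on the caller's side, until a fixed release is available, is to convert the date column to naive UTC before calling fit. The sketch below is an assumption about what works for this report, not something confirmed by the maintainers; to_naive_utc and the column names ds and y are hypothetical.

```python
import numpy as np
import pandas as pd


def to_naive_utc(df, date_col):
    """Return a copy of df whose date_col is converted to UTC and made tz-naive."""
    out = df.copy()
    out[date_col] = out[date_col].dt.tz_convert("UTC").dt.tz_localize(None)
    return out


df = pd.DataFrame(
    {
        "ds": pd.date_range("2021-01-01", periods=5, freq="D", tz="UTC"),
        "y": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

df_naive = to_naive_utc(df, "ds")

# Same expression as the failing line in the traceback, now on naive datetimes.
ordered = bool(np.all(np.diff(df_naive["ds"]).astype(float) > 0))
print(ordered)
```

The naive frame can then be passed to the forecaster's fit method. If the original timezone matters downstream, it has to be re-applied to the predictions afterwards.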

To Reproduce Steps to reproduce the behavior:

import pandas as pd
from orbit.utils.general import is_ordered_datetime

df_tz_aware = pd.date_range("2021-01-01", periods=5, freq="D", tz="UTC")

print(is_ordered_datetime(df_tz_aware))

Expected behavior The expected output for the above series is True, but the function throws the error below.

  File "orbit_ets.py", line 37, in orbit_ets_forecast
    ).fit(historic_data_df)
  File "python3.10/site-packages/orbit/forecaster/map.py", line 23, in fit
    super().fit(df, **kwargs)
  File "python3.10/site-packages/orbit/forecaster/forecaster.py", line 143, in fit
    self._validate_training_df(df)
  File "python3.10/site-packages/orbit/forecaster/forecaster.py", line 285, in _validate_training_df
    if not is_ordered_datetime(date_array):
  File "python3.10/site-packages/orbit/utils/general.py", line 18, in is_ordered_datetime
    return np.all(np.diff(array).astype(float) > 0)

Additional context An issue was already raised with NumPy. The determination was that casting Timedelta to float will not work, because NumPy is unaware of pandas types. https://github.com/numpy/numpy/issues/26838
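Since NumPy cannot handle the pandas types here, one way to sidestep the cast is to let pandas do the ordering check itself. The following is only a sketch of that idea and is not necessarily how the fix in orbit was implemented; is_strictly_increasing is a hypothetical helper.

```python
import pandas as pd


def is_strictly_increasing(dates):
    """Check strict ordering for naive or timezone-aware datetimes alike."""
    idx = pd.DatetimeIndex(dates)
    # Monotonic increasing allows ties, so also require unique values.
    return bool(idx.is_monotonic_increasing and idx.is_unique)


aware = pd.date_range("2021-01-01", periods=5, freq="D", tz="UTC")
naive = pd.date_range("2021-01-01", periods=5, freq="D")

print(is_strictly_increasing(aware))
print(is_strictly_increasing(naive))
print(is_strictly_increasing(aware[::-1]))
```

This keeps the comparison inside pandas, so no float cast of Timedelta objects is needed.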

hdattada commented 1 month ago

Thanks a ton @swotai for fixing this promptly. When can I expect a new release for this fix?