jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

How to format a pandas dataframe #123

Closed andrewcztrack closed 3 years ago

andrewcztrack commented 3 years ago

Hi @jdb78 !! love the library :).

I noticed the time series in the nbeats example are stacked on top of each other.

How do I format a pandas data frame to look the same? Would I need to normalise my data first?

```python
import yfinance as yf
data = yf.download("SPY IBM AMZN AAPL", start="2017-01-01", end="2017-04-30")

data['Close']
```

andrewcztrack commented 3 years ago

Hi @jdb78 my attempt -


```python
import os
import warnings

warnings.filterwarnings("ignore")

os.chdir("../../..")

import pandas as pd
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

from pytorch_forecasting import TimeSeriesDataSet, NBeats, Baseline
from pytorch_forecasting.data import NaNLabelEncoder
from pytorch_forecasting.data.examples import generate_ar_data
from pytorch_forecasting.metrics import SMAPE
import yfinance as yf
data1 = yf.download("SPY IBM AMZN AAPL", start="2017-01-01", end="2017-04-30")
dd["static"] = 2
dd["date"] = pd.Timestamp("2017-01-03")
dd.head()
training = TimeSeriesDataSet(
    data[lambda x: x.date <= training_cutoff],
    time_idx="date",
    target="SPY",
    # only unknown variable is "value" - and N-Beats can also not take any additional variables
    time_varying_unknown_reals=["SPY"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)
```

I am getting the error below.

```
    228     if should_extension_dispatch(lvalues, rvalues):
    229         # Call the method on lvalues
--> 230         res_values = op(lvalues, rvalues)
    231
    232     elif is_scalar(rvalues) and isna(rvalues):

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64
---> 65         return method(self, other)
     66
     67     return new_method

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/arrays/datetimelike.py in wrapper(self, other)
    116             other = _validate_comparison_value(self, other)
    117         except InvalidComparison:
--> 118             return invalid_comparison(self, other, op)
    119
    120         dtype = getattr(other, "dtype", None)

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/ops/invalid.py in invalid_comparison(left, right, op)
     32     else:
     33         typ = type(right).__name__
---> 34         raise TypeError(f"Invalid comparison between dtype={left.dtype} and {typ}")
     35     return res_values
     36

TypeError: Invalid comparison between dtype=datetime64[ns] and int64
```

jdb78 commented 3 years ago

Could you post the full traceback?

andrewcztrack commented 3 years ago

[*********************100%***********************]  4 of 4 completed

```
InvalidComparison                         Traceback (most recent call last)
~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/arrays/datetimelike.py in wrapper(self, other)
    115         try:
--> 116             other = _validate_comparison_value(self, other)
    117         except InvalidComparison:

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/arrays/datetimelike.py in _validate_comparison_value(self, other)
     95         elif not is_list_like(other):
---> 96             raise InvalidComparison(other)
     97

InvalidComparison: 379

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
in
     21 dd.head()
     22 training = TimeSeriesDataSet(
---> 23     data[lambda x: x.date <= training_cutoff],
     24     time_idx="date",
     25     target="SPY",

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2869     def __getitem__(self, key):
   2870         key = lib.item_from_zerodim(key)
-> 2871         key = com.apply_if_callable(key, self)
   2872
   2873         if is_hashable(key):

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/common.py in apply_if_callable(maybe_callable, obj, **kwargs)
    339     """
    340     if callable(maybe_callable):
--> 341         return maybe_callable(obj, **kwargs)
    342
    343     return maybe_callable

in (x)
     21 dd.head()
     22 training = TimeSeriesDataSet(
---> 23     data[lambda x: x.date <= training_cutoff],
     24     time_idx="date",
     25     target="SPY",

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64
---> 65         return method(self, other)
     66
     67     return new_method

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(self, other)
    368         rvalues = extract_array(other, extract_numpy=True)
    369
--> 370         res_values = comparison_op(lvalues, rvalues, op)
    371
    372         return self._construct_result(res_values, name=res_name)

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/ops/array_ops.py in comparison_op(left, right, op)
    228     if should_extension_dispatch(lvalues, rvalues):
    229         # Call the method on lvalues
--> 230         res_values = op(lvalues, rvalues)
    231
    232     elif is_scalar(rvalues) and isna(rvalues):

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64
---> 65         return method(self, other)
     66
     67     return new_method

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/arrays/datetimelike.py in wrapper(self, other)
    116             other = _validate_comparison_value(self, other)
    117         except InvalidComparison:
--> 118             return invalid_comparison(self, other, op)
    119
    120         dtype = getattr(other, "dtype", None)

~/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/ops/invalid.py in invalid_comparison(left, right, op)
     32     else:
     33         typ = type(right).__name__
---> 34         raise TypeError(f"Invalid comparison between dtype={left.dtype} and {typ}")
     35     return res_values
     36

TypeError: Invalid comparison between dtype=datetime64[ns] and int64
```

andrewcztrack commented 3 years ago

Thank you @jdb78

jdb78 commented 3 years ago

It looks like this problem is caused by the pandas dataframe column "date" having a different type (datetime64) than your variable training_cutoff (an integer). The filter also will not work as intended, because you set date to a constant.
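
For anyone landing here with the same question, below is a minimal sketch of one way to stack several price series into long format and filter on an integer time index, so that the cutoff comparison is integer against integer. It is an illustration, not the maintainer's code. The price data is synthetic (a random stand-in for `yf.download(...)["Close"]`), and the column names `series`, `value`, and `time_idx`, the variable `prediction_length = 20`, and the 80-day range are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for yf.download(...)["Close"]: one column per ticker,
# indexed by business day. The values are random, not real prices.
dates = pd.bdate_range("2017-01-03", periods=80, name="date")
rng = np.random.default_rng(0)
wide = pd.DataFrame(
    100 + rng.normal(size=(len(dates), 4)).cumsum(axis=0),
    index=dates,
    columns=["SPY", "IBM", "AMZN", "AAPL"],
)

# Stack the tickers on top of each other: one row per (date, series) pair.
data = wide.reset_index().melt(id_vars="date", var_name="series", value_name="value")
data = data.sort_values(["series", "date"]).reset_index(drop=True)

# Integer time index counting up within each series (0, 1, 2, ...).
# This assumes every series covers the same dates, which holds for this frame.
data["time_idx"] = data.groupby("series").cumcount()

prediction_length = 20  # hypothetical forecast horizon
training_cutoff = data["time_idx"].max() - prediction_length

# Integer compared with integer: this filter works.
train = data[lambda x: x.time_idx <= training_cutoff]

# Comparing the datetime column with the integer cutoff reproduces the
# TypeError reported in this thread.
try:
    data["date"] <= training_cutoff
except TypeError as err:
    print("TypeError:", err)
```

With a frame shaped like this, `time_idx` rather than `date` would be the column to pass as the dataset's time index, and `series` would identify which rows belong to which ticker.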