Different data sets than yours or shorter timespan of DE

luboshanus commented 1 year ago

Hi,

I am trying to run your code with my own data set, and it always produces errors. My data has Date in the correct format as the index, a Price column, and some exogenous variables. When I run the function that prepares the data, it works fine, but then at some point it always stops.

A bit annoyed after some time, I wanted to just try your data with a different timespan. I made a .csv from DE, starting at 2014-11-16 00:00:00 and ending at 2016-03-29 23:00:00, with 12000 observations, which is divisible by 24. I named it DE.csv and ran: nohup python3 examples/recalibrating_lear_simplified.py > 01_log.txt &
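
The cropping step can be sketched roughly as follows. This is only an illustration on synthetic data: the `Price` values and the outer date range are made up, and the real file would be read from DE.csv instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the full hourly dataset (ASSUMPTION: made-up values and range).
full_index = pd.date_range("2014-01-01 00:00", "2016-12-31 23:00",
                           freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Price": np.arange(len(full_index), dtype=float)},
                  index=full_index)
df.index.name = "Date"

# Crop to the timespan mentioned above (label slicing includes both endpoints).
cropped = df.loc["2014-11-16 00:00:00":"2016-03-29 23:00:00"]

n_hours = len(cropped)
n_days, leftover_hours = divmod(n_hours, 24)
starts_at_midnight = cropped.index[0].hour == 0
ends_at_23 = cropped.index[-1].hour == 23

print(n_hours, n_days, leftover_hours, starts_at_midnight, ends_at_23)

# cropped.to_csv("DE.csv")  # uncomment to write the cropped file
```

A length divisible by 24 only shows that the row count fits whole days; it does not by itself rule out gaps or repeated hours inside the range.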

The error now is this:

2023-04-26 13:32:07.633251: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/Users/lubos/My Drive/Projects/epftoolbox/examples/recalibrating_lear_simplified.py", line 36, in <module>
    evaluate_lear_in_test_dataset(path_recalibration_folder=path_recalibration_folder, 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/epftoolbox/models/_lear.py", line 419, in evaluate_lear_in_test_dataset
    Yp = model.recalibrate_and_forecast_next_day(df=data_available, next_day_date=date, 
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/epftoolbox/models/_lear.py", line 324, in recalibrate_and_forecast_next_day
    Xtrain, Ytrain, Xtest, = self._build_and_split_XYs(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/epftoolbox/models/_lear.py", line 165, in _build_and_split_XYs
    if df_train.index[0].hour != 0 or df_test.index[0].hour != 0:
       ~~~~~~~~~~~~~~^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 5174, in __getitem__
    return getitem(key)
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py", line 370, in __getitem__
    "Union[DatetimeLikeArrayT, DTScalarOrNaT]", super().__getitem__(key)
                                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py", line 272, in __getitem__
    result = self._ndarray[key]
             ~~~~~~~~~~~~~^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0

For comparison, this is what the new (cropped) DE.csv looks like; DE_orig.csv is yours.

If you could look into the problem above and help, it would be appreciated.

Thank you.

Sidenote: With my data, or even yours, there are NaN errors, probably because of the scalers. In other cases the error says that some date+hour does not exist in the dataset; when I look into xyz.csv the timestamp is present in the data, so the test set is probably defined wrongly, even when the beginning and end of the test set are left as None.
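
One way to narrow down NaN or "timestamp does not exist" problems is to compare the index against a complete hourly range before running the model. The sketch below is plain pandas on toy data; `check_hourly_frame` is a hypothetical helper written for this illustration, not part of the toolbox.

```python
import numpy as np
import pandas as pd


def check_hourly_frame(df):
    """Report missing hours, repeated hours, and rows containing NaN."""
    expected = pd.date_range(df.index.min(), df.index.max(),
                             freq=pd.Timedelta(hours=1))
    return {
        "missing_timestamps": expected.difference(df.index),
        "duplicated_timestamps": df.index[df.index.duplicated()],
        "rows_with_nan": df.index[df.isna().any(axis=1)],
    }


# Toy data (ASSUMPTION: made-up values): three days of hourly prices.
idx = pd.date_range("2016-03-26 00:00", periods=72, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"Price": np.linspace(20.0, 30.0, 72)}, index=idx)
toy = toy.drop(idx[50])    # simulate a missing hour
toy.iloc[10, 0] = np.nan   # simulate a missing price

report = check_hourly_frame(toy)
for name, stamps in report.items():
    print(name, list(stamps))
```

Files stored in local time can, for instance, have a missing or repeated hour around daylight-saving changes, which a check like this would surface.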

jeslago commented 1 year ago

Can you share the exact script that you are testing? The error just indicates (I think) that the training dataset is empty. Can you share the file that you are giving to the model and the exact code that you are running in recalibrating_lear_simplified.py?
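
A minimal sketch of how an empty training set leads to this kind of IndexError, using plain pandas only. The test length of 2 x 364 days is an assumption made for illustration and has not been checked against the toolbox defaults; the point is only that a test period longer than the available data leaves nothing to train on.

```python
import pandas as pd

# Hourly index with the same span as the cropped file (500 days).
idx = pd.date_range("2014-11-16 00:00", "2016-03-29 23:00",
                    freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Price": 0.0}, index=idx)

# ASSUMPTION: the test period is taken as the last 2 * 364 days of the data.
test_days = 2 * 364
available_days = len(df) // 24
train_days = max(available_days - test_days, 0)

# Everything before the test period would be the training set.
df_train = df.iloc[: train_days * 24]
print(available_days, train_days, df_train.empty)

# Indexing an empty index fails the same way as in the traceback above.
try:
    df_train.index[0]
    message = ""
except IndexError as err:
    message = str(err)
print(message)
```

If this is what happens, choosing a test period that is clearly shorter than the available data should leave a non-empty training set.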

jeslago commented 5 months ago

Closing due to inactivity
