jankrepl / deepdow

Portfolio optimization with deep learning.
https://deepdow.readthedocs.io
Apache License 2.0
874 stars 136 forks source link

raw_to_Xy raises KeyError #118

Closed AlexKnowsIt closed 2 years ago

AlexKnowsIt commented 2 years ago

Hello, I am running into a weird error, that I don't understand. I used Yahoo Finance to download a DataFrame with the Index of MultiIndex([( 'SIE.DE', 'Open'), ( 'SIE.DE', 'High'), ( 'SIE.DE', 'Low'), ( 'SIE.DE', 'Close'), ( 'SIE.DE', 'Volume'), ('EOAN.DE', 'Open'), ('EOAN.DE', 'High'), ('EOAN.DE', 'Low'), ('EOAN.DE', 'Close'), ('EOAN.DE', 'Volume'), ... ('BAYN.DE', 'Open'), ('BAYN.DE', 'High'), ('BAYN.DE', 'Low'), ('BAYN.DE', 'Close'), ('BAYN.DE', 'Volume'), ( 'BEI.DE', 'Open'), ( 'BEI.DE', 'High'), ( 'BEI.DE', 'Low'), ( 'BEI.DE', 'Close'), ( 'BEI.DE', 'Volume')], length=150) So I have a DataFrame, where there are the Assets in the zero level and the first level being indecators. If i run now the raw_to_Xy Function like this

n_timesteps = len(raw_df)
n_channels = len(raw_df.columns.levels[0]) 
n_assets = len(raw_df.columns.levels[1]) 

lookback, gap, horizon = 14, 1, 2

X, timestamps, y, asset_names, indicators = raw_to_Xy(raw_df,
                                                      lookback=lookback,
                                                      gap=gap,
                                                      freq="B",
                                                      horizon=horizon)

I receive an KeyError: 'Open' with the explanation

The label is present in self.levels[level] but unused

I tried to stay as close to the documentation as possible and don't see, where I am wrong. Other than that: Great repository, I am really fascinated by all the possibilties that this gives for modern Portfolio Optimization!

AlexKnowsIt commented 2 years ago

This is the whole Error-message:

/usr/local/lib/python3.7/dist-packages/deepdow/utils.py in raw_to_Xy(raw_data, lookback, horizon, gap, freq, included_assets, included_indicators, use_log)
    275     y_list = []
    276     for ind in indicators:
--> 277         X, timestamps, y = (returns_to_Xy(returns.xs(ind, axis=1, level=1),
    278                                           lookback=lookback,
    279                                           horizon=horizon,

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3471             if not isinstance(labels, MultiIndex):
   3472                 raise TypeError("Index must be a MultiIndex")
-> 3473             loc, new_ax = labels.get_loc_level(key, level=level, drop_level=drop_level)
   3474 
   3475             # create the tuple of the indexer

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/multi.py in get_loc_level(self, key, level, drop_level)
   2883                 return indexer, maybe_mi_droplevels(indexer, ilevels, drop_level)
   2884         else:
-> 2885             indexer = self._get_level_indexer(key, level=level)
   2886             return indexer, maybe_mi_droplevels(indexer, [level], drop_level)
   2887 

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/multi.py in _get_level_indexer(self, key, level, indexer)
   2974                 if not locs.any():
   2975                     # The label is present in self.levels[level] but unused:
-> 2976                     raise KeyError(key)
   2977                 return locs
   2978 

KeyError: 'Open'
jankrepl commented 2 years ago

Hey! Thank you for your interest!

Hmmm, I am not sure where it comes from. Could you please upload your raw_df here? Ideally just a minimal version of it that reproduces the error.

Cheers

AlexKnowsIt commented 2 years ago

Hello Jan,

thanks for helping me out. I am not sure if this is the best way to share my DataFrame, if you need it in a different format just tell me. Its the result of the yfinance package with the function

mein_portfolio = yf.download(tickers = ticker,
          start="2010-01-01", 
          end="2021-01-01",
          interval = "1d",

          # group by ticker (to access via data['SPY'])
          # (optional, default is 'column')
          group_by = 'ticker',
      )
<!--StartFragment-->

  | 1COV.DE | MRK.DE | LIN.DE | FRE.DE | DWNI.DE | BEI.DE | MUV2.DE | DPW.DE | ... | BAS.DE | VOW3.DE | SIE.DE | CON.DE | VNA.DE | RWE.DE | HEN3.DE | ALV.DE
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

NaN | NaN | NaN | NaN | NaN | 27.215351 | 27.428233 | 27.077607 | 27.219526 | 752504.0 | 39.533226 | 39.533226 | 39.533226 | 39.533226 | 0.0 | 12.068662 | 12.510301 | 12.068662 | 12.393932 | 96522.0 | 4.828712 | 5.049168 | 4.800266 | 5.006499 | 193278.0 | 39.338344 | 39.998988 | 39.295446 | 39.848843 | 336203.0 | 62.419827 | 62.733922 | 61.677412 | 61.848740 | 873193.0 | 8.497876 | 8.767550 | 8.497876 | 8.767550 | 3020790.0 | ... | 25.048560 | 25.849699 | 24.985161 | 25.849699 | 3245626.0 | 49.865665 | 50.309594 | 48.564486 | 49.368153 | 906136.0 | 41.763412 | 42.234259 | 41.666662 | 42.227810 | 2734897.0 | 27.314565 | 27.993757 | 27.171385 | 27.993757 | 133257.0 | NaN | NaN | NaN | NaN | NaN | 39.725997 | 40.033593 | 39.638945 | 40.021984 | 2022078.0 | 30.165420 | 30.165420 | 29.641154 | 29.842794 | 784200.0 | 50.129575 | 50.751911 | 50.015382 | 50.552078 | 1676777.0
NaN | NaN | NaN | NaN | NaN | 27.169440 | 27.202834 | 26.873077 | 27.019175 | 488178.0 | NaN | NaN | NaN | NaN | NaN | 12.339255 | 12.410758 | 12.050434 | 12.109323 | 137034.0 | 5.120282 | 5.153706 | 5.042056 | 5.042056 | 279601.0 | 39.831689 | 40.024737 | 39.522816 | 39.638645 | 331772.0 | 61.763061 | 62.020048 | 61.563181 | 61.962940 | 797939.0 | 8.739330 | 8.864759 | 8.673479 | 8.780094 | 2735783.0 | ... | 25.800709 | 25.820882 | 25.293508 | 25.457775 | 3292371.0 | 49.368148 | 49.666657 | 47.592428 | 48.411404 | 893352.0 | 42.182657 | 42.498704 | 41.911759 | 42.253605 | 2584717.0 | 27.894637 | 30.185536 | 27.843238 | 29.744978 | 420128.0 | NaN | NaN | NaN | NaN | NaN | 40.045200 | 40.213507 | 39.540283 | 39.766628 | 2040667.0 | 29.850860 | 29.875059 | 29.003970 | 29.318529 | 1147700.0 | 50.529229 | 51.208663 | 50.415040 | 50.706223 | 1742206.0
NaN | NaN | NaN | NaN | NaN | 27.048393 | 27.219531 | 26.772899 | 27.002478 | 511138.0 | 39.589531 | 39.589531 | 39.589531 | 39.589531 | 0.0 | 12.222885 | 12.222885 | 11.964912 | 12.046229 | 77268.0 | 5.034945 | 5.148729 | 5.034945 | 5.034945 | 98025.0 | 39.681540 | 39.810238 | 39.595743 | 39.608612 | 342784.0 | 62.105725 | 62.191389 | 61.363310 | 61.563194 | 727120.0 | 8.767549 | 8.924336 | 8.733056 | 8.877300 | 2595800.0 | ... | 25.486592 | 25.754599 | 25.379965 | 25.619154 | 2345128.0 | 48.671639 | 49.896274 | 47.929201 | 49.750847 | 1202268.0 | 42.214906 | 42.479355 | 42.079458 | 42.408405 | 2166413.0 | 29.744973 | 30.086398 | 29.047423 | 29.939554 | 231260.0 | NaN | NaN | NaN | NaN | NaN | 39.714390 | 39.934928 | 39.464832 | 39.691177 | 1535087.0 | 29.439508 | 29.737935 | 29.286262 | 29.576622 | 993200.0 | 50.900349 | 51.265758 | 50.671968 | 51.100182 | 1362139.0
NaN | NaN | NaN | NaN | NaN | 26.960732 | 27.624421 | 26.960732 | 27.570158 | 1088580.0 | 39.617683 | 39.617683 | 39.617683 | 39.617683 | 0.0 | 12.057447 | 12.079879 | 11.917244 | 12.029407 | 74370.0 | 5.034944 | 5.089703 | 4.999387 | 5.047034 | 115634.0 | 39.625772 | 39.681540 | 38.960837 | 39.209652 | 481551.0 | 61.334759 | 61.763075 | 61.134880 | 61.705967 | 948916.0 | 8.849081 | 8.902388 | 8.792638 | 8.858488 | 2907679.0 | ... | 25.457778 | 25.578815 | 25.218591 | 25.449133 | 2505352.0 | 49.727887 | 50.753518 | 49.314570 | 50.324898 | 1108024.0 | 42.376153 | 43.085651 | 42.034307 | 42.911499 | 3336865.0 | 31.279587 | 33.915587 | 31.261229 | 33.776077 | 1470941.0 | NaN | NaN | NaN | NaN | NaN | 39.551892 | 39.685378 | 39.197872 | 39.522873 | 1765257.0 | 29.383050 | 29.471772 | 28.778129 | 29.124950 | 877300.0 | 50.769027 | 51.065924 | 50.443584 | 50.512100 | 1926360.0
NaN | NaN | NaN | NaN | NaN | 27.570161 | 27.707905 | 26.964909 | 27.290491 | 730274.0 | NaN | NaN | NaN | NaN | NaN | 12.141570 | 12.149982 | 11.777042 | 11.777042 | 74403.0 | 5.042056 | 5.081880 | 5.034944 | 5.056279 | 90482.0 | 39.252552 | 39.484207 | 37.871205 | 38.077122 | 978115.0 | 62.020063 | 62.020063 | 61.020660 | 61.591747 | 902151.0 | 8.867895 | 9.109347 | 8.817723 | 9.068583 | 5462896.0 | ... | 25.503885 | 25.610512 | 25.126371 | 25.371323 | 3307737.0 | 50.439708 | 51.358185 | 50.294281 | 50.868328 | 937837.0 | 43.117893 | 43.362992 | 42.685749 | 43.182396 | 2634386.0 | 33.812793 | 34.756320 | 32.718743 | 34.400204 | 449586.0 | NaN | NaN | NaN | NaN | NaN | 39.673771 | 39.784040 | 39.041174 | 39.464836 | 1665122.0 | 29.116886 | 29.358857 | 28.447441 | 28.818459 | 1224500.0 | 50.917475 | 50.917475 | 49.490096 | 50.238041 | 2425290.0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
48.683556 | 49.280001 | 48.556444 | 48.800888 | 544896.0 | 134.279385 | 135.566712 | 134.031820 | 134.972565 | 193144.0 | 204.061152 | 206.339508 | 203.367744 | 206.339508 | 751699.0 | 36.296987 | 37.071978 | 36.159648 | 36.738438 | 1330030.0 | 41.357513 | 41.710410 | 41.220276 | 41.543766 | 569336.0 | 90.810540 | 92.596589 | 90.592251 | 92.358452 | 327184.0 | 227.299843 | 229.993200 | 226.626507 | 228.165558 | 310667.0 | 39.068311 | 39.233857 | 38.834605 | 38.951458 | 2387989.0 | ... | 60.852591 | 61.519835 | 60.661949 | 60.862122 | 2923353.0 | 146.430482 | 147.387668 | 145.023997 | 145.434219 | 952157.0 | 111.475366 | 111.884555 | 110.695958 | 111.397423 | 1390621.0 | 117.800003 | 118.900002 | 117.500000 | 118.400002 | 325628.0 | 55.863411 | 56.387859 | 55.669170 | 56.154770 | 1031102.0 | 31.953356 | 32.382001 | 31.846195 | 32.372257 | 1600706.0 | 87.669159 | 88.728963 | 87.551408 | 88.454201 | 329138.0 | 186.491139 | 188.844513 | 185.936275 | 186.759003 | 969429.0
48.986666 | 49.553779 | 48.986666 | 49.240891 | 615538.0 | 134.972562 | 136.012339 | 134.378399 | 135.071579 | 202684.0 | 207.726336 | 209.113168 | 206.042339 | 208.617874 | 549094.0 | 36.895399 | 37.699818 | 36.846350 | 37.483997 | 1204434.0 | 41.465345 | 41.877059 | 41.435938 | 41.739822 | 380069.0 | 92.517205 | 93.053019 | 92.159995 | 92.457672 | 231334.0 | 228.454137 | 233.071319 | 228.357940 | 232.590363 | 246062.0 | 39.146216 | 39.185168 | 38.883293 | 38.951458 | 2036098.0 | ... | 60.976510 | 61.939249 | 60.966976 | 61.662819 | 1802073.0 | 147.250934 | 149.575542 | 146.801649 | 148.911377 | 944403.0 | 111.904042 | 113.988968 | 111.670221 | 113.988968 | 1149343.0 | 118.949997 | 122.199997 | 118.550003 | 121.650002 | 471544.0 | 56.154770 | 56.873459 | 56.154770 | 56.659794 | 603944.0 | 32.537871 | 33.044445 | 32.440450 | 32.927544 | 1523201.0 | 88.552333 | 88.846725 | 87.963555 | 88.787849 | 212154.0 | 186.567662 | 191.666657 | 186.548525 | 191.217026 | 987162.0
49.788442 | 49.827554 | 49.201777 | 49.299553 | 439192.0 | 135.319144 | 137.497730 | 135.319144 | 136.210388 | 155267.0 | 211.193396 | 213.967046 | 209.212220 | 211.688690 | 692211.0 | 37.866585 | 38.298227 | 37.837156 | 37.915638 | 968546.0 | 42.151532 | 42.680877 | 41.779029 | 42.396599 | 366515.0 | 93.072874 | 94.422332 | 92.933960 | 94.104805 | 190535.0 | 235.764677 | 236.245633 | 232.975123 | 234.802765 | 225114.0 | 40.022620 | 40.158953 | 39.691533 | 39.964195 | 2223149.0 | ... | 62.148946 | 62.673212 | 62.120351 | 62.168015 | 2008445.0 | 152.329905 | 152.408030 | 149.204376 | 149.204376 | 686060.0 | 115.352942 | 116.327207 | 115.274999 | 116.229782 | 1067229.0 | 123.000000 | 124.150002 | 122.400002 | 123.050003 | 321404.0 | 57.028850 | 57.514450 | 56.601520 | 57.456177 | 602391.0 | 33.122383 | 34.116055 | 33.122383 | 34.116055 | 1845090.0 | 89.474755 | 90.671940 | 89.298121 | 90.299042 | 353813.0 | 193.053823 | 195.062798 | 192.814658 | 193.819153 | 738979.0
49.573336 | 50.003557 | 49.475556 | 49.592892 | 662659.0 | 137.151130 | 138.042358 | 136.507466 | 136.903564 | 151454.0 | 213.967040 | 214.066105 | 210.400916 | 211.193390 | 508977.0 | 38.131456 | 38.200126 | 37.582098 | 37.601719 | 864433.0 | 42.553442 | 43.082788 | 42.426007 | 42.710285 | 457634.0 | 94.700160 | 95.176435 | 94.263568 | 94.660469 | 193822.0 | 235.668482 | 237.592307 | 233.937037 | 234.514191 | 258539.0 | 40.266066 | 40.314758 | 39.516253 | 39.662319 | 1880277.0 | ... | 62.663677 | 62.844789 | 61.824860 | 62.025032 | 2112445.0 | 150.024838 | 151.079687 | 147.797908 | 148.364395 | 738894.0 | 116.327207 | 116.658453 | 114.144856 | 114.378677 | 902680.0 | 123.800003 | 123.949997 | 122.050003 | 122.750000 | 388663.0 | 57.611568 | 58.388527 | 57.261936 | 57.825233 | 668538.0 | 34.193985 | 34.505725 | 33.960182 | 34.047859 | 1677246.0 | 90.671937 | 91.084080 | 90.338291 | 90.730812 | 245532.0 | 194.536643 | 195.445466 | 192.814654 | 193.197311 | 665400.0
49.475555 | 49.768892 | 49.358223 | 49.358223 | 339126.0 | 137.497732 | 138.983124 | 136.953086 | 138.983124 | 166548.0 | 210.995285 | 211.193400 | 209.410338 | 210.995285 | 340269.0 | 37.523235 | 37.572284 | 36.856155 | 37.121025 | 1072853.0 | 42.690678 | 42.994563 | 42.641666 | 42.827915 | 266469.0 | 94.541393 | 94.660466 | 93.707909 | 93.707909 | 121069.0 | 234.321809 | 236.245634 | 233.552277 | 233.552277 | 165429.0 | 39.633109 | 39.788914 | 39.428614 | 39.438351 | 1227197.0 | ... | 62.339593 | 62.339593 | 61.548432 | 61.691414 | 1144797.0 | 148.090917 | 148.872299 | 146.489090 | 148.872299 | 578887.0 | 114.144854 | 115.274997 | 114.086393 | 114.495583 | 796354.0 | 122.400002 | 126.500000 | 121.250000 | 121.250000 | 464126.0 | 57.786384 | 58.116592 | 57.669841 | 58.038895 | 505759.0 | 33.969921 | 34.155019 | 33.599733 | 33.677666 | 1036749.0 | 90.887829 | 91.064463 | 90.338297 | 90.573814 | 159727.0 | 192.527646 | 194.967129 | 192.001480 | 192.001480 | 526249.0

<p>2792 rows × 150 columns</p><!--EndFragment-->
jankrepl commented 2 years ago

So the problem is that raw_to_Xy does not support zero (and negative) entries inside of raw_df. Unfortunately, your DataFrame does have a couple of zero entries for the volumes (which I guess is not impossible).

https://github.com/jankrepl/deepdow/blob/eb6c85845c45f89e0743b8e8c29ddb69cb78da4f/deepdow/utils.py#L263

The reason why 0's are not supported is because one then computes per day percentage changes which would lead to division by zero.

I guess the simplest way around this is to replace 0's with NAN's and that way the raw_to_Xy will simply forward fill those values.

from deepdow.utils import raw_to_Xy
import numpy as np
import yfinance as yf

tickers = [
    "1COV.DE",
    "MRK.DE",
    "LIN.DE",
    "FRE.DE",
    "DWNI.DE",
    "BEI.DE",
    "MUV2.DE",
    "DPW.DE",
    "BAS.DE",
    "VOW3.DE",
    "SIE.DE",
    "CON.DE",
    "VNA.DE",
    "RWE.DE",
    "HEN3.DE",
    "ALV.DE",
]

raw_df = yf.download(tickers = tickers,
          start="2010-01-01",
          end="2021-01-01",
          interval = "1d",
          group_by = 'ticker',
      )

print((raw_df == 0).sum().sum())
raw_df_adj = raw_df.replace(0, np.nan)
print((raw_df_adj == 0).sum().sum())

lookback, gap, horizon = 5, 2, 4

X, timestamps, y, asset_names, indicators = raw_to_Xy(raw_df_adj,
                                                      lookback=lookback,
                                                      gap=gap,
                                                      freq="B",
                                                      horizon=horizon)

Standard output:

[*********************100%***********************]  16 of 16 completed
223
0

Anyway, maybe deepdow should raise an exception to let the user know that none of the assets made it through the check. The exception you saw in your traceback is coming from pandas because things do not work out evenutally (asset_names=[] in the code).

AlexKnowsIt commented 2 years ago

Ah, that make sense! The code is running now, thanks! Another short question because it seems my problems are directly related to #30: My n_sample size is not n_timesteps - lookback - horizon - gap + 1, it seems my vector from the raw_to_df function is significantly larger (2776 vs 2851). Is that due to the forward and backward fill of the gaps in my dataset?

jankrepl commented 2 years ago

Ah, that make sense! The code is running now, thanks! Another short question because it seems my problems are directly related to #30: My n_sample size is not n_timesteps - lookback - horizon - gap + 1, it seems my vector from the raw_to_df function is significantly larger (2776 vs 2851). Is that due to the forward and backward fill of the gaps in my dataset?

Yes, exactly! The assert in the docs actually only holds for the data in the docs, not in general:) It is actually also related to this https://github.com/jankrepl/deepdow/issues/71#issuecomment-713786289.

AlexKnowsIt commented 2 years ago

Perfect! Thanks a lot for your help!