kernc / backtesting.py

:mag_right: :chart_with_upwards_trend: :snake: :moneybag: Backtest trading strategies in Python.
https://kernc.github.io/backtesting.py/
GNU Affero General Public License v3.0

Plot resample=True: DataError: No numeric types to aggregate #309

Open Haakam21 opened 3 years ago

Haakam21 commented 3 years ago

Expected Behavior

No errors, plot shows.

Actual Behavior

/usr/local/lib/python3.7/dist-packages/backtesting/_plotting.py:104: UserWarning: Data contains too many candlesticks to plot; downsampling to '15T'. See `Backtest.plot(resample=...)`
  warnings.warn(f"Data contains too many candlesticks to plot; downsampling to {freq!r}. "
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-25-5a9b4a5c2a1c> in <module>()
----> 1 bt.plot(resample=True)

11 frames
/usr/local/lib/python3.7/dist-packages/backtesting/backtesting.py in plot(self, results, filename, plot_width, plot_equity, plot_return, plot_pl, plot_volume, plot_drawdown, smooth_equity, relative_equity, superimpose, resample, reverse_indicators, show_legend, open_browser)
   1719             reverse_indicators=reverse_indicators,
   1720             show_legend=show_legend,
-> 1721             open_browser=open_browser)

/usr/local/lib/python3.7/dist-packages/backtesting/_plotting.py in plot(results, df, indicators, filename, plot_width, plot_equity, plot_return, plot_pl, plot_volume, plot_drawdown, smooth_equity, relative_equity, superimpose, resample, reverse_indicators, show_legend, open_browser)
    184     if is_datetime_index:
    185         df, indicators, equity_data, trades = _maybe_resample_data(
--> 186             resample, df, indicators, equity_data, trades)
    187 
    188     df.index.name = None  # Provides source name @index

/usr/local/lib/python3.7/dist-packages/backtesting/_plotting.py in _maybe_resample_data(resample_rule, df, indicators, equity_data, trades)
    113                                     # HACK: override `data` for its index
    114                                     data=pd.Series(np.nan, index=df.index)))
--> 115                   for i in indicators]
    116     assert not indicators or indicators[0].df.index.equals(df.index)
    117 

/usr/local/lib/python3.7/dist-packages/backtesting/_plotting.py in <listcomp>(.0)
    113                                     # HACK: override `data` for its index
    114                                     data=pd.Series(np.nan, index=df.index)))
--> 115                   for i in indicators]
    116     assert not indicators or indicators[0].df.index.equals(df.index)
    117 

/usr/local/lib/python3.7/dist-packages/pandas/core/resample.py in g(self, _method, *args, **kwargs)
    935     def g(self, _method=method, *args, **kwargs):
    936         nv.validate_resampler_func(_method, args, kwargs)
--> 937         return self._downsample(_method)
    938 
    939     g.__doc__ = getattr(GroupBy, method).__doc__

/usr/local/lib/python3.7/dist-packages/pandas/core/resample.py in _downsample(self, how, **kwargs)
   1041         # we are downsampling
   1042         # we want to call the actual grouper method here
-> 1043         result = obj.groupby(self.grouper, axis=self.axis).aggregate(how, **kwargs)
   1044 
   1045         result = self._apply_loffset(result)

/usr/local/lib/python3.7/dist-packages/pandas/core/groupby/generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    949         func = maybe_mangle_lambdas(func)
    950 
--> 951         result, how = self._aggregate(func, *args, **kwargs)
    952         if how is None:
    953             return result

/usr/local/lib/python3.7/dist-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    305 
    306         if isinstance(arg, str):
--> 307             return self._try_aggregate_string_function(arg, *args, **kwargs), None
    308 
    309         if isinstance(arg, dict):

/usr/local/lib/python3.7/dist-packages/pandas/core/base.py in _try_aggregate_string_function(self, arg, *args, **kwargs)
    261         if f is not None:
    262             if callable(f):
--> 263                 return f(*args, **kwargs)
    264 
    265             # people may try to aggregate on a non-callable attribute

/usr/local/lib/python3.7/dist-packages/pandas/core/groupby/groupby.py in mean(self, numeric_only)
   1396             "mean",
   1397             alt=lambda x, axis: Series(x).mean(numeric_only=numeric_only),
-> 1398             numeric_only=numeric_only,
   1399         )
   1400 

/usr/local/lib/python3.7/dist-packages/pandas/core/groupby/generic.py in _cython_agg_general(self, how, alt, numeric_only, min_count)
   1020     ) -> DataFrame:
   1021         agg_blocks, agg_items = self._cython_agg_blocks(
-> 1022             how, alt=alt, numeric_only=numeric_only, min_count=min_count
   1023         )
   1024         return self._wrap_agged_blocks(agg_blocks, items=agg_items)

/usr/local/lib/python3.7/dist-packages/pandas/core/groupby/generic.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
   1128 
   1129         if not (agg_blocks or split_frames):
-> 1130             raise DataError("No numeric types to aggregate")
   1131 
   1132         if split_items:

DataError: No numeric types to aggregate

Steps to Reproduce

Dataframe:

                                Low      High      Open     Close    Volume
time                                                                       
2021-01-01 00:00:00-05:00  29265.74  29299.56  29292.74  29287.87  0.949564
2021-01-01 00:01:00-05:00  29266.47  29288.03  29284.16  29266.47  0.567982
2021-01-01 00:02:00-05:00  29233.22  29265.01  29265.01  29234.09  0.886784
2021-01-01 00:03:00-05:00  29210.85  29250.01  29234.08  29210.86  0.601799
2021-01-01 00:04:00-05:00  29199.03  29239.87  29199.03  29239.24  0.275364
...                             ...       ...       ...       ...       ...
2021-03-31 23:55:00-04:00  59023.33  59054.50  59033.91  59042.72  0.934562
2021-03-31 23:56:00-04:00  59042.71  59049.49  59042.72  59048.45  0.027263
2021-03-31 23:57:00-04:00  59023.34  59065.00  59042.72  59058.80  0.318494
2021-03-31 23:58:00-04:00  59065.00  59141.90  59065.00  59088.48  0.362681
2021-03-31 23:59:00-04:00  59088.48  59135.54  59088.48  59135.54  0.104990

[129330 rows x 5 columns]

Code:

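from backtesting import Backtest

# `data` is the 1-minute OHLCV DataFrame shown above; MyStrategy is not shown here.
bt = Backtest(data, MyStrategy, cash=1000000, commission=0.001)
stats = bt.run()
bt.plot(resample=True)  # raises the DataError above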

Additional info

kernc commented 3 years ago

This breaks in .mean(): https://github.com/kernc/backtesting.py/blob/e3cccdfc0fdf8f6f1fe0b5c7b84fd5a2b15fab2d/backtesting/_plotting.py#L113-L118

I'm assuming your strategy creates an indicator that is not numeric, such as:

        self.I(np.random.choice, ['a', 'b'], len(self.data))

Haakam21 commented 3 years ago

In some sense you are correct. I am using Tulip and for lagging indicators, it returns shorter arrays. To circumvent this, I pad the array with None values. This causes the plotting error when resampling. So the fix is to use 0 or some numerical value for padding.
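A minimal sketch of why None padding trips the resampler, assuming only NumPy and pandas. The array values and the padding length are made up for illustration; they are not taken from the reporter's strategy. Mixing None with floats makes NumPy fall back to the generic object dtype, which pandas does not treat as numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a lagging indicator's output (shorter than the data).
values = [1.5, 2.5, 3.5]
n_missing = 2  # number of leading bars the indicator did not cover

# Padding with None: NumPy cannot pick a float dtype, so it uses object.
padded_none = np.array([None] * n_missing + values)

print(padded_none.dtype)                                       # object
print(pd.api.types.is_numeric_dtype(pd.Series(padded_none)))   # False
```

An object-dtype column like this is the kind of input that leaves the aggregation with nothing numeric to average.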

kernc commented 3 years ago

> So the fix is to use 0 or some numerical value for padding.

The standard for numeric arrays is NaN: np.nan.
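As a sketch of NaN padding, assuming the indicator library returns an array shorter than the input: the helper name `pad_left_nan` is hypothetical and is not part of backtesting.py or Tulip.

```python
import numpy as np


def pad_left_nan(values, length):
    """Left-pad a 1-D numeric sequence with NaN up to `length` elements.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    if len(values) > length:
        raise ValueError("values is longer than the requested length")
    padded = np.full(length, np.nan)          # float64 array of NaN
    padded[length - len(values):] = values    # real values go at the end
    return padded


padded = pad_left_nan([1.5, 2.5, 3.5], 5)
print(padded.dtype)  # float64
```

Inside a strategy, the padded array could then be returned from the function passed to self.I(...), with len(self.data) as the target length. Unlike 0, NaN does not draw misleading values on the plot for the bars the indicator has not covered yet.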


I think we might want to catch/mitigate this.
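One possible shape for such a mitigation, as a sketch only: this is not how backtesting.py handles it, and the function name `as_numeric_indicator` is hypothetical. It assumes a 1-D indicator and uses `pd.to_numeric` to turn None into NaN while failing early, with a clearer message, on values that are not numbers at all:

```python
import numpy as np
import pandas as pd


def as_numeric_indicator(values, name="indicator"):
    """Coerce a 1-D indicator to float64, mapping None to NaN.

    Hypothetical helper; raises ValueError for non-numeric content.
    """
    series = pd.Series(np.asarray(values, dtype=object))
    try:
        numeric = pd.to_numeric(series, errors="raise")
    except (ValueError, TypeError) as exc:
        raise ValueError(
            f"{name} must contain only numbers or NaN/None; got non-numeric values"
        ) from exc
    return numeric.to_numpy(dtype=float)


cleaned = as_numeric_indicator([None, None, 1.5, 2.5])
print(cleaned.dtype)  # float64
```

Note that `pd.to_numeric` also parses numeric strings such as "1", so a stricter check may be preferable if strings should always be rejected.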

Haakam21 commented 3 years ago

Oh, OK. Yes, NaN is better. Sorry for prematurely closing the issue.