kernc / backtesting.py

:mag_right: :chart_with_upwards_trend: :snake: :moneybag: Backtest trading strategies in Python.
https://kernc.github.io/backtesting.py/
GNU Affero General Public License v3.0

plot html from 15m, warning Length of values (2) does not match length of index (1) #649

Open hundan2020 opened 2 years ago

hundan2020 commented 2 years ago

Expected Behavior

Expected the plot HTML to be generated.

(The resample param is True by default. The same code works fine with 1-day data, so it seems to be caused by having too much data?)

Actual Behavior

D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py:122: UserWarning: Data contains too many candlesticks to plot; downsampling to '8H'. See `Backtest.plot(resample=...)`
  warnings.warn(f"Data contains too many candlesticks to plot; downsampling to {freq!r}. "
Traceback (most recent call last):
  File "D:\Users\MECHREVO\AppData\Local\Programs\Python\Python37\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Users/MECHREVO/PycharmProjects/backtesting.py/main.py", line 30, in <module>
    bt.plot()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\backtesting.py", line 1609, in plot
    open_browser=open_browser)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py", line 204, in plot
    resample, df, indicators, equity_data, trades)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py", line 158, in _maybe_resample_data
    ExitBar=_group_trades('ExitTime'),
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\resample.py", line 335, in aggregate
    result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\apply.py", line 161, in agg
    return self.agg_dict_like()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\apply.py", line 436, in agg_dict_like
    key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\apply.py", line 436, in <dictcomp>
    key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\generic.py", line 265, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1332, in _python_agg_general
    result = self.grouper.agg_series(obj, f)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\ops.py", line 1060, in agg_series
    result = self._aggregate_series_fast(obj, func)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\ops.py", line 1283, in _aggregate_series_fast
    result, _ = sbg.get_result()
  File "pandas\_libs\reduction.pyx", line 184, in pandas._libs.reduction.SeriesBinGrouper.get_result
  File "pandas\_libs\reduction.pyx", line 88, in pandas._libs.reduction._BaseGrouper._apply_to_group
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1318, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\backtesting\_plotting.py", line 147, in f
    mean_time = int(bars.loc[s.index].view(int).mean())
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\series.py", line 801, in view
    self._values.view(dtype), index=self.index
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\series.py", line 428, in __init__
    com.require_length_match(data, index)
  File "D:\Users\MECHREVO\PycharmProjects\backtesting.py\venv\lib\site-packages\pandas\core\common.py", line 532, in require_length_match
    "Length of values "
ValueError: Length of values (2) does not match length of index (1)

Steps to Reproduce

ETHUSDT15m = _read_file('ETHUSDT-15m.csv')

main.py

from backtesting import Backtest, Strategy
from backtesting.lib import crossover

from backtesting.test import SMA,ETHUSDT15m

class SmaCross(Strategy):
    def init(self):
        price = self.data.Close
        self.ma1 = self.I(SMA, price, 10)
        self.ma2 = self.I(SMA, price, 20)

    def next(self):
        reverse = not self.data.High.max(initial=0) > 65000
        if crossover(self.ma1, self.ma2):
            if reverse:
                self.buy()
            else:
                self.sell()
        elif crossover(self.ma2, self.ma1):
            if reverse:
                self.sell()
            else:
                self.buy()

bt = Backtest(ETHUSDT15m, SmaCross, cash=5000, commission=0.02, margin=1 / 125, exclusive_orders=True)
stats = bt.run()
bt.plot()
print(stats)

(attached: ETHUSDT-15m.csv, ETHUSDT-1d.csv)

Additional info

(screenshot attached)

Cloblak commented 2 years ago

You can install backtesting 0.3.2, and it will plot after some warnings. It has something to do with the way backtesting 0.3.3 resamples when there are too many data points.

calad0i commented 2 years ago

Experiencing the same issue with 0.3.3. It seems this bug is located somewhere in the resampling functions, as it is only triggered when the resample=True flag takes effect (i.e., # entries > 10,000 by default). When forcing resampling with a string, it is always triggered regardless of the number of entries.

zha0yangchen commented 2 years ago

Experiencing the same issue with 0.3.2 and 0.3.3.

casper-hansen commented 2 years ago

I thought I was the only one having this problem. Just set resample=False and it is fixed, but then you cannot use resampling for your plots.
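For reference, a minimal sketch of that workaround (resample is the documented Backtest.plot() parameter the warning message points to):

    stats = bt.run()
    bt.plot(resample=False)  # skips the automatic downsampling path that raises the ValueError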

preritdas commented 2 years ago

Any update on this issue? Resampling doesn't seem to be working on the current build, and I wasn't able to diagnose the issue. I'm not even sure what the root cause is... 🤷‍♂️

tani3010 commented 2 years ago

From my debugging, _group_trades inside _maybe_resample_data doesn't work correctly because an error occurs in the aggregation below.

https://github.com/kernc/backtesting.py/blob/65f54f6819cac5f36fd94ebf0377644c62b4ee3d/backtesting/_plotting.py#L143-L159

By the way, why do we need another aggregation for EntryBar/ExitBar? My impression is that TRADES_AGG already covers them, so we could simply use it. Can we remove these two lines, or am I missing something? My version was 0.3.3.

TRADES_AGG = OrderedDict((
    ('Size', 'sum'),
    ('EntryBar', 'first'),
    ('ExitBar', 'last'),
    ('EntryPrice', 'mean'),
    ('ExitPrice', 'mean'),
    ('PnL', 'sum'),
    ('ReturnPct', 'mean'),
    ('EntryTime', 'first'),
    ('ExitTime', 'last'),
    ('Duration', 'sum'),
))
liushihao456 commented 1 year ago

Hi! I'm experiencing the same problem. Any update on this issue? :)

reneros commented 1 year ago

I have removed the extra aggregation for EntryBar and ExitBar. That appears to solve the problem, but you lose the plotted entry/exit points:

 if len(trades):  # Avoid pandas "resampling on Int64 index" error 
     trades = trades.assign(count=1).resample(freq, on='ExitTime', label='right').agg(dict( 
         TRADES_AGG, 
         ReturnPct=_weighted_returns, 
         count='sum', 
         #EntryBar=_group_trades('EntryTime'), 
         #ExitBar=_group_trades('ExitTime'), 
     )).dropna() 
AlejandroRigau commented 1 year ago

Bump. Having the same issue.

UserWarning: Data contains too many candlesticks to plot; downsampling to '8H'. See `Backtest.plot(resample=...)`
ValueError: Length of values (2) does not match length of index (1)
AlejandroRigau commented 1 year ago

Interestingly enough, I tried running this in WSL and it worked fine with Bokeh 3.1.1 and backtesting.py 0.3.3. I'm using more than 50K rows.

PilotGFX commented 1 year ago

Wrestling a whole lot with this one! I tried downgrading Bokeh, checking the length of the DataFrame in every possible way, and setting resample to '2H', but the advice from casper to set it to False is what fixed it. It is still a shame, though, that I have to plot hundreds of thousands of 5-minute candles that I can't even see on the screen, with the speed penalty that entails. Will there be a fix, or is it something we could fix ourselves?

yash2mehta commented 1 year ago

Same issue, any fix?

PilotGFX commented 10 months ago

Same issue, any fix?

My solution was to plot it manually with Plotly graph objects; resampling to 1H performs decently. While I was at it, I also made a neatly color-formatted table of the stats with pandas to_html.

tani3010 commented 10 months ago

In my resampling from an hourly time series to weekly, once I changed view(int) to view('int64') as below, it worked.

 def _group_trades(column): 
     def f(s, new_index=pd.Index(df.index.view('int64')), bars=trades[column]): 
         if s.size: 
             # Via int64 because on pandas recently broken datetime 
             mean_time = int(bars.loc[s.index].view('int64').mean()) 
             new_bar_idx = new_index.get_loc(mean_time, method='nearest') 
             return new_bar_idx 
     return f 

The following is the original one. https://github.com/kernc/backtesting.py/blob/65f54f6819cac5f36fd94ebf0377644c62b4ee3d/backtesting/_plotting.py#L143-L150

From my observation, view(int) actually returned int32 instead of int64, and L147 also crashed for some reason with pandas 2.0.1 and backtesting 0.3.3. I think this issue happens when we use data more frequent than daily, as the original post of this topic said.

This is what I saw in the dtypes:

> df.index.view(int).dtype
dtype('int32')
    alignment: 4
    base: dtype('int32')
    byteorder: '='
    char: 'l'
    descr: [('', '<i4')]
    fields: None
    flags: 0
    hasobject: False
    isalignedstruct: False
    isbuiltin: 1
    isnative: True
    itemsize: 4
    kind: 'i'
    metadata: None
    name: 'int32'
    names: None
    ndim: 0
    num: 7
    shape: ()
    str: '<i4'
    subdtype: None

> df.index.view('int64').dtype
dtype('int64')
    alignment: 8
    base: dtype('int64')
    byteorder: '='
    char: 'q'
    descr: [('', '<i8')]
    fields: None
    flags: 0
    hasobject: False
    isalignedstruct: False
    isbuiltin: 1
    isnative: True
    itemsize: 8
    kind: 'i'
    metadata: None
    name: 'int64'
    names: None
    ndim: 0
    num: 9
    shape: ()
    str: '<i8'
    subdtype: None
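For reference, a minimal sketch of why the 32-bit view produces exactly the reported length mismatch. This assumes a platform where np.dtype(int) is 32-bit (e.g. Windows) and a pandas version that still has Series.view(), as discussed here; on 64-bit-int platforms the first view simply succeeds:

    import numpy as np
    import pandas as pd

    # One datetime64[ns] value, like the trade times passed to _group_trades
    ts = pd.Series(pd.to_datetime(['2022-01-01 00:15']))

    print(np.dtype(int))  # int32 on Windows builds, int64 on most Linux/macOS builds

    try:
        # Viewing 8-byte timestamps as 4-byte ints doubles the element count,
        # so the reinterpreted values no longer match the original index length
        print(ts.view(int))
    except ValueError as exc:
        print(exc)  # Length of values (2) does not match length of index (1)

    print(ts.view('int64'))  # an explicit 64-bit view keeps the length unchanged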
Timbot-42 commented 8 months ago

Same issue, any fix?

My solution was to plot it manually with Plotly graph objects; resampling to 1H performs decently. While I was at it, I also made a neatly color-formatted table of the stats with pandas to_html.

Any chance you can share your solution? I'm also hitting this issue with 200k historical data points.

AssetOverflow commented 7 months ago

This obviously hasn't been fixed... but @tani3010, is the _group_trades() change you posted recently a reliably working solution?

MartinNiederl commented 5 months ago

This obviously hasn't been fixed... but @tani3010, is the _group_trades() change you posted recently a reliably working solution?

I had to change an additional line, because get_loc no longer has a method parameter:

    def _group_trades(column):
        def f(s, new_index=pd.Index(df.index.view('int64')), bars=trades[column]):
            if s.size:
                # Via int64 because on pandas recently broken datetime
                mean_time = int(bars.loc[s.index].view('int64').mean())
                new_bar_idx = new_index.get_indexer([mean_time], method='nearest')[0]
                return new_bar_idx
        return f

This solution currently works as expected for me.
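For anyone unsure what actually changed between the two snippets, a minimal sketch of the pandas API difference (assuming pandas >= 2.0, where Index.get_loc() lost its method argument):

    import pandas as pd

    idx = pd.Index([0, 10, 20])

    # pandas < 2.0 allowed: idx.get_loc(12, method='nearest')
    # pandas >= 2.0 removed that argument; get_indexer() provides the same nearest lookup:
    pos = idx.get_indexer([12], method='nearest')[0]
    print(pos)  # -> 1, the position of 10, the nearest value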

PilotGFX commented 5 months ago

Any chance you can share your solution? I'm also hitting this issue with 200k historical data points.

Hi there! Sure! I've stopped using Backtesting because it is too slow, but I've dug around to find some hopefully usable code for you. You will need to pull the _trades table out of the backtest results (the Series object with the stats):

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# df is your OHLCV DataFrame (DatetimeIndex); optistats_trades is the _trades table from the stats Series
ohlcdata = df.resample('1H').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})
charts = make_subplots(rows=1, cols=1)

# Resampled candles
charts.add_trace(go.Candlestick(showlegend=False, name='OHLC', x=ohlcdata.index, open=ohlcdata['Open'], high=ohlcdata['High'], low=ohlcdata['Low'], close=ohlcdata['Close']), row=1, col=1)

# Entry markers with per-trade hover text ('light' was an undefined color variable
# in the original snippet; any CSS color name works here)
hover_entry = [f" <br>{entrytime}<br>Qty: {size}<br>Price: {round(price, 4)}" for entrytime, size, price in zip(optistats_trades['EntryTime'], optistats_trades['Size'], optistats_trades['EntryPrice'])]
charts.add_trace(go.Scatter(hovertemplate=hover_entry, showlegend=False, x=optistats_trades['EntryTime'], y=optistats_trades['EntryPrice'], name=' ', mode='markers', marker=dict(size=10, symbol='arrow', color='limegreen')), row=1, col=1)

# Exit markers with PnL/return hover text
hover_exit = [f" <br>{time}<br>PnL: {round(pnl, 2)}<br>Return %: {round(return_pct * 100, 2)}<br>Price: {round(price, 2)}" for pnl, return_pct, time, price in zip(optistats_trades['PnL'], optistats_trades['ReturnPct'], optistats_trades['ExitTime'], optistats_trades['ExitPrice'])]
charts.add_trace(go.Scatter(hovertemplate=hover_exit, showlegend=False, x=optistats_trades['ExitTime'], y=optistats_trades['ExitPrice'], name=' ', mode='markers', marker=dict(size=15, symbol='triangle-down')), row=1, col=1)

# Limit and style the shared x-axis, then export to standalone HTML
start_date = '2017-06-01'
end_date = '2023-06-01'
charts.update_xaxes(matches='x1', griddash='dot', range=[start_date, end_date], showdividers=True, showline=False)
charts_equity_html = charts.to_html(div_id='charts')

filename = 'backtest_chart.html'  # output path (was undefined in the original snippet)
with open(filename, 'w', encoding='utf-8') as the_file:
    the_file.write(charts_equity_html)

Something like this. I retrieved it from a messy file and tried to clean it up a bit, but it should be a good head start for you.

To create an HTML table you can use: results_html = pandas.DataFrame(metrics).to_html()
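And a sketch of how the pieces fit together, assuming stats is the Series returned by Backtest.run() (backtesting.py keeps the per-trade table under the '_trades' key):

    stats = bt.run()

    optistats_trades = stats['_trades']  # per-trade DataFrame used in the chart code above
    metrics = stats.drop(['_trades', '_equity_curve', '_strategy'], errors='ignore')

    results_html = metrics.to_frame('Value').to_html()  # stats table as HTML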

ZhuYizhou2333 commented 3 months ago

(quoting @tani3010's view('int64') fix and dtype comparison above)

It almost fixed the problem. Because the code is too old to run smoothly (especially with Bokeh and pandas), we still need to replace some other functions. It's easy to fix with Copilot.

Dooooosh commented 3 months ago

1m, 5m: the data frame was limited to 10,000 lines.
15m: same as 1m and 5m, but I must use the higher-timeframe file to run it.
1h, 4h, 1D: no problem...

The extra file I create just fetches the historical data with a CSV line limiter.

Fetch and print historical data:

    max_lines = 10000
    print(get_historical_data(symbol, timeframe, max_lines))
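For context, a minimal sketch of what such a limiter might look like (get_historical_data and the CSV naming are this commenter's own setup, so this is purely illustrative):

    import pandas as pd

    def get_historical_data(symbol, timeframe, max_lines):
        # Illustrative only: read the OHLCV CSV and keep just the most recent rows
        df = pd.read_csv(f'{symbol}-{timeframe}.csv', index_col=0, parse_dates=True)
        return df.tail(max_lines)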

PilotGFX commented 3 months ago

I again suggest resampling the data before plotting it. It is simple to just use .resample() on the DataFrame, instead of constraining yourself to such a small test period. Sure, 10,000 candles is a lot of years on daily data, but at some point you might want to test on lower timeframes to get a more realistic result. For me, 10_000 candles is 6.944 days.

data.resample("4h").agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})

The limit of 10,000 is sensible, as the JS containing the Bokeh plot becomes very large, as evidenced by the size of the HTML file.
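For completeness, a sketch of that pre-resampling approach end to end, using the SmaCross strategy from the original report (parameter values are illustrative):

    # 'data' is the raw 15m OHLCV DataFrame with a DatetimeIndex
    data_4h = data.resample('4h').agg({'Open': 'first', 'High': 'max', 'Low': 'min',
                                       'Close': 'last', 'Volume': 'sum'}).dropna()
    bt = Backtest(data_4h, SmaCross, cash=5000, commission=0.02, exclusive_orders=True)
    stats = bt.run()
    bt.plot()  # far fewer candles, so automatic downsampling is unlikely to be triggered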