Open hundan2020 opened 2 years ago
You can install backtesting==0.3.2, and it will plot after some warnings. It has something to do with the way backtesting 0.3.3 resamples when there are too many data points.
Experiencing the same issue with 0.3.3. The bug seems to be located somewhere in the resampling functions, as it is only triggered when the resample=True flag takes effect (i.e., # entries > 10,000 by default). When forcing resampling with a string, it is always triggered, regardless of the number of entries.
Experiencing the same issue with 0.3.2 and 0.3.3.
I thought I was the only one having this problem. Setting resample=False fixes it, but then you cannot use resampling for your plots.
Any update on this issue? Resampling doesn't seem to be working on the current build, and I wasn't able to diagnose the issue. I'm not even sure what the root cause is... 🤷♂️
From my debugging, _group_trades
inside _maybe_resample_data
doesn't work correctly because an error is raised during the aggregation below.
By the way, why do we need another aggregation for EntryBar/ExitBar? As far as I can tell, TRADES_AGG
already covers them and we can simply use it. So can we remove those two lines, or am I missing something?
My version was 0.3.3.
TRADES_AGG = OrderedDict((
('Size', 'sum'),
('EntryBar', 'first'),
('ExitBar', 'last'),
('EntryPrice', 'mean'),
('ExitPrice', 'mean'),
('PnL', 'sum'),
('ReturnPct', 'mean'),
('EntryTime', 'first'),
('ExitTime', 'last'),
('Duration', 'sum'),
))
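For context, the trade resampling this aggregation spec drives can be sketched with a minimal, self-contained example (the toy trades frame and the daily frequency here are illustrative; the column names follow TRADES_AGG):

```python
from collections import OrderedDict

import pandas as pd

# Subset of TRADES_AGG relevant to this sketch (ExitTime itself becomes
# the resampling key, so it is not aggregated here)
TRADES_AGG = OrderedDict((
    ('Size', 'sum'),
    ('PnL', 'sum'),
    ('EntryTime', 'first'),
))

trades = pd.DataFrame({
    'Size': [1, 2, 3],
    'PnL': [10.0, -5.0, 2.5],
    'EntryTime': pd.to_datetime(['2023-01-01 01:00', '2023-01-01 02:00',
                                 '2023-01-02 01:00']),
    'ExitTime': pd.to_datetime(['2023-01-01 03:00', '2023-01-01 04:00',
                                '2023-01-02 05:00']),
})

# Bucket trades by exit time, as the plotting code does with `freq`
agg = trades.resample('D', on='ExitTime', label='right').agg(dict(TRADES_AGG)).dropna()
print(agg['Size'].tolist())  # per-day summed trade sizes
```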
Hi! I'm experiencing the same problem. Any update on this issue? :)
I have removed the extra aggregation for EntryBar and ExitBar. That appears to solve the problem, but you lose the plot of the entry/exit points:
if len(trades):  # Avoid pandas "resampling on Int64 index" error
    trades = trades.assign(count=1).resample(freq, on='ExitTime', label='right').agg(dict(
        TRADES_AGG,
        ReturnPct=_weighted_returns,
        count='sum',
        # EntryBar=_group_trades('EntryTime'),
        # ExitBar=_group_trades('ExitTime'),
    )).dropna()
Bump. Having the same issue.
UserWarning:
Data contains too many candlesticks to plot; downsampling to '8H'. See `Backtest.plot(resample=...)`
ValueError: Length of values (2) does not match length of index (1)
Interestingly enough, I tried running this in WSL and it worked fine with Bokeh 3.1.1 and backtesting.py 0.3.3. I'm using more than 50K rows.
I wrestled a whole lot with this one: downgrading Bokeh, checking the length of the DataFrame in all possible ways, setting resample to '2H'. In the end, the advice from casper to set it to False fixed it. But it is still sad that I have to plot hundreds of thousands of 5-minute candles that I cannot even see on the screen, given the performance cost. Will there be a fix, or is it something we could fix ourselves?
Same issue, any fix?
My solution was to plot it manually with Plotly graph objects; resampling to 1H gives decent enough performance. While doing that, I also made a neatly color-formatted table of the stats with pandas to_html.
In my resampling from an hourly timeseries to weekly, once I changed view(int) to view('int64') as below, it worked.
def _group_trades(column):
    def f(s, new_index=pd.Index(df.index.view('int64')), bars=trades[column]):
        if s.size:
            # Via int64 because on pandas recently broken datetime
            mean_time = int(bars.loc[s.index].view('int64').mean())
            new_bar_idx = new_index.get_loc(mean_time, method='nearest')
            return new_bar_idx
    return f
The following is the original one. https://github.com/kernc/backtesting.py/blob/65f54f6819cac5f36fd94ebf0377644c62b4ee3d/backtesting/_plotting.py#L143-L150
From my observation, view(int)
actually returned int32 instead of int64, and L147 also crashed for some reason with pandas 2.0.1 and backtesting 0.3.3. I think this issue happens when we use data more frequent than daily, as the original post of this topic said.
This is what I saw in dtype:
> df.index.view(int).dtype
dtype('int32')
alignment: 4
base: dtype('int32')
byteorder: '='
char: 'l'
descr: [('', '<i4')]
fields: None
flags: 0
hasobject: False
isalignedstruct: False
isbuiltin: 1
isnative: True
itemsize: 4
kind: 'i'
metadata: None
name: 'int32'
names: None
ndim: 0
num: 7
shape: ()
str: '<i4'
subdtype: None
> df.index.view('int64').dtype
dtype('int64')
alignment: 8
base: dtype('int64')
byteorder: '='
char: 'q'
descr: [('', '<i8')]
fields: None
flags: 0
hasobject: False
isalignedstruct: False
isbuiltin: 1
isnative: True
itemsize: 8
kind: 'i'
metadata: None
name: 'int64'
names: None
ndim: 0
num: 9
shape: ()
str: '<i8'
subdtype: None
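The dtype dumps above can be reproduced in isolation with plain NumPy, which also suggests where the "Length of values (2) does not match length of index (1)" error may come from: viewing 8-byte datetime64[ns] values through a 4-byte integer dtype doubles the array length, while an explicit 'int64' view preserves it (a minimal sketch, independent of backtesting.py):

```python
import numpy as np

# One nanosecond-precision timestamp: 8 bytes per element
ts = np.array(['2023-01-01T00:00:00'], dtype='datetime64[ns]')

# Viewing as int32 splits each 8-byte item into two 4-byte items,
# so the array length doubles -- a plausible source of the
# "Length of values (2) does not match length of index (1)" error
print(ts.view('int32').shape)  # (2,)
print(ts.view('int64').shape)  # (1,)

# A 2023 nanosecond epoch (~1.7e18) also vastly exceeds the int32 range
print(ts.view('int64')[0] > np.iinfo('int32').max)  # True
```

On platforms where C `long` is 32-bit (notably Windows), `view(int)` resolves to int32, which is consistent with the report that the bug did not reproduce under WSL.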
Any chance you can share your solution? Also hitting this issue with 200k historical data points.
This obviously hasn't been fixed, but @tani3010, is this a confirmed working solution for _group_trades()?
I had to change an additional line because get_loc
no longer accepts a method
parameter:
def _group_trades(column):
    def f(s, new_index=pd.Index(df.index.view('int64')), bars=trades[column]):
        if s.size:
            # Via int64 because on pandas recently broken datetime
            mean_time = int(bars.loc[s.index].view('int64').mean())
            new_bar_idx = new_index.get_indexer([mean_time], method='nearest')[0]
            return new_bar_idx
    return f
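The get_loc-to-get_indexer change above can be checked in isolation; in pandas 2.x, Index.get_loc() no longer accepts method=, and get_indexer() provides the equivalent nearest-match lookup (the tiny index here is illustrative):

```python
import pandas as pd

idx = pd.Index([0, 10, 20, 30])

# Equivalent of the removed idx.get_loc(12, method='nearest'):
# find the position of the index value closest to 12
pos = idx.get_indexer([12], method='nearest')[0]
print(pos)  # nearest value is 10, at position 1
```

Note that get_indexer takes a list-like of targets and returns an array of positions, hence the `[mean_time]` wrapping and the trailing `[0]` in the patched function.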
This solution currently works as expected for me.
Any chance you can share your solution? Also hitting this issue with 200k historical data points.
Hi there! Sure! I've stopped using Backtesting because it is too slow, but I've dug into the chest to find some hopefully usable code for you. You will need to take the _trades out of the backtest results (the Series object with the stats):
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Downsample the OHLCV data to hourly bars for plotting
ohlcdata = df.resample('1H').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})
charts = make_subplots(rows=1, cols=1)
charts.add_trace(go.Candlestick(showlegend=False, name='OHLC', x=ohlcdata.index, open=ohlcdata['Open'], high=ohlcdata['High'], low=ohlcdata['Low'], close=ohlcdata['Close']), row=1, col=1)

light = 'lightgreen'  # marker color; defined elsewhere in the original script
hover_entry = [f" <br>{entrytime}<br>Qty: {size}<br>Price: {round(price, 4)}" for entrytime, size, price in zip(optistats_trades['EntryTime'], optistats_trades['Size'], optistats_trades['EntryPrice'] * 0.99)]
charts.add_trace(go.Scatter(hovertemplate=hover_entry, showlegend=False, x=optistats_trades['EntryTime'], y=optistats_trades['EntryPrice'], name=' ', mode='markers', marker=dict(size=10, symbol='arrow', color=light, showscale=False)), row=1, col=1)
hover_exit = [f" <br>{time}<br>PnL: {round(pnl, 2)}<br>Return %: {round(return_pct * 100, 2)}<br>Price: {round(price, 2)}" for pnl, return_pct, time, price in zip(optistats_trades['PnL'], optistats_trades['ReturnPct'], optistats_trades['ExitTime'], optistats_trades['ExitPrice'])]
charts.add_trace(go.Scatter(hovertemplate=hover_exit, showlegend=False, x=optistats_trades['ExitTime'], y=optistats_trades['ExitPrice'], name=' ', mode='markers', marker=dict(size=15, symbol='triangle-down')), row=1, col=1)

start_date = '2017-06-01'
end_date = '2023-06-01'
charts.update_xaxes(matches='x1', griddash='dot', range=[start_date, end_date], showdividers=True, showline=False)
charts_equity_html = charts.to_html(div_id='charts')
with open(filename, 'w', encoding='utf-8') as the_file:
    the_file.write(charts_equity_html)
Something like this; I retrieved it from a messy file and tried to clean it up a bit, but it should be a good head start for you.
To create an HTML table you can use: results_html = pandas.DataFrame(metrics).to_html()
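As a minimal sketch of that to_html approach (the metrics dict here is a hypothetical stand-in for the backtest stats Series):

```python
import pandas as pd

# Hypothetical stats standing in for the backtest results
metrics = {'Return [%]': 12.3, 'Max. Drawdown [%]': -8.1, '# Trades': 42}

# One metric per row, rendered as an HTML table string
results_html = pd.DataFrame(metrics, index=['Value']).T.to_html()
print('<table' in results_html)  # True
```

The resulting string can be concatenated with the chart HTML before writing the file; pandas Styler offers finer color formatting if needed.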
It almost fixed the problem. Since the code is too old to run smoothly (especially with Bokeh and pandas), we still need to replace some other functions. It's easy to fix with Copilot.
1m, 5m: the data frame was limited to 10,000 lines. 15m: same as 1m and 5m, but I must use the higher-timeframe file to run. 1h, 4h, 1D: no problem.
The extra file I create just fetches the historical data with a CSV line limiter:
max_lines = 10000
print(get_historical_data(symbol, timeframe, max_lines))
I again suggest resampling the data before plotting it. It is simple to just use .resample() on the df, instead of constraining yourself to such a small test period. Sure, it is a lot of years on daily data, but at some point you might want to test on lower resolution to get a more realistic result. For me, 10,000 candles is only 6.944 days.
data.resample("4h").agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})
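The pre-plot resampling suggested above can be sketched end to end (the synthetic 5-minute OHLCV frame is illustrative; the column names are the standard layout backtesting.py expects):

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute OHLCV bars covering one hour
idx = pd.date_range('2023-01-01', periods=12, freq='5min')
df = pd.DataFrame({'Open': np.arange(12.0), 'High': np.arange(12.0) + 1,
                   'Low': np.arange(12.0) - 1, 'Close': np.arange(12.0) + 0.5,
                   'Volume': np.ones(12)}, index=idx)

# Downsample to 1-hour bars before handing the frame to Backtest()
hourly = df.resample('1h').agg({'Open': 'first', 'High': 'max',
                                'Low': 'min', 'Close': 'last', 'Volume': 'sum'})
print(len(hourly))  # twelve 5-minute bars collapse into one 1-hour bar
```

Downsampling this way keeps the candle count below the 10,000-bar plotting threshold without truncating the test period.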
The limit of 10,000 is sensible, as the JS containing the Bokeh plot becomes very large, as evidenced by the size of the HTML file.
Expected Behavior
Expected a plot HTML to be drawn.
(The resample param is True by default; the same code works fine when using 1-day data, so it seems to be because there is too much data?)
Actual Behavior
Steps to Reproduce
main.py
ETHUSDT-15m.csv ETHUSDT-1d.csv
Additional info