kernc / backtesting.py

Backtest trading strategies in Python.
https://kernc.github.io/backtesting.py/
GNU Affero General Public License v3.0

Implementing a new Custom Indicator: AttributeError: 'NoneType' object has no attribute 'size' and #342

Closed: windowshopr closed this issue 3 years ago

windowshopr commented 3 years ago

Expected Behavior

I'm trying to extend the indicators library by copying the format used for the SMA. Currently I'm trying to create the SuperTrend indicator, which can be generated with the Pandas TA library, but I'm running into errors I can't quite figure out. I think it has something to do with backtesting's next() function.

I've attached a complete, runnable script (with comments) so anyone can reproduce my issue. Here are the dependencies needed to run it:

yfinance==0.1.59
pandas==1.2.4
pandas-ta==0.2.45b0
Backtesting==0.3.1

Running this on Python 3.7.9 and Windows 10.

As you'll see, I define a custom Indicators_Class in the middle, with functions that return both the default SMA (as boilerplate) and the SUPERTREND, which I'm trying to get working. The top section just downloads AAPL's dataset from Yahoo Finance and saves it to a .csv file; it can be skimmed, since the issue lies in either the Indicators_Class or the backtesting section near the bottom.

Steps to Reproduce

Use this full code to reproduce the issue (sorry about the length, but it is a complete example that downloads a dataset, etc.):

from backtesting import Backtest, Strategy
import pandas as pd
import pandas_ta as ta
import os
import yfinance as yf

####################################################################
# Download datasets from Yahoo Finance
####################################################################
# Make a historical folder (in root directory) if it's not created already
if not os.path.exists("./historical/intraday"):
    os.makedirs("./historical/intraday")

tickers_to_download = ['AAPL']

print("\nDownloading new datasets for:")
print(tickers_to_download)

# Download
data = yf.download(  # or pdr.get_data_yahoo(...
        # tickers list or string as well
        tickers = tickers_to_download,

        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period = "7d",

        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval = "5m",

        # group by ticker (to access via data['SPY'])
        # (optional, default is 'column')
        group_by = 'ticker',

        # adjust all OHLC automatically
        # (optional, default is False)
        auto_adjust = True,

        # download pre/post regular market hours data
        # (optional, default is False)
        prepost = False,

        # use threads for mass downloading? (True/False/Integer)
        # (optional, default is True)
        threads = True,

        # proxy URL scheme to use when downloading?
        # (optional, default is None)
        proxy = None
    )

print(data)

# Save the datasets to their own .csv files
if len(tickers_to_download) > 1:
    for curr_ticker in tickers_to_download:
        data[curr_ticker].to_csv("./historical/intraday/" + curr_ticker + ".csv", index=True)
elif len(tickers_to_download) == 1:
    data.to_csv("./historical/intraday/" + tickers_to_download[0] + ".csv", index=True)
print("Done downloading datasets.")

####################################################################
# My custom indicator class
####################################################################
class Indicators_Class():

    def __init__(self, *args, **kwargs):
        super(Indicators_Class, self).__init__(*args, **kwargs)

        # Global variables go here
        # self.variable = variable

    # Simple Moving Average (leaving here as an example)
    def SMA(arr: pd.Series, n: int) -> pd.Series:
        """
        Returns `n`-period simple moving average of array `arr`.
        """
        return pd.Series(arr).rolling(n).mean()

    # SuperTrend (What I'm trying to implement)
    def SUPERTREND(df: pd.DataFrame, supertrend_period: int, supertrend_atr_multiplier: int):
        """
        Returns the SuperTrend line for the given ATR period and multiplier.
        """
        # Use pandas.ta to create a SuperTrend dataframe
        supertrend_df = ta.supertrend(df['High'], df['Low'], df['Close'], length=supertrend_period, multiplier=supertrend_atr_multiplier)

        # Rename supertrend columns to something more useful
        for col in supertrend_df.columns:
            if "SUPERT_" in col:
                supertrend_df = supertrend_df.rename(columns={col: "SuperTrend"})
            # elif "SUPERTd_" in col:
            #     supertrend_df = supertrend_df.rename(columns={col: "in_uptrend"})

        # Return the SuperTrend column only. This is the same as a pd.Series, isn't it?
        return supertrend_df['SuperTrend']

####################################################################
# Import datasets and backtest
####################################################################
# Loop through all tickers, import their datasets
for curr_ticker in tickers_to_download:
    try:
        df = pd.read_csv("./historical/intraday/" + curr_ticker + ".csv")
    except FileNotFoundError:
        print("Couldn't find DF for", curr_ticker, "so skipping it...")
        continue

    # Because of how YF combines multiple stock df's into one df, and how we
    # cut out the individual stock's df's and save them, they come with NaN
    # rows in them already, so drop all NaN rows now!
    df = df.dropna()

    # Convert the individual stock's Datetime column to datetime objects
    df['Datetime'] = pd.to_datetime(df['Datetime'])

    # Set the Datetime index
    df = df.set_index('Datetime')

    # print(df)

    # Now, we have to define a strategy class to backtest with.
    class SuperTrend(Strategy):

        # Define some initial variables here
        def init(self):
            # Moving average (as an example, not used in the backtest below)
            self.ma1 = Indicators_Class.SMA(self.data.Close, 20)
            # SuperTrend (what I'm trying to implement in the beginning)
            self.SuperTrend = Indicators_Class.SUPERTREND(self.data, 7, 3)

            # Some variables to store data as we backtest
            self.positions = 0
            self.last_sale_price = 0
            self.last_open_price = 0

        # Then define what happens, step by step, through the backtest.
        def next(self):

            # BUY LOGIC
            # If the open is already > SuperTrend, and we don't have any open positions yet, buy at the open price
            if (self.data.Open > self.SuperTrend and self.positions == 0 ):
                self.buy(limit=self.data.Open)
                self.positions = 1
                self.last_open_price = self.data.Open
            # Else, if the high crosses over the supertrend, use the supertrend as the buy price
            elif (self.data.High > self.SuperTrend and self.positions == 0 ):
                self.buy(limit=self.SuperTrend + 0.01)
                self.positions = 1
                self.last_open_price = self.data.Open

            # SELL LOGIC
            # Once the low pierces the supertrend, sell, using the supertrend as the sale price
            elif (self.data.Low < self.SuperTrend and self.positions == 1): # and we have an open position
                self.position.close(1.0, limit=self.SuperTrend - 0.01)
                self.positions = 0
                self.last_sale_price = self.SuperTrend

    # Finally, run the backtest and get our report, stock by stock!
    bt = Backtest(df, SuperTrend, cash=10_000, commission=0.0, exclusive_orders=True, trade_on_close=True) #0.002
    stats = bt.run()
    print(stats)
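
A side note on the indicator container in the script above: SMA and SUPERTREND are defined without `self`, so they only work when called on the class itself (`Indicators_Class.SMA(...)`), not on an instance. Marking them with `@staticmethod` makes both call styles work. Below is a minimal sketch of that pattern. The `Indicators` class is hypothetical, and a plain-Python moving average stands in for the pandas version so the sketch runs without dependencies:

```python
from typing import List, Optional, Sequence


class Indicators:
    """Hypothetical container for indicator functions (illustration only)."""

    @staticmethod
    def sma(values: Sequence[float], n: int) -> List[Optional[float]]:
        """Return the n-period simple moving average of `values`.

        The first n - 1 entries are None because the window is not yet full,
        mirroring the NaN warm-up of `pd.Series.rolling(n).mean()`.
        """
        out: List[Optional[float]] = []
        for i in range(len(values)):
            if i + 1 < n:
                out.append(None)
            else:
                window = values[i + 1 - n : i + 1]
                out.append(sum(window) / n)
        return out


closes = [10.0, 11.0, 12.0, 13.0]  # made-up prices
# Works on the class and on an instance alike.
via_class = Indicators.sma(closes, 2)
via_instance = Indicators().sma(closes, 2)
```

This does not touch either error reported below; it only makes the helper class harder to misuse.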

Actual Behavior

...I receive this traceback:

Traceback (most recent call last):
  File "test.py", line 171, in <module>
    stats = bt.run()
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\backtesting\backtesting.py", line 1134, in run
    strategy.init()
  File "test.py", line 140, in init
    self.SuperTrend = Indicators_Class.SUPERTREND(self.data, 7, 3)
  File "test.py", line 34, in SUPERTREND
    supertrend_df = ta.supertrend(df['High'], df['Low'], df['Close'], length=supertrend_period, multiplier=supertrend_atr_multiplier)
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\pandas_ta\overlap\supertrend.py", line 20, in supertrend
    m = close.size
AttributeError: 'NoneType' object has no attribute 'size'

I thought this had something to do with how I was passing the data into the SUPERTREND() function, so I changed line 138:

from

self.SuperTrend = Indicators_Class.SUPERTREND(self.data, 7, 3)

to

self.SuperTrend = Indicators_Class.SUPERTREND(df, 7, 3)

...after doing so, I get the following traceback:

Traceback (most recent call last):
  File "test.py", line 170, in <module>
    stats = bt.run()
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\backtesting\backtesting.py", line 1165, in run
    strategy.next()
  File "test.py", line 151, in next
    if (self.data.Open > self.SuperTrend and self.positions == 0 ):
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 1936, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\pandas\core\arraylike.py", line 250, in array_ufunc
    result = maybe_dispatch_ufunc_to_dunder_op(self, ufunc, method, *inputs, **kwargs)
  File "pandas\_libs\ops_dispatch.pyx", line 91, in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\pandas\core\ops\common.py", line 65, in new_method
    return method(self, other)
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\pandas\core\arraylike.py", line 37, in __lt__
    return self._cmp_method(other, operator.lt)
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py", line 4978, in _cmp_method
    res_values = ops.comparison_op(lvalues, rvalues, op)
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\pandas\core\ops\array_ops.py", line 224, in comparison_op
    "Lengths must match to compare", lvalues.shape, rvalues.shape
ValueError: ('Lengths must match to compare', (506,), (2,))

...which I think I'm receiving because the length of self.data at that point during the backtest is only 2, and the length of df that I passed into the SUPERTREND() function is 506.

Additional info

So, I'm trying to get this SUPERTREND indicator working by following the boilerplate SMA template, but I'm running into these two issues. I think it has to do with converting to pd.Series() in some way, but I'm stuck. I'd like to figure it out so I can keep adding to that indicator library for plotting purposes. Thanks!

kernc commented 3 years ago

You need (first error) the .df accessor:

self.SuperTrend = Indicators_Class.SUPERTREND(self.data.df, 7, 3)

Additionally (second error), you need to wrap the indicator computation in Strategy.I(), i.e.:

self.SuperTrend = self.I(Indicators_Class.SUPERTREND, self.data.df, 7, 3)
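
Some background on the second error: the backtest reveals data to next() one bar at a time, so an indicator computed once over the whole DataFrame is longer than self.data on every bar but the last. That is what the "Lengths must match" message reports (506 values against 2). Wrapping the computation in self.I() lets the library reveal the indicator in step with the data. The following is a toy model of that idea in plain Python, not the library's implementation; `RevealedSeries`, `set_bar`, and the sample numbers are all hypothetical:

```python
from typing import List


class RevealedSeries:
    """Toy stand-in for a wrapped indicator: holds the full precomputed
    series but exposes only the bars up to the current one."""

    def __init__(self, full: List[float]) -> None:
        self._full = full
        self._bars_seen = 0

    def set_bar(self, i: int) -> None:
        """Advance the visible window so that it ends at bar index i."""
        self._bars_seen = i + 1

    def __len__(self) -> int:
        return self._bars_seen

    def __getitem__(self, idx: int) -> float:
        # Index into the visible prefix only, so [-1] is the current bar.
        return self._full[: self._bars_seen][idx]


opens = [100.0, 101.0, 99.0, 103.0]            # made-up prices
supertrend_full = [98.0, 100.5, 100.0, 101.0]  # made-up indicator values

open_ = RevealedSeries(opens)
supertrend = RevealedSeries(supertrend_full)

signals = []
for bar in range(len(opens)):
    open_.set_bar(bar)
    supertrend.set_bar(bar)
    # Both series have the same visible length on every bar ...
    assert len(open_) == len(supertrend) == bar + 1
    # ... and the current bar is compared explicitly with [-1].
    signals.append(open_[-1] > supertrend[-1])
```

In the strategy itself, the equivalent explicit comparison would be `self.data.Open[-1] > self.SuperTrend[-1]`, which makes it clear that only the current bar is being tested.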

windowshopr commented 3 years ago

Thank you! I knew it was going to be something easy, I just wasn't seeing it. I used the second recommendation you gave and it worked just fine. Thanks a lot!

windowshopr commented 3 years ago

Although, now that I say that, I should mention another error I'm getting. I wrote the SUPERTREND function so that the backtesting library would plot the line on the chart for me, but it isn't doing that. If you add bt.plot() as the very last line, and make sure that self.SuperTrend = self.I(Indicators_Class.SUPERTREND, self.data.df, 7, 3) is used, I receive this error at the plotting stage:

Traceback (most recent call last):
  File "test.py", line 171, in <module>
    bt.plot()
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\backtesting\backtesting.py", line 1721, in plot
    open_browser=open_browser)
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\backtesting\_plotting.py", line 591, in plot
    _plot_superimposed_ohlc()
  File "C:\Users\...\AppData\Roaming\Python\Python37\site-packages\backtesting\_plotting.py", line 445, in _plot_superimposed_ohlc
    raise ValueError('Invalid value for `superimpose`: Upsampling not supported.')
ValueError: Invalid value for `superimpose`: Upsampling not supported.

Any idea?

kernc commented 3 years ago

Yeah, this is the project issue tracker. It's polite to everyone subscribed to first use the search. See https://github.com/kernc/backtesting.py/issues/233.

windowshopr commented 3 years ago

My apologies, and thank you for pointing me in the right direction. I think I see the timestamp issue and will fix it on my end. Great support! Thanks! :)

UPDATE

That was exactly the issue. I am downloading datasets from Yahoo Finance, and the format the timestamps come in doesn't jibe with backtesting. To finish this off: I fixed the plotting issue by doing this right after importing the .csv file:

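    # Convert the individual stock's Datetime column to datetime objects
    df['Datetime'] = pd.to_datetime(df['Datetime'])
    # Re-format year-first so the string parses back unambiguously
    # (a day-first string can be misread as month-first)
    df['Datetime'] = df['Datetime'].dt.strftime("%Y-%m-%d %H:%M")
    df['Datetime'] = pd.to_datetime(df['Datetime'])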

This fixed the format and plotted nicely. Thanks again.
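
For readers hitting the same plotting error: the round trip through strftime above reduces each timestamp to minute resolution and discards everything else. Assuming the Yahoo Finance CSV stores intraday timestamps with a UTC offset (for example `2021-05-03 09:30:00-04:00`, a made-up sample), the offset is dropped and the values become timezone-naive. Here is a minimal sketch of that effect using only the standard library; the sample string and the `to_naive_minutes` helper are hypothetical:

```python
from datetime import datetime

# Year-first format: unambiguous when parsed back, unlike day-first
# two-digit-year formats, which a parser may read as month-first.
FMT = "%Y-%m-%d %H:%M"


def to_naive_minutes(raw: str) -> datetime:
    """Parse an ISO-like timestamp and return it without seconds or offset."""
    aware = datetime.fromisoformat(raw)
    # Format and re-parse with the same explicit format string.
    return datetime.strptime(aware.strftime(FMT), FMT)


raw = "2021-05-03 09:30:00-04:00"  # assumed shape of a CSV timestamp
naive = to_naive_minutes(raw)
```

In pandas, a timezone-aware column can be made naive more directly with `df['Datetime'].dt.tz_localize(None)`, which keeps the local wall-clock time. Whether the offset is really what trips the plot is not confirmed in this thread, so treat this as one plausible reading of the fix.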