Is Vectorbt suitable for price-based aggregations?

gcaplan commented 3 years ago

Congratulations on an exciting project - a seriously impressive achievement in such a short timeframe!

I'd be grateful if you could help me assess its suitability for our trading style. I'm a developer but have never used Python. I'd learn it to use vectorbt, provided you can reassure me that I won't hit any show-stopping issues with price-based aggregations.

Our trading style uses aggregations such as Renko, which as you know can form at any time. So bars in a portfolio test are not aligned by time as with conventional time-based candles.
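To make the irregular timing concrete, here is a minimal sketch of how bricks might be built from a tick stream. It is a simplified Renko variant (fixed brick size, no two-brick reversal rule), and the function name `renko_bricks` and the sample ticks are made up for illustration.

```python
def renko_bricks(ticks, brick_size, start_price):
    """Return (timestamp, close, direction) for each completed brick.

    Simplified rule (an assumption, not a standard definition): a brick
    completes whenever price moves one full brick_size away from the
    close of the previous brick, in either direction.
    """
    bricks = []
    anchor = start_price  # close of the most recent brick
    for ts, price in ticks:
        # A large move can complete several bricks on the same tick.
        while price >= anchor + brick_size:
            anchor += brick_size
            bricks.append((ts, anchor, +1))
        while price <= anchor - brick_size:
            anchor -= brick_size
            bricks.append((ts, anchor, -1))
    return bricks


# Hypothetical ticks as (timestamp, price); timestamps are plain integers here.
ticks = [(0, 100.0), (1, 100.5), (2, 101.25), (3, 103.5), (4, 102.75), (5, 101.75)]
bricks = renko_bricks(ticks, brick_size=1.0, start_price=100.0)
```

In this sample, bricks complete at ticks 2, 3 (twice) and 5, and nothing forms at ticks 0, 1 and 4, so the resulting bar index has no fixed spacing.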

This means that a portfolio dataframe would normally have just a single value for each datetime-indexed row, with all the other instruments set to NaN. I'd be importing the data from CSV files in OHLCV format.
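As a rough picture of that layout, the sketch below joins two per-instrument series of Renko closes on the union of their timestamps. The instrument names, prices, and timestamps are invented, and in practice each series would come from its own CSV file.

```python
import pandas as pd

# Hypothetical Renko closes; each instrument prints at its own times.
eurusd = pd.Series(
    [1.1000, 1.1010],
    index=pd.to_datetime(["2021-01-04 09:03", "2021-01-04 09:41"]),
    name="EURUSD",
)
gbpusd = pd.Series(
    [1.3600, 1.3590],
    index=pd.to_datetime(["2021-01-04 09:17", "2021-01-04 10:02"]),
    name="GBPUSD",
)

# Outer join on the timestamps: one row per brick, NaN for the other instrument.
wide = pd.concat([eurusd, gbpusd], axis=1).sort_index()
```

Each of the four rows holds exactly one price, which is the sparse shape described above.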

Will this pose problems for analysis or visualisations? So many backtester projects seem to assume that data is aligned by time that it's proving challenging to find any serious tool for backtesting Renko.

Thanks in advance for your advice!

polakowo commented 3 years ago

Hi @gcaplan, I haven't played with Renko charts, but vectorbt doesn't require data to be aligned by time in any way, nor does it require data to be indexed by time at all: you can just provide a list of values with no index and vectorbt will process it. The main requirement is that you can represent every feature of your data as an array of the same length.

Since your data would come at irregular intervals, you wouldn't be able to produce some of the statistics that require annualization, but you can still manually provide the frequency.
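One crude way to choose a frequency to provide is to take the typical spacing between bars and derive an annualization factor from it. The sketch below uses only the standard library; the helper names and the 365-day year are assumptions for illustration, and how the chosen frequency is then handed to vectorbt is not shown here.

```python
from datetime import datetime, timedelta
from statistics import median


def typical_bar_spacing(timestamps):
    """Median gap between consecutive bar timestamps."""
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    return timedelta(seconds=median(gaps))


def bars_per_year(spacing, year=timedelta(days=365)):
    """Annualization factor implied by a bar spacing (calendar-year assumption)."""
    return year / spacing


# Hypothetical bar times, spaced 2h, 1h, 4h and 2h apart.
start = datetime(2021, 1, 4, 9, 0)
bar_times = [start + timedelta(hours=h) for h in (0, 2, 3, 7, 9)]

spacing = typical_bar_spacing(bar_times)  # median gap: 2 hours
factor = bars_per_year(spacing)           # 365 days / 2 hours
```

The median is used rather than the mean so that one long gap (a weekend, for instance) does not dominate the estimate. Any annualized statistic built on such a factor is only as good as the assumption that bars arrive at roughly that pace.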

Many NaN values should be of no concern, but you may get issues calculating a consistent portfolio value, although there is a mechanism with forward and backward NaN filling that tries to mitigate that:

```python
import vectorbt as vbt
import pandas as pd
import numpy as np

# Sparse prices: each row has a value for only one asset
price = pd.DataFrame()
price['asset1'] = [1, np.nan, 3, np.nan]
price['asset2'] = [np.nan, 2, np.nan, 4]

portfolio = vbt.Portfolio.from_orders(price, 1.)
portfolio.value()  # no NaNs
```

```
   asset1  asset2
0   100.0   100.0
1   100.0   100.0
2   102.0   100.0
3   102.0   102.0
```
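The filling idea itself can be illustrated with plain pandas. This is only a sketch of forward and backward filling on the same sparse frame, not vectorbt's internal code:

```python
import numpy as np
import pandas as pd

price = pd.DataFrame({
    "asset1": [1, np.nan, 3, np.nan],
    "asset2": [np.nan, 2, np.nan, 4],
})

# Forward fill carries the last known price onward; backward fill then
# covers leading NaNs that have no earlier price to carry forward.
filled = price.ffill().bfill()
```

Note that the backward fill gives `asset2` a price on row 0 that was only observed on row 1, so it is reasonable for valuation but would be look-ahead if used to generate signals.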

If you're about to build a multi-asset portfolio, you may find rebalancing difficult with the conventional functions in vectorbt, because each rebalancing operation is done within a single tick and requires the price of each asset within that tick to be known. You can, however, use the low-level API to write your own rebalancing logic.
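To see why this bites with sparse data, the pandas sketch below checks on which rows every asset has a price, first using only prices printed on that row and then using the last known price. The equal weights and the portfolio value of 100 are made-up inputs, and sizing against a stale price is an assumption with its own risks; none of this is vectorbt API.

```python
import numpy as np
import pandas as pd

price = pd.DataFrame({
    "asset1": [1, np.nan, 3, np.nan],
    "asset2": [np.nan, 2, np.nan, 4],
})

# Rows where every asset printed a price on that very row: none here.
priced_now = price.notna().all(axis=1)

# Rows where every asset has at least a last known price.
last_price = price.ffill()
priced_ffill = last_price.notna().all(axis=1)

# Hypothetical equal-weight target on row 1 for a portfolio worth 100.
weights = pd.Series({"asset1": 0.5, "asset2": 0.5})
target_units = (100.0 * weights) / last_price.loc[1]
```

With the raw frame no row qualifies for rebalancing at all; with forward-filled prices rows 1 to 3 do, at the cost of sizing one leg against a price that may be out of date.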

Generally, although vectorbt simplifies backtesting in many ways, this tool is safe to use only if you have at least an intermediate knowledge of pandas and NumPy (and Numba, but it's similar to NumPy). One can cause a lot of damage by mishandling arrays: a single misplaced element can lead to devastating effects on your strategy, so it's vital for you to know how multi-dimensional arrays work and how vectorbt processes them. Because vectors of trading data are harder to debug than iterative code, you should also bring some curiosity and knowledge of data analysis. This will allow you to introspect your trading strategy from different angles and build a bullet-proof strategy.

gcaplan commented 3 years ago

Thanks for your very generous and helpful response!

Your warning about the dangers of a newbie getting burned by vector math is well taken. I've been programming since the days of punch cards (literally) but I'm new to Python and data science - my background is in line-of-business work. So I realise I have a steep learning curve ahead before I can use vectorbt in production. But it will certainly motivate me to learn pandas and NumPy properly, and they are my main motivation for learning Python in the first place. I enjoy a challenge, and your generosity in sharing this exciting tool has finally motivated me to learn this interesting language...

polakowo commented 3 years ago

@gcaplan being a veteran programmer will certainly help you master Python, and if you also bring valuable knowledge of trading, you can be of great help in further refining vectorbt. I'm always glad to see people with different backgrounds and ideas. As I said, I have never backtested aggregated data apart from plain OHLC, so it might be an interesting use case, and you can hit me up to discuss this or any other question.

gcaplan commented 3 years ago

Oleg - thanks for the offer! I'm somewhat in awe of your development abilities, so I might just take you up on that at some point.

Automation has been a frustration.

We have been trading some years now with a manual fx system that is consistently profitable, but as it's an active system there is a cognitive limit to the number of pairs and hours we can trade. Plus it may well work in other markets but manual testing to the level we require is very time-consuming, as we are rather risk-averse.

I do have a little background in stats but we're very much price-action traders rather than quants. Price aggregations have price-action properties and patterns you don't see in regular candles. Renko in particular lends itself to automation because its regular brick size leads to unambiguous patterns. Range bars, meanwhile, have some interesting properties in fast-moving trends, where outliers have a strong propensity to return to the mean. You might enjoy having a play with them - none of our strategies would work with time-based candles, and Renko/Range open possibilities for unconventional price-action ideas that actually work!

But pretty much all the consumer backtesting platforms have an explicit or implicit reliance on time-based candles, so I've hit a few roadblocks. When they can handle price aggregations, as with Zorro or JForex, it's usually only at the tick level, which is hopelessly unrealistic when exploring a large universe of ideas without access to a supercomputer!

I had a go at writing my own backtester/trading platform against a broker API, but it was too ambitious and I was using an enterprise language. I think the key is to be very narrowly focused and keep things as simple as possible, using a dynamically typed language like Python with good libraries. I feel I'm on the right road now. With vectorbt for general exploration and a very simple event-driven backtester for validating ideas in more detail, I'm hoping we can get automated at last!

polakowo commented 3 years ago

@gcaplan I appreciate your insights on price aggregations - will definitely have a play with them.

You picked the right approach - vectorbt is best used for exploration, hyperparameter optimization, and dashboards. There is also room for weirder use cases, such as "continuously scan for market caps that grow disproportionately fast and send them to me as a report via Telegram". For me personally, vectorbt became a real Swiss Army knife for keeping me alive and profitable in crypto. Being an automation freak, I then try to generalize and integrate every crazy idea into this framework, which has worked out pretty well.

But it still has so much room to grow in terms of actual backtesting capabilities, and having another backtester alongside it for validation is a good decision that will lead to more trust. Actually, I'm working right now on improving the event-driven backtesting mode in vectorbt, and hopefully it will become convenient enough for users to switch completely from vectorized backtesting without sacrificing any performance (...it's actually faster because you have to traverse the data only once).

Anyway, I welcome your ambitions and can't wait to hear about your experience!

polakowo / vectorbt

Is Vectorbt suitable for price-based aggregations? #155