twopirllc / pandas-ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators
https://twopirllc.github.io/pandas-ta/
MIT License
5.5k stars, 1.08k forks

Does it recalculate efficiently on frequent DF appends? #288

Closed timabilov closed 2 years ago

timabilov commented 3 years ago

Many indicators can be efficiently recalculated without recalculating all previous OHLCV records. Recalculating everything hurts performance.

The usual use case is a live OHLCV stream. Also, when appending, there is a column mismatch problem.

I didn't find any interface for that under the hood (talib).

Maybe someone has an idea for that? Or am I missing something, and it already calculates the latest OHLCV indicators without recalculating the old ones?

twopirllc commented 3 years ago

Hello @timabilov,

Many indicators can be efficiently recalculated without recalculating all previous OHLCV records.

Which? Please be specific.

I didn't find any interface for that under the hood (talib).

Do you know of other sources that do this, with relevant code?

The usual use case is a live OHLCV stream. Also, when appending, there is a column mismatch problem.

Do you have examples and reproducible code?

Maybe someone has an idea for that? Or am I missing something, and it already calculates the latest OHLCV indicators without recalculating the old ones?

Pandas TA calculates the chosen indicators for the ohlcv DataFrame. It is just one component of a Trading Platform. It is the responsibility of the user to manage their own data pipeline, including data acquisition, cleaning, feature generation (this package), post analysis, backtesting, signal generation, visualization, forward testing, broker integration, portfolio management, order management and execution.

Furthermore, this has been somewhat discussed before. Please see pinned Issue #87, "How to utilize this Framework for live ticks", for more information and other ideas.

Kind Regards, KJ

timabilov commented 3 years ago

Hi @twopirllc,

Sorry for making things hard here. Anyway, I checked the pinned issue and it seems there is no other way.

The point is: when we have a new tick, we append this OHLCV record to the DataFrame. A short example to clarify:

import pandas as pd

def update_ohlcv(klines: pd.DataFrame, row: list):
    # kline_start_to_index and indicatorize are my own helpers (not shown here)
    index = kline_start_to_index(start=row[0])  # datetime index
    row = [float(cell) if cell else None for cell in row]
    # Pad with None up to the DataFrame width. Possibly my mistake here:
    # the new tick has no indicator columns yet.
    row.extend((len(klines.columns) - len(row)) * [None])
    klines.loc[index] = row
    indicatorize(klines)  # klines.ta.macd etc., recalculated over the whole DataFrame

    return klines.loc[index]

Each time we append one OHLCV record to our klines DataFrame, the indicators have to be recalculated for the whole DataFrame. But considering only an MA for now, the MA value for the new OHLCV record could be calculated from the previous record's MA value (klines.loc[index - 1]).

The main question is: does the underlying lib or pandas-ta use the previous value (rather than recalculating all df records) for any recurrent indicator such as MA, EMA, etc.? I may be wrong here, sorry. But if everything is indeed recalculated for all records, then I am trying to find a more elegant way to handle that situation.
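For illustration, the kind of update asked about here could look like the sketch below. It is plain Python and not part of pandas-ta or TA-Lib; the class names IncrementalSMA and IncrementalEMA are hypothetical. The EMA is seeded with the first observed value, which is only one possible convention, so early values may differ from what a given library produces.

```python
from collections import deque


class IncrementalSMA:
    """Simple moving average updated in O(1) per new value (hypothetical helper)."""

    def __init__(self, period: int):
        self.period = period
        self.window = deque()
        self.total = 0.0

    def update(self, value: float):
        # Add the newest value and drop the oldest once the window overflows.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.period:
            self.total -= self.window.popleft()
        # No SMA is defined until a full window has been seen.
        if len(self.window) < self.period:
            return None
        return self.total / self.period


class IncrementalEMA:
    """Exponential moving average that needs only its previous value."""

    def __init__(self, period: int):
        self.alpha = 2.0 / (period + 1)
        self.value = None

    def update(self, value: float) -> float:
        # Assumption: seed the EMA with the first observation.
        if self.value is None:
            self.value = value
        else:
            self.value = self.alpha * value + (1 - self.alpha) * self.value
        return self.value


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma = IncrementalSMA(period=3)
ema = IncrementalEMA(period=3)
sma_values = [sma.update(c) for c in closes]
ema_values = [ema.update(c) for c in closes]
print(sma_values)
print(ema_values)
```

Each new tick costs one update() call per indicator instead of a pass over the whole history. One caveat: the running sum in the SMA can accumulate floating-point rounding error over very long streams, so an occasional full recalculation may still be worthwhile.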

Do you know of other sources that do this, with relevant code?

No, actually, other than calculating everything myself, which is not the right way when you handle finance, you know.

I hope this clarifies things. Anyway, I will stick with this for some time, because I believe that is not the case and everything does get recalculated, sadly, as I noted above.

twopirllc commented 3 years ago

@timabilov,

Apologies for the delay in the reply. I wanted to dig around a little more and check some additional sources.

Sorry for making things hard here.

Comes with the territory. 😆

The main question is: does the underlying lib or pandas-ta use the previous value (rather than recalculating all df records) for any recurrent indicator such as MA, EMA, etc.?

In short, no, it does not, because you still have to manage the process of calculating and appending to the prior DataFrame, as you have coded. This is not a self-aware DataFrame and it has no insight into when data arrives. Nor have I seen any TA library or package that does so yet.

What you want are Incremental Computations (such as mean, variance, et al.), and they cover only a small subset of indicators. Unfortunately, I have yet to find a list of equations that support Incremental Computations. However, if you could help me identify and source these particular equations, I think I can modify those existing indicators to process only two rows to speed up the computations for this case.
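As a rough sketch of what an incremental computation of mean and variance can look like, the example below uses Welford's online algorithm. It tracks the cumulative (expanding) mean and sample variance over everything seen so far, not a rolling window, and the RunningStats class name is hypothetical and not part of pandas-ta.

```python
import math
import statistics


class RunningStats:
    """Cumulative mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Only the previous count, mean and m2 are needed for each new value.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (divides by n - 1); undefined for fewer than 2 values.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


closes = [10.0, 12.0, 11.0, 15.0, 14.0]
stats = RunningStats()
for close in closes:
    stats.update(close)

# Cross-check against a full recalculation over all values.
assert math.isclose(stats.mean, statistics.mean(closes))
assert math.isclose(stats.variance, statistics.variance(closes))
print(stats.mean, stats.variance)
```

The state per series is three numbers, so a new row costs constant time regardless of how much history came before it.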

However, there are many indicators where this is just not possible. Some of them require rather complex logic just to determine their value, like psar or supertrend et al.

Also, you might know of Jane Street (a quantitative trading firm); they have released some open source OCaml code, including an Incremental library, that may be useful for you if you are somewhat familiar with OCaml or functional programming.

In regards to managing the process of a Trading Platform, I also recommend reading TradingView's Execution model of Pine scripts to understand their approach. However, other well-known platforms (IB, ToS, TradeStation, et al.) are more black box in their management.

I may be wrong here, sorry. But if everything is indeed recalculated for all records, then I am trying to find a more elegant way to handle that situation.

We all want fast calculations. You are preaching to the choir here. Not only does this library have few dependencies and is comprehensive and accurate, but it's also not that slow. Here are some run time stats from processing on an Apple MacBook Pro M1:

[Screenshot dated 2021-05-19: run time stats]

Of course, your use case will vary, and not everyone will be running ALL the indicators to generate features and determine trade signals.

Out of curiosity, what are you developing and for what exchanges and assets?

Hope this helps.

Kind Regards, KJ

timabilov commented 3 years ago

Thank you for the really detailed answer! Yes, we can call that incremental computation for some specific recurrent indicators.

My tests on a Mac M1 also run pretty well, but that is a no-go considering a low-cost production VPS or whatever.

It is just an auto trader that should scale easily as SaaS, and that is very scary, although I know there are tons of workarounds, like streaming all indicators from a central server. But anyway, I just wanted to check.

It is still OK for me, of course. This lib is really awesome! Thank you. All of the problems are still nothing compared to a "working" autotrader, you know.

twopirllc commented 3 years ago

Hello @timabilov,

Thank you for the really detailed answer!

👍🏼 I'm still learning new things also. 😓

My tests on a Mac M1 also run pretty well, but that is a no-go considering a low-cost production VPS or whatever.

Understandable.

It is just an auto trader that should scale easily as SaaS, and that is very scary, although I know there are tons of workarounds, like streaming all indicators from a central server. But anyway, I just wanted to check.

Yeah. Welcome to the club! 😎 There are already existing open and closed source platforms and SaaS implementations in the wild. Trying to build my own in my spare time also.

You may be interested in this Algo Trading Summit happening on July 15th; check the link, as there may still be free tickets left.

It is still OK for me, of course. This lib is really awesome! Thank you.

I hear ya. In between Issues, I have been trying to increase its speed, reduce the ETL process and make it easy to Backtest with vectorbt.

[Screenshot dated 2021-05-23]

For a more detailed version, check out the development branch Example Notebook.

All of the problems are still nothing compared to a "working" autotrader, you know.

👍🏼

KJ

mraspaud commented 2 years ago

I know this is closed, but I just wanted to mention that I stumbled upon https://github.com/nardew/talipp, which updates indicators incrementally. Maybe that could help with this problem?