Support dataframe groupby so you can have multiple timeseries in the same dataframe

twopirllc / pandas-ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators

https://twopirllc.github.io/pandas-ta/

MIT License


Support dataframe groupby so you can have multiple timeseries in the same dataframe #229

Open johanbjerke opened 3 years ago

johanbjerke commented 3 years ago

Which version are you running? The latest version is on GitHub. Pip is for major releases. Latest version

Describe the solution you'd like: Supporting groupby on the ta accessor would let you calculate indicators for many Symbols in one call, which gives better performance and less code.

Describe alternatives you've considered: One way would be to reference the dataframe like this:

df.groupby('Symbol').ta.rsi(length=2, append=True)
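Until something like that exists, a similar result may be possible with plain pandas. The following is only a sketch under stated assumptions, not pandas-ta API: it groups by a `Symbol` column and uses `transform` to run a per-series indicator. The `sma` helper and the sample data are hypothetical stand-ins; any function that maps a Series to an equally indexed Series (for example `pandas_ta.rsi`) could presumably be used in its place.

```python
import pandas as pd


def sma(close: pd.Series, length: int = 2) -> pd.Series:
    """Hypothetical stand-in indicator: simple moving average."""
    return close.rolling(length).mean()


# Two made-up symbols stacked in one DataFrame.
df = pd.DataFrame(
    {
        "Symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
        "close": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)

# transform() calls the indicator once per symbol and aligns the result
# back to the original rows, so the rolling window never spans two symbols.
df["SMA_2"] = df.groupby("Symbol")["close"].transform(lambda s: sma(s, length=2))
```

The first row of each symbol is NaN because the window restarts per group, which is the behaviour a groupby-aware accessor would need to preserve.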

twopirllc commented 3 years ago

Hello @johanbjerke,

That would be a pretty cool feature. I have also not seen it in any other open source Python TA library. I have no idea when or if this will be implemented, given how many outstanding Issues and previous Enhancements are in the pipeline. The only way it can be escalated is if someone else dives into it and makes a PR. Would you like to try?

Kind Regards, KJ

foooooooooooooooobar commented 3 years ago

FWIW something like this works for me. I can try to integrate it into core if it's something helpful for a lot of people. Thoughts on using joblib @twopirllc? It has the advantage of allowing different libraries to be used (loky, multiprocessing, threading, etc).

import joblib
import pandas as pd
from joblib import delayed
from tqdm import tqdm

def add_features(df):
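    import pandas_ta  # imported inside the worker so each process registers the .ta accessor

    df = df.copy()  # avoid mutating the caller's data
    df.ta.cores = 0  # turn off pandas_ta's own multiprocessing; better to let joblib (loky) parallelize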
    df.ta.strategy(exclude=["ichimoku", "dpo"])

    return df

class ProgressParallel(joblib.Parallel):
    def __call__(self, *args, **kwargs):
        with tqdm() as self._pbar:
            return joblib.Parallel.__call__(self, *args, **kwargs)

    def print_progress(self):
        self._pbar.total = self.n_dispatched_tasks
        self._pbar.n = self.n_completed_tasks
        self._pbar.refresh()

# Assumes all_df has a "symbol" column; each symbol's rows become one joblib task.
all_df = pd.concat(
    ProgressParallel(n_jobs=-1)(
        delayed(add_features)(group) for _, group in all_df.groupby("symbol")
    )
)

twopirllc commented 3 years ago

Hello @foooooooooooooooobar,

I haven't seen joblib, but it looks like it has potential.

> I can try to integrate it into core if it's something helpful for a lot of people.

I do not see why not. But I would like to see a non-joblib implementation as well, in case users do not have joblib installed.
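One possible shape for such a fallback is sketched below. It is not code from pandas-ta: `apply_per_symbol` and `add_range` are hypothetical names, and the `symbol`, `high` and `low` columns are assumed for illustration. joblib is used only when it imports successfully; otherwise each group is processed in a plain loop.

```python
import pandas as pd

try:
    from joblib import Parallel, delayed  # optional dependency
    HAS_JOBLIB = True
except ImportError:
    HAS_JOBLIB = False


def apply_per_symbol(df, func, by="symbol", n_jobs=-1):
    """Apply func to each symbol's rows, using joblib only if it is installed."""
    groups = [group for _, group in df.groupby(by)]
    if HAS_JOBLIB and n_jobs != 1:
        results = Parallel(n_jobs=n_jobs)(delayed(func)(g) for g in groups)
    else:
        results = [func(g) for g in groups]  # sequential fallback
    return pd.concat(results)


def add_range(group):
    """Hypothetical feature function standing in for add_features."""
    group = group.copy()
    group["range"] = group["high"] - group["low"]
    return group


prices = pd.DataFrame(
    {
        "symbol": ["A", "A", "B", "B"],
        "high": [2.0, 3.0, 12.0, 13.0],
        "low": [1.0, 1.0, 10.0, 10.0],
    }
)

# n_jobs=1 forces the sequential path, which behaves the same with or without joblib.
out = apply_per_symbol(prices, add_range, n_jobs=1)
```

Keeping the per-group function separate from the dispatch logic means the same `func` runs unchanged on either path.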

I have decided to extend Pandas TA to utilize other libraries only if the user has the library installed. See the Imports dictionary in pandas_ta/__init__.py and an implementation using yfinance in pandas_ta/utils/_data.py.
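For readers unfamiliar with the pattern, an availability dictionary of this kind can be sketched with the standard library alone. This is an illustration, not a copy of the code in pandas_ta: `OPTIONAL_IMPORTS` and `parallel_available` are hypothetical names, and the package list is assumed.

```python
from importlib.util import find_spec

# Hypothetical registry: True when the optional package can be found,
# checked without actually importing it.
OPTIONAL_IMPORTS = {
    name: find_spec(name) is not None
    for name in ("joblib", "tqdm", "yfinance")
}


def parallel_available() -> bool:
    """Report whether the joblib-based code path could be used."""
    return OPTIONAL_IMPORTS["joblib"]
```

Feature code can then branch on `parallel_available()` and fall back to a sequential loop when it returns False.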

So you are welcome to give it a shot. Contributions are welcome 😎

Kind Regards, KJ