twopirllc / pandas-ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators
https://twopirllc.github.io/pandas-ta/
MIT License
4.98k stars 976 forks

Multiprocessing on Custom Strategy? #104

Closed jmrichardson closed 3 years ago

jmrichardson commented 3 years ago

Hi,

It appears that multiprocessing is enabled when using the "All" strategy. Is it possible to use it for custom strategies, or are there any plans to support that? I am dynamically creating a large custom strategy with multiple length, fast, and slow parameters for each indicator. I would like to enable ta.mp but don't know how. Currently, with ta.mp enabled, no columns are added for any strategy other than "All".
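For example, I am assembling the strategy along these lines (a rough sketch; the indicator kinds and parameter values here are placeholders for what I actually generate):

import pandas_ta as ta

# Sweep a few lengths and fast/slow pairs into one large list of
# indicator dicts, then wrap them in a single Custom Strategy.
lengths = [14, 50, 200, 1000]
fast_slow = [(12, 26), (20, 100), (50, 200)]

indicators = (
    [{"kind": "rsi", "length": n} for n in lengths]
    + [{"kind": "sma", "length": n} for n in lengths]
    + [{"kind": "macd", "fast": f, "slow": s} for f, s in fast_slow]
)

BigStrategy = ta.Strategy(
    name="Big Sweep",
    description="Dynamically generated parameter sweep",
    ta=indicators,
)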

Thanks in advance for any help

twopirllc commented 3 years ago

Hello @jmrichardson,

Thanks for using Pandas TA!

Currently it is bugged, but a revamped strategy method is waiting on the development branch. Feel free to check it out.

I too am building a semi-large Custom Strategy like yours. One of the goals is to get multiprocessing (mp) enabled for Custom Strategies. But at this time, I am switching tasks to complete some prior lingering Issues. Do you have a code sample of a typical Strategy you would use? Do you have a set of Strategies that you dynamically apply as needed? I am curious how others are applying this library beyond just numerical feature building.

In short, this is similar to Issues #98 and #100. To keep you in the loop, yesterday I posted to #98:

So the ta.strategy() method has been revamped and is on the development branch. There is a new ta.cores property to easily set the multiprocessing cores before ta.strategy() runs. Multiprocessing only applies to the "All" Strategy or Categorical Strategies ("Momentum", "Candles", et al) and not Custom Strategies.

If you want to try getting multiprocessing working with a Custom Strategy, modify this if block. Make sure the new multiprocessing runs in order, as some indicators can depend on prior indicators when Custom Chaining indicators.

Also check out some of the stuff in the Examples directory. Will be pushing it to master soon though.

Coming soon! I hope!
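In the meantime, the new ta.cores property is used roughly like this (a minimal sketch; "ohlcv.csv" is a placeholder for your own data):

import pandas as pd
import pandas_ta as ta

df = pd.read_csv("ohlcv.csv")  # placeholder: your own OHLCV data

df.ta.cores = 4        # new ta.cores property on the development branch
df.ta.strategy("All")  # "All" (or a categorical Strategy) runs multiprocessed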

I too am still learning about practical mp. Conceptually, it makes sense. But the tutorials and videos I have seen cater to simple applications. Sadly, the Custom Strategy mp application is not like those. Hopefully someone can implement a solution to save us all some time.

Hope this helps!

Thanks, KJ

jmrichardson commented 3 years ago

Hi @twopirllc ,

It's my pleasure to be using Pandas TA. It really is a great package, and I am looking forward to its continued development.

Thank you for the quick reply and also the work on perhaps someday getting MP working. I understand the need to prioritize your work so no worries.

Unfortunately, I don't have very much experience in technical analysis, and very limited knowledge of what indicators/parameters influence equity price movement. I would like to use ML to accomplish that learning task, and indeed I am using Pandas TA to generate features from price history for downstream models. The idea is to generate indicators over different time periods (with noise) and use feature reduction to reduce dimensionality. Because this is a very lengthy process, it helps to have the jobs run in parallel, as you already know.

So, in the meantime, until MP is working in Pandas TA for large custom strategies, I decided to write an MP "wrapper" to accomplish the goal. I am also relatively new to MP, so there is likely a more efficient way, but the following works.

Here's the code, if anyone is interested; the main function is "ta". Also, note that the parameters are just guesses, and I really don't know if they make much sense. I am using intraday minute pricing, so my parameters are larger than daily ones. If anyone has better suggestions, I would welcome the advice.
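As a rough sketch of the shape of it (the actual code is longer; the worker helper and file name here are placeholders):

import multiprocessing as mp

import pandas as pd
import pandas_ta  # noqa: F401 -- registers the DataFrame .ta accessor


def _run_indicator(args):
    # Worker: apply one indicator dict to the price data
    df, params = args
    kind = params.pop("kind")
    return getattr(df.ta, kind)(**params)  # e.g. df.ta.rsi(length=211)


def ta(df, indicators, cores=8):
    # Main function: compute the indicator dicts in parallel, then join.
    # pool.map preserves input order; each task pickles a copy of df,
    # trading memory for simplicity.
    with mp.Pool(cores) as pool:
        results = pool.map(_run_indicator, [(df, dict(p)) for p in indicators])
    return pd.concat([df] + results, axis=1)


if __name__ == "__main__":  # guard required for multiprocessing on Windows
    X = pd.read_csv("ohlcv.csv")  # placeholder data
    features = ta(X, [{"kind": "rsi", "length": 211}, {"kind": "mom", "length": 14}])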

Thanks, John

twopirllc commented 3 years ago

Hi @jmrichardson,

Thanks! I appreciate it.

Yeah, TA is a weird confluence of time series mathematics, subjective analysis, and mystical projection. Or at least that is what I thought when I first got into TA. I haven't gotten to the point of testing it out, but I do think that having many similar-"looking" indicators, with similar shapes but different values, will not provide much additional insight. To test that, I need to find a good backtester compatible with this library or, unfortunately, roll my own. It's hard to compete against trading platforms' built-in backtesters though.

Awesome work on the "wrapper"! I will have to dig deeper some time to grok it. Again, like mp, I understand ML/AI conceptually but have minimal practical exposure to them. I plan to make an ML/AI Notebook in the Examples directory incorporating data collection, TA, and ML/AI post-processing and backtesting. 🤞

Personally, I am not into Day Trading; only for fun sometimes. I do not want to sit at the computer watching the tape much, and I would rather use my time to develop additional quantitative trading tools and components for Open Source retail traders. I am more into Swing and Positional Trades with scaling; trading more time with less risk. Anyway, since you are on the intraday sub-5min spectrum, "Utilizing this for Live Ticks" (Issue #87) may be of interest to you as well. Perhaps you have some insight or ideas to contribute to help.

Thanks, KJ

twopirllc commented 3 years ago

Hey @jmrichardson,

I forgot to mention Stochastic! Since you are generating some noise in your "wrapper", I thought you might find this package of interest if you haven't come across it already. Maybe it can be of use.

Regards, KJ

jmrichardson commented 3 years ago

Hi @twopirllc ,

Thank you for the feedback and suggestions. I agree with you that adding indicators (especially in bulk) probably won't add much value. I am currently working on a tabular model which needs historical data per observation. However, just throwing in correlated data points won't help and will likely harm performance. I am hoping that the dimensionality can be reduced to features that are largely unrelated to one another. I will keep you posted on model improvements with your library. I am also looking at time series models that don't require reduction, as well as DL. Perhaps an ensemble of the different model types may help (though often the more complex, the less performant).

Thanks for suggesting Stochastic package. I haven't seen it but will have a closer look soon.

It sounds like you are interested in creating your own platform at some point. If you ever want to join forces, let me know. I am working on my platform now, which should be agnostic to price frequency, so swing trading should fit as well. It's just going to take some time to piece everything together, including backtesting...

Thanks again! John

twopirllc commented 3 years ago

Hello @jmrichardson,

I would be interested in your approach with models.

Thanks for suggesting Stochastic package. I haven't seen it but will have a closer look soon.

No worries. Try not to reinvent the wheel if you do not need to, right?

Yeah, I am building my own Algo Platform. Nothing fancy, as I am nowhere near where I want it to be at this time. I have built some basic feasible components but have yet to connect them together. Yeah, I have been casually looking for others to bounce ideas with and potentially create something together, but we can discuss it more offline.

Thank you, Kevin

jmrichardson commented 3 years ago

Hi @twopirllc ,

Sorry for the late response. Taking this conversation offline, if you would like to chat about the algo platform, please send me a message at jmrichardson AT gmail

twopirllc commented 3 years ago

Hey @jmrichardson,

No worries.

Good news! Custom multiprocessing is nearly completed!

I am tying up some loose ends. Will contact you via pm in the coming weeks.

Thanks, KJ

twopirllc commented 3 years ago

Hey @jmrichardson

FYI, Multiprocessing Custom Strategies is up on the development branch if you want to try it out before I update the master branch.

Thanks, KJ

jmrichardson commented 3 years ago

Hi @twopirllc ,

I installed the development branch on Windows 10:

pip install -U git+https://github.com/twopirllc/pandas-ta.git@development
…
Successfully installed pandas-ta-0.2.8b0

Here is my code

X.ta.cores = 6

CustomStrategy = ta.Strategy(
    name="Custom Strategy",
    description="All indicators",
    ta=self.state['indicators']
)

X.ta.strategy(CustomStrategy, verbose=True, timed=True)

Here are my indicators, as given by self.state['indicators'] above:

[{'kind': 'amat', 'fast': 67, 'slow': 214}, {'kind': 'nvi', 'length': 345}, {'kind': 'accbands', 'length': 143}, {'kind': 'ui', 'length': 344}, {'kind': 'entropy', 'length': 34}, {'kind': 'kc', 'length': 1350}, {'kind': 'tsi', 'fast': 40, 'slow': 47}, {'kind': 'dema', 'length': 1418}, {'kind': 'ichimoku', 'tenkan': 50, 'kijun': 60, 'senkou': 1000}, {'kind': 'fwma', 'length': 55}, {'kind': 'rsi', 'length': 211}, {'kind': 'coppock', 'fast': 27, 'slow': 199}, {'kind': 'kama', 'fast': 12, 'slow': 608}, {'kind': 'bias', 'length': 619}, {'kind': 'tema', 'length': 13}, {'kind': 'mom', 'length': 14}, {'kind': 'eom', 'length': 516}, {'kind': 'aberration', 'length': 5116}, {'kind': 'donchian', 'upper_length': 20, 'lower_length': 2000}, {'kind': 'stdev', 'length': 5633}, {'kind': 'mom', 'length': 88}, {'kind': 'wma', 'length': 1768}, {'kind': 'macd', 'fast': 17, 'slow': 116}, {'kind': 'cg', 'length': 44}, {'kind': 'er', 'length': 523}, {'kind': 'ppo', 'fast': 72, 'slow': 758}, {'kind': 'sma', 'length': 19}, {'kind': 'midprice', 'length': 361}, {'kind': 'increasing', 'length': 201}, {'kind': 'brar', 'length': 529}, {'kind': 'tsi', 'fast': 76, 'slow': 381}, {'kind': 'eri', 'length': 9}, {'kind': 'rvgi', 'length': 5, 'swma_length': 500}, {'kind': 'bbands', 'length': 1444}, {'kind': 'eri', 'length': 99}, {'kind': 'log_return', 'length': 479}, {'kind': 't3', 'length': 10}, {'kind': 'kurtosis', 'length': 6826}, {'kind': 'kama', 'fast': 12, 'slow': 46}, {'kind': 'rvgi', 'length': 112, 'swma_length': 500}, {'kind': 'aberration', 'length': 2842}, {'kind': 'linear_decay', 'length': 7}, {'kind': 'mad', 'length': 57}, {'kind': 'nvi', 'length': 3631}, {'kind': 'psl', 'length': 5}, {'kind': 'skew', 'length': 9}, {'kind': 't3', 'length': 5}, {'kind': 'skew', 'length': 566}, {'kind': 'pwma', 'length': 626}, {'kind': 'apo', 'fast': 12, 'slow': 90}, {'kind': 'aobv', 'fast': 7, 'slow': 647}, {'kind': 'ppo', 'fast': 19, 'slow': 148}, {'kind': 'brar', 'length': 3086}, {'kind': 'donchian', 'upper_length': 2000, 'lower_length': 500}, {'kind': 'log_return', 'length': 266}, {'kind': 'pvi', 'length': 7}, {'kind': 'rsi', 'length': 11}, {'kind': 'mad', 'length': 103}, {'kind': 'linreg', 'length': 819}, {'kind': 'rvgi', 'length': 34, 'swma_length': 250}, {'kind': 'natr', 'length': 367}, {'kind': 'donchian', 'upper_length': 1000, 'lower_length': 500}, {'kind': 'cg', 'length': 259}, {'kind': 'accbands', 'length': 79}, {'kind': 'ao', 'fast': 16, 'slow': 650}, {'kind': 'mfi', 'length': 8}, {'kind': 'aobv', 'fast': 15, 'slow': 61}, {'kind': 'log_return', 'length': 7}, {'kind': 'cdl_doji', 'length': 497}, {'kind': 'willr', 'length': 626}, {'kind': 'ao', 'fast': 58, 'slow': 80}, {'kind': 'midprice', 'length': 111}, {'kind': 'chop', 'length': 14}, {'kind': 'donchian', 'upper_length': 2000, 'lower_length': 250}, {'kind': 'willr', 'length': 3654}, {'kind': 'tema', 'length': 2770}, {'kind': 'mad', 'length': 607}, {'kind': 'pvi', 'length': 262}, {'kind': 'bbands', 'length': 76}, {'kind': 'slope', 'length': 34}, {'kind': 'stoch', 'slow_d': 20, 'slow_k': 60, 'fast_k': 500}, {'kind': 'slope', 'length': 6956}, {'kind': 'brar', 'length': 90}, {'kind': 'decreasing', 'length': 351}, {'kind': 'pdist', 'length': 32}, {'kind': 'sinwma', 'length': 651}, {'kind': 'midpoint', 'length': 101}, {'kind': 'log_return', 'length': 81}, {'kind': 'percent_return', 'length': 39}, {'kind': 'massi', 'fast': 9, 'slow': 175}, {'kind': 'vwma', 'length': 253}, {'kind': 'bbands', 'length': 2599}, {'kind': 'trix', 'length': 185}, {'kind': 'cdl_doji', 'length': 84}, 
{'kind': 'efi', 'length': 7}, {'kind': 'ui', 'length': 191}, {'kind': 'adx', 'length': 14}, {'kind': 'mad', 'length': 187}, {'kind': 'mfi', 'length': 897}, {'kind': 'aroon', 'length': 176}, {'kind': 'cg', 'length': 840}, {'kind': 'mad', 'length': 1968}, {'kind': 'nvi', 'length': 6538}, {'kind': 'pgo', 'length': 3348}, {'kind': 'kdj', 'length': 598}, {'kind': 'zlma', 'length': 933}, {'kind': 'midprice', 'length': 5}, {'kind': 'vortex', 'length': 41}, {'kind': 'swma', 'length': 10}, {'kind': 't3', 'length': 108}, {'kind': 'fwma', 'length': 3451}, {'kind': 'aobv', 'fast': 7, 'slow': 340}, {'kind': 'ao', 'fast': 30, 'slow': 118}, {'kind': 'wma', 'length': 15}, {'kind': 'trima', 'length': 1808}, {'kind': 'bbands', 'length': 137}, {'kind': 'aobv', 'fast': 102, 'slow': 207}, {'kind': 'rsi', 'length': 3994}, {'kind': 'ema', 'length': 279}, {'kind': 't3', 'length': 3712}, {'kind': 'entropy', 'length': 10}, {'kind': 'inertia', 'length': 89}, {'kind': 'dema', 'length': 2553}, {'kind': 'tsi', 'fast': 11, 'slow': 385}, {'kind': 'trix', 'length': 6323}, {'kind': 'natr', 'length': 3862}, {'kind': 'coppock', 'fast': 27, 'slow': 379}, {'kind': 'qstick', 'length': 6113}, {'kind': 'sinwma', 'length': 6837}, {'kind': 'mad', 'length': 3543}, {'kind': 'ao', 'fast': 58, 'slow': 289}, {'kind': 'rma', 'length': 3418}, {'kind': 'psl', 'length': 192}, {'kind': 'rsi', 'length': 6}, {'kind': 'donchian', 'upper_length': 20, 'lower_length': 100}, {'kind': 'apo', 'fast': 23, 'slow': 109}, {'kind': 'vwma', 'length': 13}, {'kind': 'rsi', 'length': 64}, {'kind': 'pvi', 'length': 4961}, {'kind': 'decreasing', 'length': 5}, {'kind': 'vortex', 'length': 437}, {'kind': 'apo', 'fast': 23, 'slow': 208}, {'kind': 'quantile', 'length': 91}, {'kind': 'pvo', 'fast': 58, 'slow': 405}, {'kind': 'efi', 'length': 82}, {'kind': 'qstick', 'length': 99}, {'kind': 'cmf', 'length': 4394}, {'kind': 'vwma', 'length': 24}, {'kind': 'stoch', 'slow_d': 20, 'slow_k': 60, 'fast_k': 100}, {'kind': 'ichimoku', 'tenkan': 200, 'kijun': 400, 'senkou': 500}, {'kind': 'trima', 'length': 9}, {'kind': 'zscore', 'length': 1619}, {'kind': 'ui', 'length': 3621}, {'kind': 'brar', 'length': 8}, {'kind': 'ppo', 'fast': 37, 'slow': 783}, {'kind': 'donchian', 'upper_length': 100, 'lower_length': 100}, {'kind': 'willr', 'length': 32}, {'kind': 'rma', 'length': 586}, {'kind': 'massi', 'fast': 9, 'slow': 92}, {'kind': 'amat', 'fast': 18, 'slow': 386}, {'kind': 'kdj', 'length': 9}, {'kind': 'pwma', 'length': 6579}, {'kind': 'rsi', 'length': 684}, {'kind': 'qstick', 'length': 54}, {'kind': 'nvi', 'length': 2017}, {'kind': 'psl', 'length': 106}, {'kind': 'macd', 'fast': 61, 'slow': 81}, {'kind': 'zscore', 'length': 154}, {'kind': 'stoch', 'slow_d': 50, 'slow_k': 750, 'fast_k': 1000}, {'kind': 'tsi', 'fast': 40, 'slow': 89}, {'kind': 'adosc', 'fast': 21, 'slow': 58}, {'kind': 'swma', 'length': 189}, {'kind': 'variance', 'length': 43}, {'kind': 'pgo', 'length': 97}, {'kind': 'quantile', 'length': 50}, {'kind': 'percent_return', 'length': 70}, {'kind': 'eri', 'length': 30}, {'kind': 'bias', 'length': 58}, {'kind': 'linreg', 'length': 2654}, {'kind': 'eom', 'length': 15}, {'kind': 'cmf', 'length': 753}, {'kind': 'ichimoku', 'tenkan': 9, 'kijun': 400, 'senkou': 1000}, {'kind': 'zlma', 'length': 518}, {'kind': 'mfi', 'length': 1615}, {'kind': 'zscore', 'length': 277}, {'kind': 'cg', 'length': 79}, {'kind': 'eri', 'length': 55}, {'kind': 'pvi', 'length': 44}, {'kind': 'ao', 'fast': 16, 'slow': 342}, {'kind': 'trix', 'length': 5}, {'kind': 'stoch', 'slow_d': 200, 'slow_k': 500, 
'fast_k': 1000}, {'kind': 'kurtosis', 'length': 200}, {'kind': 'zscore', 'length': 900}, {'kind': 'cg', 'length': 1511}, {'kind': 'massi', 'fast': 18, 'slow': 212}, {'kind': 'apo', 'fast': 43, 'slow': 217}, {'kind': 'dema', 'length': 12}, {'kind': 'trima', 'length': 5}, {'kind': 'sinwma', 'length': 3798}, {'kind': 'dpo', 'length': 6}, {'kind': 'median', 'length': 2200}, {'kind': 'ichimoku', 'tenkan': 50, 'kijun': 60, 'senkou': 100}, {'kind': 'cmf', 'length': 39}, {'kind': 'bbands', 'length': 7}, {'kind': 'mfi', 'length': 14}, {'kind': 'eri', 'length': 583}, {'kind': 'kc', 'length': 2431}, {'kind': 'decreasing', 'length': 2049}, {'kind': 'supertrend', 'length': 5}, {'kind': 'rvgi', 'length': 366, 'swma_length': 500}, {'kind': 'cci', 'length': 145}, {'kind': 'massi', 'fast': 35, 'slow': 264}, {'kind': 'trima', 'length': 558}, {'kind': 'trima', 'length': 52}, {'kind': 'apo', 'fast': 158, 'slow': 576}, {'kind': 'pvo', 'fast': 16, 'slow': 377}, {'kind': 'qstick', 'length': 16}, {'kind': 'kama', 'fast': 157, 'slow': 164}, {'kind': 'bbands', 'length': 42}, {'kind': 'fwma', 'length': 31}, {'kind': 'sma', 'length': 371}, {'kind': 'cmf', 'length': 129}, {'kind': 'log_return', 'length': 147}, {'kind': 'tsi', 'fast': 76, 'slow': 725}, {'kind': 'ao', 'fast': 30, 'slow': 62}, {'kind': 'apo', 'fast': 83, 'slow': 678}, {'kind': 'aobv', 'fast': 28, 'slow': 102}, {'kind': 'tsi', 'fast': 11, 'slow': 202}, {'kind': 'adosc', 'fast': 11, 'slow': 403}, {'kind': 'ppo', 'fast': 37, 'slow': 114}, {'kind': 'supertrend', 'length': 1168}, {'kind': 'cmf', 'length': 232}, {'kind': 'rvgi', 'length': 10, 'swma_length': 50}, {'kind': 'trix', 'length': 102}, {'kind': 'coppock', 'fast': 27, 'slow': 55}, {'kind': 'tsi', 'fast': 145, 'slow': 157}, {'kind': 'aberration', 'length': 270}, {'kind': 'pdist', 'length': 104}, {'kind': 'willr', 'length': 348}, {'kind': 'zlma', 'length': 160}, {'kind': 'entropy', 'length': 203}, {'kind': 'massi', 'fast': 67, 'slow': 72}, {'kind': 'zlma', 'length': 48}, {'kind': 'pwma', 'length': 348}, {'kind': 'ao', 'fast': 58, 'slow': 151}, {'kind': 'coppock', 'fast': 14, 'slow': 183}, {'kind': 'kurtosis', 'length': 1170}, {'kind': 'ppo', 'fast': 72, 'slow': 209}, {'kind': 'ema', 'length': 14}, {'kind': 'linreg', 'length': 4}, {'kind': 'accbands', 'length': 2709}, {'kind': 'massi', 'fast': 18, 'slow': 771}, {'kind': 'median', 'length': 679}, {'kind': 'amat', 'fast': 67, 'slow': 409}, {'kind': 'adosc', 'fast': 11, 'slow': 58}, {'kind': 'natr', 'length': 6}, {'kind': 'pvo', 'fast': 8, 'slow': 181}, {'kind': 'ema', 'length': 26}, {'kind': 'quantile', 'length': 1743}, {'kind': 't3', 'length': 33}, {'kind': 'aobv', 'fast': 15, 'slow': 117}, {'kind': 'ema', 'length': 1630}, {'kind': 'willr', 'length': 6578}, {'kind': 'qstick', 'length': 1886}, {'kind': 'midprice', 'length': 10}, {'kind': 'chop', 'length': 2871}, {'kind': 'rvi', 'length': 800}, {'kind': 'pvo', 'fast': 30, 'slow': 173}, {'kind': 'macd', 'fast': 32, 'slow': 365}, {'kind': 'brar', 'length': 5556}, {'kind': 'willr', 'length': 1128}, {'kind': 'decreasing', 'length': 107}, {'kind': 'cmf', 'length': 12}, {'kind': 'aberration', 'length': 1579}, {'kind': 'eom', 'length': 4}, {'kind': 'aberration', 'length': 45}, {'kind': 'midpoint', 'length': 592}, {'kind': 'log_return', 'length': 13}, {'kind': 'zlma', 'length': 288}, {'kind': 'efi', 'length': 25}, {'kind': 'swma', 'length': 104}, {'kind': 'slope', 'length': 662}, {'kind': 'pvo', 'fast': 16, 'slow': 197}, {'kind': 'mad', 'length': 5}, {'kind': 'tsi', 'fast': 21, 'slow': 263}, {'kind': 'bias', 
'length': 2007}, {'kind': 'ppo', 'fast': 10, 'slow': 616}, {'kind': 'eri', 'length': 1891}, {'kind': 'quantile', 'length': 538}, {'kind': 'rvgi', 'length': 19, 'swma_length': 500}, {'kind': 'natr', 'length': 6953}, {'kind': 'supertrend', 'length': 18}, {'kind': 'sinwma', 'length': 1172}, {'kind': 'skew', 'length': 314}, {'kind': 'atr', 'length': 565}, {'kind': 'vwma', 'length': 457}, {'kind': 'quantile', 'length': 3137}, {'kind': 't3', 'length': 60}, {'kind': 'massi', 'fast': 18, 'slow': 112}, {'kind': 't3', 'length': 353}]

I am not sure if I am enabling multiprocessing correctly because it is only using 1 core. Also, I got the following error:

Traceback (most recent call last):
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\IPython\core\interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-e945fb6c26c9>", line 8, in <module>
    X, y, state = pipe_dev.fit_transform(X_trn)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\imblearn\pipeline.py", line 320, in fit_transform
    return last_step.fit(Xt, yt, **fit_params).transform(Xt)
  File "D:\Projects\tipjar\tipjar\prepare\technical_analysis.py", line 117, in transform
    X.ta.strategy(CustomStrategy, verbose=True, timed=True)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\pandas_ta\core.py", line 520, in strategy
    [getattr(self, kwds["kind"])(**kwds) for kwds in ta]
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\pandas_ta\core.py", line 520, in <listcomp>
    [getattr(self, kwds["kind"])(**kwds) for kwds in ta]
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\pandas_ta\core.py", line 45, in _wrapper
    result = method(cm, **method_kwargs)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\pandas_ta\core.py", line 824, in fwma
    result = fwma(close=close, length=length, offset=offset, **kwargs)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\pandas_ta\overlap\fwma.py", line 14, in fwma
    fibs = fibonacci(n=length, weighted=True)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\pandas_ta\utils.py", line 264, in fibonacci
    fib_sum = np.sum(result)
  File "<__array_function__ internals>", line 6, in sum
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\numpy\core\fromnumeric.py", line 2242, in sum
    initial=initial, where=where)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\numpy\core\fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
OverflowError: int too large to convert to float

twopirllc commented 3 years ago

@jmrichardson,

Thanks for installing the development branch and whacking it with a big tree.

I am not sure if I am enabling multiprocessing correctly because it is only using 1 core.

How many total cores do you have? Multiprocessing is on by default in this branch and defaults to all the cores.

So an OverflowError: int too large to convert to float from fib_sum = np.sum(result) in fibonacci() when calling fwma(close=close, length=3451).

import pandas as pd
import pandas_ta as ta

# Bombs when length=3451 and weighted=True
df = pd.read_csv("data.csv")  # Errr whatever it is
# df.ta.mp = True  # Not needed anymore; it's on by default for df.ta.strategy(...)
fibo = df.ta.fibonacci(3451, weighted=True)
print(fibo)

Unfortunately, that is something I cannot control. It bombs because the sum is so large that it overflows. It is adding all the Fibonacci numbers from 0 to 3451 so the total can be used as the divisor for the 3451 weights.
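You can reproduce the overflow in isolation, since Fibonacci numbers grow exponentially (a quick sketch; the int-to-float conversion is presumably what np.sum hits inside fibonacci()):

# F(n) grows like phi**n, so F(3451) has roughly 720 decimal digits --
# far beyond the float64 maximum of ~1.8e308.
a, b = 0, 1
for _ in range(3451):
    a, b = b, a + b

print(len(str(a)))  # ~721 digits
float(a)            # OverflowError: int too large to convert to float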

On another note, you have some indicator lengths ranging from low numbers to upwards of 6000. How were these numbers chosen for the various indicators?

KJ

jmrichardson commented 3 years ago

LOL, sorry about that :) I will choose a twig this time instead of the entire tree as I used to when my Mom made me go find a switch to beat me with :)

I have 8 cores on my dev box, but 56 on my prod server. I haven't tried it on the prod server, but just looking at the processor utilization in Win10, it doesn't peak above 20%.

That's interesting about the Overflow error. I haven't seen it before and haven't changed the indicator generation code. I actually randomly sampled 300 indicators instead of the much larger set I usually run. So, for each indicator, I generate a growing range by percent, then add some noise of +/- 25%:

noise_int(range_grow(5, 12, .8))

0        4
1        7
2       12
3       23
4       41
5       75
6      136
7      246
8      443
9      797
10    1435
11    2583
12    4651
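Roughly, the helpers work like this (a simplified sketch, not the exact code):

import numpy as np

def range_grow(start, steps, pct):
    # Geometric schedule: start, grown by pct each step
    return [round(start * (1 + pct) ** i) for i in range(steps + 1)]

def noise_int(lengths, noise=0.25, seed=None):
    # Jitter each length by up to +/-25% and keep positive ints
    rng = np.random.default_rng(seed)
    return [max(1, round(n * (1 + rng.uniform(-noise, noise)))) for n in lengths]

# noise_int(range_grow(5, 12, .8)) -> e.g.
# [4, 7, 12, 23, 41, 75, 136, 246, 443, 797, 1435, 2583, 4651]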

So for the above, an indicator will be generated 12 times with different lengths. Then the next indicator will be generated 12 times with slightly different lengths, and so on. So yes, it's a pretty big tree whack. The idea behind it is to give the ML model a different perspective for each indicator with different lengths (not having any domain experience here as to what indicators and lengths generate more alpha). To avoid overfitting, feature reduction is done to remove all irrelevant features, which is quite a lot of them. My dataset is at 1-minute frequency, so I have a good amount of historical data to avoid overfitting anyway. I am actually getting better-than-expected accuracy results, not only from your TA features but also from other time series extractions. I have just converted everything into scikit format, so I am still working on each step of the pipeline...

twopirllc commented 3 years ago

@jmrichardson,

LOL, sorry about that :) I will choose a twig this time instead of the entire tree as I used to when my Mom made me go find a switch to beat me with :)

Doh! Yeah, I wasn't expecting a Redwood tree trunk. 🤣

I have 8 cores on my dev box, but 56 on my prod server. I haven't tried it on the prod server, but just looking at the processor utilization in Win10, it doesn't peak above 20%.

Wish I had a box with 8 - 56 cores available! 😮 Not peaking higher than 20% isn't bad either.

That's interesting about the Overflow error. I haven't seen it before and haven't changed the indicator generation code.

Interesting. The Fibonacci function calculates using integers; I suppose it could be changed to floats, and maybe then there would be no Overflow error. But I really doubt the value of a ta.fwma() of such lengths.

I actually randomly sampled 300 indicators instead of the much larger set I usually run. So, for each indicator, I generate a growing range by percent, then add some noise of +/- 25%.

So for the above, an indicator will be generated 12 times with different lengths. Then the next indicator will be generated 12 times with slightly different lengths, and so on. So yes, it's a pretty big tree whack.

Makes sense now how you are generating the features. Yeah, that's a chunky tree.

The idea behind it is to give the ML model a different perspective for each indicator with different lengths (not having any domain experience here as to what indicators and lengths generate more alpha).

I can see where you are coming from and what you are trying to accomplish. Like trying to give the ML a meta or holistic view of the data and let it figure it out.

To avoid overfitting, feature reduction is done to remove all irrelevant features, which is quite a lot of them.

Unsurprising. I can foresee that with the feature generation alone.

My dataset is at 1-minute frequency, so I have a good amount of historical data to avoid overfitting anyway.

Wow. Nice. In your experience, how much data is needed to train a simple model on a daily time frame? Also, do you generate higher time frame datasets from the 1 minute time frame for multi-time frame analysis? If so, how has that worked out?

I am actually getting better-than-expected accuracy results, not only from your TA features but also from other time series extractions.

There is definitely value in there. I am interested in learning from your experience of what you have been able to accomplish. Is the goal to eventually use a reduced Strategy set, akin to a dagger instead of a tree trunk? Have you done any forward testing with it? Are there any promising TA features? I have my thoughts, but no results to back them up yet.

I have just converted everything into scikit format, so I am still working on each step of the pipeline...

Yeah. I have been too busy between life and Pandas TA to get anywhere near the ML/AI components. Plus there are other features I want to add to Pandas TA, but there is little time with the increase of issues this year. 🤷‍♂️

jmrichardson commented 3 years ago

Wow. Nice. In your experience, how much data is needed to train a simple model on a daily time frame? Also, do you generate higher time frame datasets from the 1 minute time frame for multi-time frame analysis? If so, how has that worked out?

IMHO, it's about the ratio of samples to features. A daily model will have ~250 observations per year. If you did a ten-year lookback, that would be ~2500 observations. For correlated features, I've read that the square root of n_observations is a rule of thumb (so sqrt(2500) ≈ 50 features).

Yes, the goal is to include a daily "headwind" classification in an ensemble (voting) model to see if that helps with accuracy. Right now I am just working on 1-min price data.

There is definitely value in there. I am interested in learning from your experience of what you have been able to accomplish. Is the goal to eventually, use a reduced Strategy set akin to a dagger instead of a tree trunk? Have you done any forward testing with it? Are there any promising TA features? I have my thoughts, but no results to back it up yet.

Yes, the goal is to reduce the features to just the important ones. My testing so far has been a reduction of around 1500 features to ~300 using Boruta dimension reduction. My initial test dataset is about 1M rows, and about 26K after labeling for the events I want to model. So 300 features with a training set of about 20K is a pretty good start to avoid overfitting. Of course, tuning the hyperparameters is important as well with cross validation. I haven't really paid attention to which features are important, but I will run another test in a few days and share it with you. However, it's likely going to be a mutation of features. I use genetic feature engineering to do polynomial/arithmetic operations across all of my features. These are likely to hold the most value, and I don't know what features were used. I also use autoencoders as well....
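The Boruta step itself is straightforward with the BorutaPy package; roughly (a simplified sketch on synthetic stand-in data, not my actual pipeline):

import numpy as np
import pandas as pd
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a table of TA features and event labels
X = pd.DataFrame(np.random.randn(1000, 30), columns=[f"f{i}" for i in range(30)])
y = (X["f0"] + X["f1"] + 0.5 * np.random.randn(1000) > 0).astype(int)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
selector = BorutaPy(rf, n_estimators="auto", random_state=42)
selector.fit(X.values, y.values)         # BorutaPy expects numpy arrays

X_reduced = X.loc[:, selector.support_]  # keep only confirmed features
print(X_reduced.shape)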

Once you have some free time, let me know and I will be glad to invite you to my GitHub repo. I am hoping to have the basic process flow done in the next couple of weeks. Then I will be adding exogenous data and also other frequency (daily) models into the ensemble.

twopirllc commented 3 years ago

Yes, the goal is to reduce the features to just the important ones. My testing so far has been a reduction of around 1500 features to ~300 using Boruta dimension reduction.

Ok good! I'm taking sort of a minimalist approach, keeping the Strategy lean, fast, and long only (for the time being).

Of course, tuning the hyperparameters is important as well with cross validation.

Of course.

I use genetic feature engineering to do polynomial/arithmetic operations across all of my features. These are likely to hold the most value, and I don't know what features were used. I also use autoencoders as well....

Yeah, it is quite the ensemble. 👍

Once you have some free time, let me know and I will be glad to invite you to my GitHub repo. I am hoping to have the basic process flow done in the next couple of weeks. Then I will be adding exogenous data and also other frequency (daily) models into the ensemble.

Will do. You can also shoot me an email for offline discussions anytime.

I will likely be pushing this branch to master by the end of the week if nothing serious pops up during your tests. It would be awesome if you could test the last example (and some of your own variants) in the Strategy Example Notebook, which another user included to simplify parameter input by using a tuple instead of keyword arguments. Glad you are able to bulk test with the library 🙏 and that it holds up.
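For reference, the tuple-style input looks something like this (a sketch, assuming "params" takes the positional arguments in indicator order; "ohlcv.csv" is a placeholder):

import pandas as pd
import pandas_ta as ta

# Same indicators, two parameter styles: keyword arguments vs a
# "params" tuple of positional arguments.
KwargStrategy = ta.Strategy(
    name="Kwargs",
    ta=[{"kind": "sma", "length": 19}, {"kind": "macd", "fast": 17, "slow": 116}],
)
TupleStrategy = ta.Strategy(
    name="Tuples",
    ta=[{"kind": "sma", "params": (19,)}, {"kind": "macd", "params": (17, 116)}],
)

df = pd.read_csv("ohlcv.csv")  # placeholder OHLCV data
df.ta.strategy(TupleStrategy, verbose=True, timed=True)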

KJ