Open xucian opened 2 years ago
Hello @tfgstudios,
Pandas TA is largely a Python implementation of TA Lib (plus a few TradingView indicators), and thus TA Lib's computation is the default mode for this Open Source implementation.
For the sake of brevity, I am only addressing ema.
This bug/feature sounds remarkably similar to Issue #420. See that issue, TA Lib and its Unstable Period, and the code and documentation of TA Lib's EMA for more details. I chose to implement the TA_MA_CLASSIC computation.
Pandas TA currently has three options for an ema; see help(ta.ema). You have tried two of them (1 & 2).
1. With talib=True and TA Lib installed, it uses TA Lib's EMA.
2. When talib=False or TA Lib is not installed, it yields the same result as 1., as intended and as you noted.
3. When talib=False and presma=False are passed as arguments.
(All three are in the code and charts below.)
Expected behavior EMAs values should not be affected by input values other than those mathematically required to calculate it
import numpy as np
import pandas as pd
import pandas_ta as ta

_df = ta.df.ta.ticker("AA", timed=True)["Close"]

def bad_ema(src=None, length=None, n_samples=601):
    # Fall back to a synthetic linear ramp when no series is given
    if src is None:
        src = pd.Series(np.linspace(40_000, 50_000, n_samples), dtype=float)
    tal_ema = ta.ema(src, length=length, talib=True)
    pta_presma_ema = ta.ema(src, length=length, talib=False)
    pta_ema = ta.ema(src, length=length, talib=False, presma=False)
    return pd.DataFrame({
        "close": src,
        f"tal_ema{length}": tal_ema,
        f"pta_presma_ema{length}": pta_presma_ema,
        f"pta_ema{length}": pta_ema
    })

def cplot(df, last=None):
    # Optionally restrict the plot to the last `last` rows
    if isinstance(last, int):
        df = df.iloc[-last:, :]
    print(df.shape)
    df.plot(figsize=(16, 6), color=["black", "red", "orange", "green"], grid=True)

ma_length = 10
n_samples = 50
closedf = _df.iloc[:n_samples].copy()
df = bad_ema(closedf, length=ma_length, n_samples=n_samples)
cplot(df, last=None)
df.tail()
It is clearly evident that pta_ema10 (green line), in this case, adheres to the "EMAs values should not be affected by input values other than those mathematically required to calculate it" expectation you desire, since by one definition an ema relies on a minimum of two values.
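To make the difference between the seeded and unseeded variants concrete, here is a minimal pandas sketch. It is not Pandas TA's code: `ema_recursive` and its `presma` flag are hypothetical stand-ins, and the seeding rule (replace the first `length` values by their SMA before the recursion starts) is an assumption about what the option does.

```python
import numpy as np
import pandas as pd


def ema_recursive(close: pd.Series, length: int, presma: bool = True) -> pd.Series:
    """Hypothetical helper: y[t] = a*x[t] + (1-a)*y[t-1], with a = 2/(length+1)."""
    values = close.astype(float).copy()
    if presma:
        # Assumed seeding rule: blank the first length-1 values and put the
        # SMA of the first `length` values at position length-1.
        seed = values.iloc[:length].mean()
        values.iloc[: length - 1] = np.nan
        values.iloc[length - 1] = seed
    # adjust=False gives the plain recursion; it starts at the first non-NaN value.
    return values.ewm(span=length, adjust=False).mean()


src = pd.Series(np.linspace(40_000, 50_000, 50))
seeded = ema_recursive(src, length=10, presma=True)     # starts from the SMA seed
unseeded = ema_recursive(src, length=10, presma=False)  # starts from src[0]
```

Both variants follow the same recursion once started; they differ only in the starting value, and that difference decays geometrically as more samples arrive.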
Kind Regards, KJ
Hi @twopirllc Thanks for the detailed answer!
I'm not sure I understand this part: 'It is clearly evident that pta_ema10 (green line), in this case, adheres to'
unstable_period None, talib True, presma True: 0 out of 1 are equal
unstable_period 0, talib True, presma True: 0 out of 1 are equal
unstable_period 5, talib True, presma True: 0 out of 1 are equal
unstable_period 34, talib True, presma True: 0 out of 1 are equal
unstable_period 64, talib True, presma True: 0 out of 1 are equal
unstable_period 65, talib True, presma True: 0 out of 1 are equal
unstable_period 66, talib True, presma True: 0 out of 1 are equal
unstable_period 100, talib True, presma True: 0 out of 1 are equal
unstable_period 149, talib True, presma True: 0 out of 1 are equal
unstable_period 150, talib True, presma True: 0 out of 1 are equal
unstable_period 151, talib True, presma True: 0 out of 1 are equal
unstable_period 200, talib True, presma True: 0 out of 1 are equal
unstable_period 300, talib True, presma True: 0 out of 1 are equal
unstable_period 600, talib True, presma True: 0 out of 1 are equal
unstable_period 700, talib True, presma True: 0 out of 1 are equal
unstable_period 900, talib True, presma True: 0 out of 1 are equal
unstable_period 1500, talib True, presma True: 0 out of 1 are equal
unstable_period 2000, talib True, presma True: 0 out of 1 are equal
unstable_period None, talib True, presma False: 0 out of 1 are equal
unstable_period 0, talib True, presma False: 0 out of 1 are equal
unstable_period 5, talib True, presma False: 0 out of 1 are equal
unstable_period 34, talib True, presma False: 0 out of 1 are equal
unstable_period 64, talib True, presma False: 0 out of 1 are equal
unstable_period 65, talib True, presma False: 0 out of 1 are equal
unstable_period 66, talib True, presma False: 0 out of 1 are equal
unstable_period 100, talib True, presma False: 0 out of 1 are equal
unstable_period 149, talib True, presma False: 0 out of 1 are equal
unstable_period 150, talib True, presma False: 0 out of 1 are equal
unstable_period 151, talib True, presma False: 0 out of 1 are equal
unstable_period 200, talib True, presma False: 0 out of 1 are equal
unstable_period 300, talib True, presma False: 0 out of 1 are equal
unstable_period 600, talib True, presma False: 0 out of 1 are equal
unstable_period 700, talib True, presma False: 0 out of 1 are equal
unstable_period 900, talib True, presma False: 0 out of 1 are equal
unstable_period 1500, talib True, presma False: 0 out of 1 are equal
unstable_period 2000, talib True, presma False: 0 out of 1 are equal
unstable_period None, talib False, presma True: 0 out of 1 are equal
unstable_period 0, talib False, presma True: 0 out of 1 are equal
unstable_period 5, talib False, presma True: 0 out of 1 are equal
unstable_period 34, talib False, presma True: 0 out of 1 are equal
unstable_period 64, talib False, presma True: 0 out of 1 are equal
unstable_period 65, talib False, presma True: 0 out of 1 are equal
unstable_period 66, talib False, presma True: 0 out of 1 are equal
unstable_period 100, talib False, presma True: 0 out of 1 are equal
unstable_period 149, talib False, presma True: 0 out of 1 are equal
unstable_period 150, talib False, presma True: 0 out of 1 are equal
unstable_period 151, talib False, presma True: 0 out of 1 are equal
unstable_period 200, talib False, presma True: 0 out of 1 are equal
unstable_period 300, talib False, presma True: 0 out of 1 are equal
unstable_period 600, talib False, presma True: 0 out of 1 are equal
unstable_period 700, talib False, presma True: 0 out of 1 are equal
unstable_period 900, talib False, presma True: 0 out of 1 are equal
unstable_period 1500, talib False, presma True: 0 out of 1 are equal
unstable_period 2000, talib False, presma True: 0 out of 1 are equal
unstable_period None, talib False, presma False: 0 out of 1 are equal
unstable_period 0, talib False, presma False: 0 out of 1 are equal
unstable_period 5, talib False, presma False: 0 out of 1 are equal
unstable_period 34, talib False, presma False: 0 out of 1 are equal
unstable_period 64, talib False, presma False: 0 out of 1 are equal
unstable_period 65, talib False, presma False: 0 out of 1 are equal
unstable_period 66, talib False, presma False: 0 out of 1 are equal
unstable_period 100, talib False, presma False: 0 out of 1 are equal
unstable_period 149, talib False, presma False: 0 out of 1 are equal
unstable_period 150, talib False, presma False: 0 out of 1 are equal
unstable_period 151, talib False, presma False: 0 out of 1 are equal
unstable_period 200, talib False, presma False: 0 out of 1 are equal
unstable_period 300, talib False, presma False: 0 out of 1 are equal
unstable_period 600, talib False, presma False: 0 out of 1 are equal
Update: I now see that presma is present in the development branch. I'm quite reluctant to use that branch. It can have more stability issues, right? I also remember trying to switch to it in the past, but it returned different dimensioned arrays for some indicators (IIRC, the stoch returned 3 arrays instead of 2, and my code logic relies heavily on it returning 2. I can just ignore the additional array, but makes me wonder if there won't be other subtle but important breaking changes)
And lastly, about presma: if this would fix the EMA, are there options to fix NATR, STOCH and VWAP as well? Or, at least for STOCH? (I can't reproduce the difference for NATR and VWAP anymore -- maybe it was an error on my part) If there's a solution for stoch, I'd prefer it to not alter the current values the stoch produces in a significant way -- I've already trained a few hundred parameters based on it and can't afford retraining the models.
Thanks again!
Update2: I tried the dev branch and this test doesn't fail anymore (but notice presma=True, talib=False, otherwise it fails with 45008.333333333336 != 45008.33333333333):
import numpy as np

def test_indicators_are_not_affected_by_values_outside_their_area_of_interest():
    ema_len = 600
    n_samples = 601
    samples = np.linspace(40_000, 50_000, n_samples)

    def create_ema(_samples):
        from pandas import Series
        import pandas as pd
        import pandas_ta as ta
        ser: Series = ta.ema(pd.Series(_samples, dtype=float), length=ema_len, presma=True, talib=False)
        return ser.to_numpy()

    def _test__pands_ta__values_outside_ema_window__does_not_influence__ema_inside_window():
        ema_full = create_ema(samples)
        # Drop the oldest sample; the last EMA value is expected to stay the same
        ema_without_first = create_ema(samples[1:])
        assert ema_full[-1] == ema_without_first[-1]

    _test__pands_ta__values_outside_ema_window__does_not_influence__ema_inside_window()
But my other tests still fail in some cases (described by (*) above).
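The `45008.333333333336 != 45008.33333333333` failure mentioned above is a difference in the last bits of a double, which is typical when two code paths sum the same numbers in a different order. A minimal sketch of a tolerance-based comparison follows; the `rel_tol` value is an assumption and should be chosen to suit the data.

```python
import math

a = 45008.333333333336
b = 45008.33333333333

# Exact equality fails: the two doubles differ in their last bits.
exact = a == b

# A relative tolerance treats them as equal (tolerance value is an assumption).
close = math.isclose(a, b, rel_tol=1e-12)
```

For whole arrays, `numpy.isclose` or `numpy.testing.assert_allclose` serve the same purpose.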
@tfgstudios,
I now see that presma is present in the development branch. I'm quite reluctant to use that branch. It can have more stability issues, right?
Regarding ema, it made more sense to rename the argument sma to presma from v0.3.14 to development.
The development branch is as stable as its former self, v0.3.14, and better. Whether you decide to use the development branch or not, I will not be supporting v0.3.14, as it will be replaced by a future version of the development branch after completing the TODO's Hilbert Transform Indicators, ht_*, under remaining Indicators.
I also remember trying to switch to it in the past, but it returned different dimensioned arrays for some indicators (IIRC, the stoch returned 3 arrays instead of 2, and my code logic relies heavily on it returning 2. I can just ignore the additional array, but makes me wonder if there won't be other subtle but important breaking changes)
This library is more feature-rich than some other TA libraries out there, and thus some indicators will include more details/columns in the result, like stoch. It is up to the user to drop or exclude extra columns that have no value to them. Others fork the repo and make adjustments.
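Dropping the extra column can be done positionally, so downstream code that expects two series keeps working. The sketch below uses a hand-built DataFrame as a stand-in for a three-column stoch result; the column names and values are assumptions for illustration, not output from the library.

```python
import pandas as pd

# Hypothetical stand-in for a three-column stoch result (names are assumed).
stoch_like = pd.DataFrame({
    "STOCHk_14_3_3": [80.0, 75.0],
    "STOCHd_14_3_3": [78.0, 77.0],
    "STOCHh_14_3_3": [2.0, -2.0],
})

# Keep only the first two columns, whatever they are called.
k_d = stoch_like.iloc[:, :2]
k, d = (k_d[col] for col in k_d.columns)
```

Selecting by position avoids hard-coding names that depend on the indicator parameters, but it does assume the column order stays the same between versions.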
If there's a solution for stoch, I'd prefer it to not alter the current values the stoch produces in a significant way -- I've already trained a few hundred parameters based on it and can't afford retraining the models.
The next time I will be touching it is when I convert it to numpy/numba. At the current rate, it won't be anytime soon.
But my other tests still fail in some cases (described by (*) above).
There are several other TA libraries out there. Have you tried them? I am curious if they have solved "indicator(s) are affected by values outside their area of interest"? 🤔
KJ
Expected behavior EMAs values should not be affected by input values other than those mathematically required to calculate it
This is what I hear:
As shown above, setting talib=False, presma=False only uses two consecutive values, as detailed here.
This calculation is done by pandas' ewm.
def ema(*args, **kwargs):
    # ...
    close.ewm(span=length, adjust=adjust).mean()  # where adjust=False
    # ...
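Although each step of the recursion only touches two values (the new observation and the previous EMA), the previous EMA carries every earlier observation with it. Unrolling y[t] = a*x[t] + (1-a)*y[t-1] with a = 2/(length+1) gives the observation k steps back a weight of a*(1-a)^k, so everything older than `length` steps still holds a combined weight of (1-a)^length. A small sketch of that arithmetic (the function name is hypothetical):

```python
def weight_outside_window(length: int) -> float:
    """Combined weight an adjust=False EMA gives to data older than `length` steps."""
    alpha = 2.0 / (length + 1)
    return (1.0 - alpha) ** length


for n in (10, 600):
    print(n, weight_outside_window(n))
```

For both lengths the result is roughly 0.135, i.e. about 13.5% of the final value comes from observations outside the nominal window. This is why trimming old data changes the last EMA value slightly.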
Which version are you running? The latest version is on GitHub. Pip is for major releases. pandas-ta-0.3.14b0 (main) pandas-ta-0.3.65b0 (development)
Do you have TA Lib also installed in your environment? TA_Lib-0.4.24-cp38-cp38-win_amd64.whl
Did you upgrade? Did the upgrade resolve the issue? already at the latest version
Describe the bug EMA600's values are influenced by values that lie even outside the last 600 observations. Shouldn't each value in an EMAX be obtained from the 'last X observations'? Why is it that even the ones before the last X observations (i.e. that aren't needed for EMA's calculation) affect its values? The difference isn't much, but if you're working on a 1m timeframe with an ema that has 256k data points (~6 months of data), the last value in that EMA would probably be very different from the last value of an EMA created from just the most recent 600 points (i.e. for simplicity, one with only 1 valid value -- the last one). Even if the difference wasn't much, my initial question holds. I'd like to disable any internal optimizations so that if EMA600 only needs 600 data points, it doesn't care about, and isn't affected by, what's before those 600 data points. How should I go about this without reinventing the wheel, i.e. creating a 600-len array for each individual step and calling ta.ema() multiple times?
To Reproduce
Gives:
Expected behavior EMAs values should not be affected by input values other than those mathematically required to calculate it
Additional context If we think about any indicator that requires 'the last x observations', it should be implemented this way: [Python-like pseudocode]
In a nutshell, we move from end to start, and get a mathematically accurate indicator value at each data point. I assume pandas goes from start to end or uses some arithmetic approximations, and this is happening regardless of whether I pass talib=False or True
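A strictly windowed variant, where each output depends only on the last `length` observations, can be sketched with NumPy as below. This is not what Pandas TA or TA Lib compute, and its values will differ from a conventional EMA: it truncates the exponential weights to the window and renormalises them, which is an assumption about what "only the last X observations" should mean. `windowed_ema` is a hypothetical helper name.

```python
import numpy as np


def windowed_ema(values, length):
    """Hypothetical helper: exponentially weighted mean over exactly the last `length` points."""
    x = np.asarray(values, dtype=float)
    alpha = 2.0 / (length + 1)
    # Weights ordered oldest -> newest inside one window, renormalised to sum to 1.
    w = (1.0 - alpha) ** np.arange(length - 1, -1, -1)
    w /= w.sum()
    out = np.full(x.shape, np.nan)
    for end in range(length, x.size + 1):
        # Each output sees only x[end-length:end]; nothing older can leak in.
        out[end - 1] = np.dot(x[end - length:end], w)
    return out


samples = np.linspace(40_000, 50_000, 601)
full = windowed_ema(samples, 600)
trimmed = windowed_ema(samples[1:], 600)
```

Here `full[-1]` and `trimmed[-1]` are computed from the same 600 samples, so dropping the oldest observation does not change the last value. The cost is O(n * length) work instead of the O(n) of the recursive form.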