ZLMA bug - Githubissues

twopirllc / pandas-ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators

https://twopirllc.github.io/pandas-ta/

MIT License


ZLMA bug #516

Open mihakralj opened 2 years ago

mihakralj commented 2 years ago

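import numpy as np
import pandas as pd
import pandas_ta as ta  # importing registers the DataFrame .ta accessor
d = pd.DataFrame(np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
d.ta.zlma(close=d[0], length=5, append=True)
print(d)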

output:

    0  ZL_EMA_5
0   1       NaN
1   2       NaN
2   3       NaN
3   4       NaN
4   5      6.00
5   6      6.67
6   7      7.44
7   8      8.30
8   9      9.20
9  10     10.13

The first five values of the input series (1 through 5) are all clearly below 6.0.
ZLMA somehow calculates 6.00 as its first visible value, at the bar where the input is 5.
The calculation of the corrected input (using an offset) does not appear to return correct values for early data.
ZLMA uses EMA by default, which makes this bug (nearly) invisible once 20+ bars have been calculated.

mihakralj commented 2 years ago

I did the calculation manually on paper. This is actually the correct result: ZLEMA overshoots on purpose in the early part of the calculation.
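The manual calculation can be sketched in plain pandas. This is only an approximation of what the library does, under two assumptions that are not stated in this thread: the EMA is seeded with the mean of the first `length` adjusted values (NaN skipped), and it is then updated recursively with alpha = 2 / (length + 1). The names `adjusted`, `seeded` and `zlema` are illustrative.

```python
import numpy as np
import pandas as pd

close = pd.Series(np.arange(1, 11, dtype=float))  # 1.0 .. 10.0
length = 5
lag = int(0.5 * (length - 1))  # 2 for length=5

# Zero-lag adjusted input. The first `lag` rows are NaN because
# shift(lag) has no earlier data to point at.
adjusted = 2 * close - close.shift(lag)

# Assumption: seed the EMA with the mean of the first `length`
# adjusted values; Series.mean() skips NaN, so this is mean(5, 6, 7).
seeded = adjusted.copy()
seed = adjusted.iloc[:length].mean()
seeded.iloc[: length - 1] = np.nan
seeded.iloc[length - 1] = seed

# Recursive EMA with alpha = 2 / (length + 1).
zlema = seeded.ewm(span=length, adjust=False).mean()
print(zlema.round(2).tolist())
# [nan, nan, nan, nan, 6.0, 6.67, 7.44, 8.3, 9.2, 10.13]
```

Under these assumptions the seed is the average of only three adjusted values (5, 6 and 7), which is where the 6.00 at the fifth bar comes from.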

mihakralj commented 2 years ago

I researched this further. I do believe we have a bug in early calculations (when n<lag). Here is the formula from zlma.py:

 lag = int(0.5 * (length - 1))
 close_ = 2 * close - close.shift(lag)

close.shift(lag) generates NaN for every row where the shift points outside the DataFrame, creating the early value of 2*close.

Most of my research and calculations show me that we should use a simple close (instead of 2*close) for values when n<lag; early results from the zlma would be more consistent.

Since this is an asymptotic indicator, the bug irons itself out after 20+ bars or so, but I'd still suggest guarding close.shift(lag) with a ternary-style check: if shift(lag) is possible, use it; if not, use a simple close instead.
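A minimal sketch of that fallback, assuming pandas semantics where arithmetic with NaN propagates NaN (so the adjusted series is NaN for the first `lag` rows). `zero_lag_input` is a hypothetical helper name, not a function in pandas-ta, and this is not the actual patch.

```python
import numpy as np
import pandas as pd

def zero_lag_input(close: pd.Series, length: int) -> pd.Series:
    """Hypothetical helper: zero-lag adjusted input with a plain-close fallback."""
    lag = int(0.5 * (length - 1))
    shifted = close.shift(lag)
    adjusted = 2 * close - shifted
    # Keep the adjusted value where shift(lag) found data;
    # otherwise fall back to the unadjusted close.
    return adjusted.where(shifted.notna(), close)

close = pd.Series(np.arange(1, 11, dtype=float))
print(zero_lag_input(close, length=5).tolist())
# [1.0, 2.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
```

With this input the first two rows keep the plain close (1 and 2) instead of NaN, and every later row is unchanged.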

Thoughts?

(btw, I am obsessed with valid early values of indicators... My C# library doesn't hide anything with NaN and generates values from the first bar on...)

twopirllc commented 2 years ago

@mihakralj,

> Most of my research and calculations show me that we should use a simple close (instead of 2*close) for values when n<lag; early results from the zlma would be more consistent.

Sounds good. If you want to make a PR so I can try it out and see how best to incorporate it, that would be great. Sometimes I am not gifted with abstraction, and something concrete helps me visualize and understand better. I appreciate the spare time, consideration and attention to detail you have contributed to this library while also building your own C# version. 😎

> (btw, I am obsessed with valid early values of indicators... My C# library doesn't hide anything with NaN and generates values from the first bar on...)

No worries. That would be my preferred (and potentially next) approach as well if I were not providing a Python fallback for TA Lib.

Thanks, KJ

mihakralj commented 2 years ago

OK, PR on its way this weekend.

(BTW, does the kama indicator work at all for you? I get an empty series back when calling kama from Pandas TA.)

twopirllc commented 2 years ago

@mihakralj,

Awesome. Yes, it works with the defaults for sure. I usually test with one year's worth of bars. Are certain parameters or minimum bar requirements failing?

mihakralj commented 2 years ago

Must be my concoction of your code and my dissections and mutilations... Let me go back to basics.

twopirllc commented 2 years ago

No worries.