twopirllc / pandas-ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators
https://twopirllc.github.io/pandas-ta/
MIT License
5.21k stars 1.01k forks

`df.ta.cdl_z()` gives impossible high and low values #703

Open JanHomann opened 1 year ago

JanHomann commented 1 year ago
pandas-ta       0.3.14b0
TA-Lib          0.4.27
yfinance        0.2.26

Problem Description

The current implementation of `cdl_z()` often produces low values that are higher than the open, close, or high, and high values that are lower than the open, close, or low.

Example

import pandas as pd
import pandas_ta as ta  # registers the df.ta accessor

df = pd.DataFrame()
df = df.ta.ticker("spy", interval='1h', start="2023-01-01",  end="2023-07-27")
df[:"2023-07-27 15:30"].ta.cdl_z()
                           open_Z_30_1  high_Z_30_1  low_Z_30_1  close_Z_30_1
Datetime
2022-07-27 09:30:00-04:00          NaN          NaN         NaN           NaN
2022-07-27 10:30:00-04:00          NaN          NaN         NaN           NaN
2022-07-27 11:30:00-04:00          NaN          NaN         NaN           NaN
2022-07-27 12:30:00-04:00          NaN          NaN         NaN           NaN
2022-07-27 13:30:00-04:00          NaN          NaN         NaN           NaN
...                                ...          ...         ...           ...
2023-07-26 11:30:00-04:00     0.293027     0.557565    0.683786       0.51043
2023-07-26 12:30:00-04:00      0.51497     0.114745    0.389549      0.209932
2023-07-26 13:30:00-04:00     0.176863     0.404245    0.201786       0.23817
2023-07-26 14:30:00-04:00     0.217598      2.10736   -0.173332      0.351198
2023-07-26 15:30:00-04:00     0.334565     0.725941    0.360631        1.1173

Expected Behavior

In a valid candle, the high should be at least max(open, close) and the low at most min(open, close). In the example above, the last row has an open that is lower than the low (0.334565 < 0.360631) and a close that is higher than the high (1.1173 > 0.725941). This happens often: two rows up (13:30), the open is again below the low, and three rows up (12:30), the low is even higher than the high. The problem shows up in any dataset, so the exact data used in this example is not needed to reproduce it.
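A quick way to count the affected rows is to flag every candle whose high is below the top of its body or whose low is above the bottom of its body. The sketch below is only an illustration: it re-enters the last five rows of the output quoted above by hand, and `invalid_candles` is a hypothetical helper, not part of pandas-ta.

```python
import pandas as pd

# Last five rows of the cdl_z() output quoted above, typed in by hand.
z = pd.DataFrame(
    {
        "open_Z_30_1": [0.293027, 0.51497, 0.176863, 0.217598, 0.334565],
        "high_Z_30_1": [0.557565, 0.114745, 0.404245, 2.10736, 0.725941],
        "low_Z_30_1": [0.683786, 0.389549, 0.201786, -0.173332, 0.360631],
        "close_Z_30_1": [0.51043, 0.209932, 0.23817, 0.351198, 1.1173],
    },
    index=["11:30", "12:30", "13:30", "14:30", "15:30"],
)


def invalid_candles(df, o, h, l, c):
    """Hypothetical helper: True where a row cannot be drawn as a candle."""
    body_top = df[[o, c]].max(axis=1)
    body_bottom = df[[o, c]].min(axis=1)
    # Rows containing NaN compare as False and are therefore not flagged.
    return (df[h] < body_top) | (df[l] > body_bottom)


bad = invalid_candles(z, "open_Z_30_1", "high_Z_30_1", "low_Z_30_1", "close_Z_30_1")
print(bad)
print(f"{int(bad.sum())} of {len(bad)} candles are invalid")
```

On these five rows, only the 14:30 candle passes the check.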

Additional Context

This seems to be related to how the computation is performed: the open, high, low, and close series are each z-scored independently. They need to be z-scored together, or the open, high, low, and close need to be reassigned afterwards.
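The argument can be written out in a few lines. The notation is introduced here for illustration only: $\mu^X_t$ and $\sigma^X_t$ stand for the rolling mean and standard deviation of series $X$ at bar $t$, and $\mu_t$, $\sigma_t$ for one set of statistics shared by all four series.

```latex
% Independent z-scoring: each series has its own rolling statistics.
z^{X}_t = \frac{X_t - \mu^{X}_t}{\sigma^{X}_t}, \qquad X \in \{O, H, L, C\}

% H_t \ge C_t does not imply z^{H}_t \ge z^{C}_t, because in general
% \mu^{H}_t \ne \mu^{C}_t and \sigma^{H}_t \ne \sigma^{C}_t.

% Shared statistics with \sigma_t > 0 preserve the ordering within a bar:
\tilde{z}^{X}_t = \frac{X_t - \mu_t}{\sigma_t}
\quad\Longrightarrow\quad
\tilde{z}^{H}_t - \tilde{z}^{C}_t = \frac{H_t - C_t}{\sigma_t} \ge 0
```

The same step applies to every pair of columns, so any transform that subtracts one value and divides by one positive value per bar keeps the candle valid.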

One way to achieve this would be to first compute the rolling z-score of the closing prices and then construct rescaled candles that at least preserve the relative position of the open. That is, if the bar gapped up, it should still gap up in the z-scored version rather than suddenly gap down just because of the z-scoring.
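A minimal sketch of this idea, assuming plain pandas and synthetic data: every column is shifted by the rolling mean of the close and divided by the rolling standard deviation of the close. `shared_zscore_candles` is a hypothetical name, not an existing pandas-ta function, and the pandas default sample standard deviation (`ddof=1`) may not match what `cdl_z()` uses.

```python
import numpy as np
import pandas as pd


def shared_zscore_candles(df, length=30):
    """Hypothetical helper: scale all four columns with the close's rolling stats."""
    mean = df["close"].rolling(length).mean()
    std = df["close"].rolling(length).std()  # pandas default: ddof=1
    std = std.where(std > 0)  # a flat window gives NaN instead of inf
    out = pd.DataFrame(index=df.index)
    for col in ("open", "high", "low", "close"):
        out[f"{col}_Z_{length}"] = (df[col] - mean) / std
    return out


# Synthetic, valid OHLC data (random walk); not market data.
rng = np.random.default_rng(0)
n = 200
close = pd.Series(100 + rng.normal(0, 1, n).cumsum())
open_ = close.shift(1).fillna(close.iloc[0]) + rng.normal(0, 0.2, n)
high = np.maximum(open_, close) + rng.uniform(0, 0.5, n)
low = np.minimum(open_, close) - rng.uniform(0, 0.5, n)
ohlc = pd.DataFrame({"open": open_, "high": high, "low": low, "close": close})

zc = shared_zscore_candles(ohlc).dropna()
body_top = zc[["open_Z_30", "close_Z_30"]].max(axis=1)
body_bottom = zc[["open_Z_30", "close_Z_30"]].min(axis=1)
print((zc["high_Z_30"] >= body_top).all(), (zc["low_Z_30"] <= body_bottom).all())
```

This keeps the ordering of open, high, low, and close within each bar. It does not by itself guarantee that a gap relative to the previous close keeps its sign, because the rolling mean and standard deviation change from one bar to the next.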

twopirllc commented 1 year ago

@JanHomann

I see. 🤔

> One way to achieve this would be to first compute the rolling z-score of the closing prices ...

Since we have the whole candle (ohlc), what about using hl2 or oc2 or ohlc4 instead of close? Wouldn't one of those mean values be better? Thoughts?
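To compare these options, the reference series can be made a parameter. The sketch below is an assumption-laden illustration in plain pandas: `reference_series` and `zscore_candles` are hypothetical names, and hl2, oc2, and ohlc4 are taken to mean the usual averages (high+low)/2, (open+close)/2, and (open+high+low+close)/4.

```python
import numpy as np
import pandas as pd


def reference_series(df, kind="close"):
    """Hypothetical helper: the series whose rolling stats scale the candle."""
    o, h, l, c = df["open"], df["high"], df["low"], df["close"]
    options = {
        "close": c,
        "hl2": (h + l) / 2,
        "oc2": (o + c) / 2,
        "ohlc4": (o + h + l + c) / 4,
    }
    if kind not in options:
        raise ValueError(f"unknown reference: {kind!r}")
    return options[kind]


def zscore_candles(df, kind="close", length=30):
    """Hypothetical helper: z-score all four columns against one reference."""
    ref = reference_series(df, kind)
    mean = ref.rolling(length).mean()
    std = ref.rolling(length).std()
    std = std.where(std > 0)  # avoid division by zero on flat windows
    cols = ["open", "high", "low", "close"]
    return df[cols].sub(mean, axis=0).div(std, axis=0)


# Synthetic, valid OHLC data; not market data.
rng = np.random.default_rng(1)
n = 120
close = pd.Series(50 + rng.normal(0, 1, n).cumsum())
open_ = close.shift(1).fillna(close.iloc[0])
high = np.maximum(open_, close) + rng.uniform(0, 0.3, n)
low = np.minimum(open_, close) - rng.uniform(0, 0.3, n)
ohlc = pd.DataFrame({"open": open_, "high": high, "low": low, "close": close})

results = {}
for kind in ("close", "hl2", "oc2", "ohlc4"):
    z = zscore_candles(ohlc, kind).dropna()
    valid = (z["high"] >= z[["open", "close"]].max(axis=1)) & (
        z["low"] <= z[["open", "close"]].min(axis=1)
    )
    results[kind] = bool(valid.all())
print(results)
```

Any of the four references keeps each candle valid, since all columns share one shift and one positive scale per bar. The choice only changes where the candle is centered and how wide it is drawn.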

Kind Regards, KJ