twopirllc / pandas-ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators
https://twopirllc.github.io/pandas-ta/
MIT License

The correlation criteria for test pass/fail is inconsistent #478

Closed ffhirata closed 1 year ago

ffhirata commented 2 years ago

Which version are you running? The latest version is on GitHub. Pip is for major releases.

import pandas_ta as ta
print(ta.version)
0.3.45b0

Do you have TA Lib also installed in your environment? No

Did you upgrade? Did the upgrade resolve the issue? Yes, forced a re-install. The issue remains.

Describe the bug During my ad hoc tests, I realized that the test suite compares the result against another library (TA Lib), which is fine. It also compares the series value by value through

pdt.assert_series_equal(result, expected, check_names=False)

But the code also computes a correlation between the two results, and that is not a sound criterion.

corr = pandas_ta.utils.df_error_analysis(result, expected, col=CORRELATION)
self.assertGreater(corr, CORRELATION_THRESHOLD)

Correlation is not a good criterion because two series can be correlated even when their values disagree. The test that I did was to replace the TA Lib results with the candle high values for the sma function. The test result is PASS. Why? Because the sma result is correlated with the high values.

Example of a correct PASS: test_sma(), correlation = 1.0, AssertionError: Series are different, Series values are different (0.01908 %). The series differ only slightly, and the correlation is 1.

Example of an incorrect PASS (false positive): test_sma(), correlation = 0.9991919523534346, AssertionError: Series are different, Series values are different (99.96184 %). The series are almost completely different, yet the correlation is nearly 1! The test result is a false positive because the correlation criterion is not sound.
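To make the failure mode concrete, here is a minimal sketch (using hypothetical random-walk data, not the repository's test fixture): the 10-period sma of the close differs from the high column on nearly every row, yet the two series typically remain highly correlated, so a correlation-only threshold check would still pass.

import numpy as np
import pandas as pd
import pandas_ta as ta

# Hypothetical ohlcv-like data: a random walk, not the library's test fixture.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 500).cumsum(), name="close")
high = close + rng.uniform(0.5, 1.5, 500)   # stand-in for the "wrong" expected series

result = ta.sma(close, length=10).dropna()
expected = high.loc[result.index]

# Value-by-value comparison: almost every row differs.
pct_diff = (~np.isclose(result, expected)).mean() * 100
print(f"rows that differ: {pct_diff:.2f}%")

# Correlation: still close to 1, so a check such as corr > 0.99 would pass anyway.
print(f"correlation: {result.corr(expected):.4f}")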

To Reproduce Get the demo code at: https://github.com/ffhirata/pandas-ta/tree/testCorrelationFailure and run:

python -m unittest -v tests.test_indicator_overlap.main

or check the report output in the attachment, issue_report.pdf.

Expected behavior The criterion should be a value-by-value comparison: if a row is not equal, then a justification should be provided, i.e., line by line. The test results could be organized in campaigns with static reports, because of the manual justification. Or the criterion could be based on the assertion output, Series values are different (xxx %).
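For illustration only, one possible shape for such a criterion (a hypothetical helper, not part of the library or its test suite): compare the two series row by row with a tolerance, collect the mismatching rows as the justification artifact, and assert on the mismatch percentage instead of the correlation.

import numpy as np
import pandas as pd

def mismatch_report(result: pd.Series, expected: pd.Series, rtol: float = 1e-5):
    # Hypothetical row-by-row check: returns the rows that differ and the mismatch percentage.
    res, exp = result.align(expected, join="inner")
    differs = ~np.isclose(res, exp, rtol=rtol, equal_nan=True)
    report = pd.DataFrame({"result": res[differs], "expected": exp[differs]})
    return report, 100.0 * differs.mean()

# In a test, fail when more than an agreed percentage of rows differ,
# and keep the per-row report as evidence for manual justification:
# report, pct = mismatch_report(result, expected)
# assert pct <= 0.1, f"{pct:.3f}% of rows differ:\n{report}"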

Screenshots none

Additional context The confidence in the library has to be reviewed.

Thanks for using Pandas TA!

twopirllc commented 2 years ago

Hello @ffhirata,

Thanks for the deep dive and analysis of the correlation testing. I have been aware that correlation testing is not the best choice when comparing results between Pandas TA and TA Lib. It was something I quickly set up to give me a rough idea of how far indicators were off from TA Lib when they were not equal using default arguments.

The criterion should be a value-by-value comparison: if a row is not equal, then a justification should be provided, i.e., line by line. The test results could be organized in campaigns with static reports, because of the manual justification. Or the criterion could be based on the assertion output, Series values are different (xxx %).

The confidence in the library has to be reviewed.

What would it take to improve the confidence of this library for you? Are you willing to take on this task? If so, that would be greatly appreciated.

Kind Regards, KJ

ffhirata commented 2 years ago

Hi @twopirllc,

This task is very challenging and time consuming. Currently, I am busy with study and work, but I thought that it is a unique opportunity to grow my knowledge in many areas (Python, coding, testing methodology, finance indicators, interactions with developers, GitHub). So, I accept the task.

I have a software testing background, though not in Python applications, but it can be useful. I will also need time and help to familiarize myself with the test approaches that you have in mind. If you can send me benchmark projects to study, that would be a good start.

My idea is to standardize a test vector with inputs (date, open, high, low, close, volume) and expected results. Each financial indicator will have one or more test vectors covering its whole functionality. The expected results can be calculated by TA Lib or other platforms, so the expected result is static, not calculated at run time. The output of a test execution is an automatic report with pass/fail for each row. If a failure exists and is correct, then a manual justification should be provided after analysis of the test execution. The reports are stored as test results for final acceptance and evidence of correctness. The test of a financial indicator will be executed only when its code changes during a software release; otherwise the last test result is still valid.
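For example, a rough sketch of that workflow (the file name, path and column layout are assumptions, not an agreed format): the ohlcv inputs and the pre-computed expected column live in one committed CSV, and the test only recomputes the indicator and emits a per-row pass/fail report.

import numpy as np
import pandas as pd
import pandas_ta as ta

# Hypothetical static test vector: ohlcv inputs plus an "expected" column,
# generated once (e.g. with TA Lib) and committed to the repository.
vector = pd.read_csv("test_vectors/sma_10.csv", index_col="date", parse_dates=True)

result = ta.sma(vector["close"], length=10)

report = pd.DataFrame({"expected": vector["expected"], "result": result})
report["pass"] = np.isclose(report["result"], report["expected"], equal_nan=True)

# Failing rows would require a manual justification before acceptance.
print(report[~report["pass"]])
print(f"{100 * report['pass'].mean():.2f}% of rows pass")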

"What would it take to improve the confidence of this library for you?" Test reports to check that it was verified against most used platforms for trading, or where I can use my own test vector to compare with my expected results.

Kind Regards, ffhirata

twopirllc commented 2 years ago

@ffhirata,

This task is very challenging and time consuming. Currently, I am busy with study and work, but I thought that it is a unique opportunity to grow my knowledge in many areas (Python, coding, testing methodology, finance indicators, interactions with developers, GitHub). So, I accept the task.

I understand. As you can tell, I am also time challenged. I continually learn new things as well while developing this library and application. Despite your other time commitments, I appreciate your willingness to take on this task. I hope to learn something new from it as well. 😎

The expected results can be calculated by TA Lib or other platforms.

Unfortunately, TA Lib is the only library that can be tested against without having to make remote connections. TradingView (TV) can also be tested against; however, it requires a paid subscription to manually download ohlcv data + indicator results, see https://github.com/twopirllc/pandas-ta/issues/107#issuecomment-685834922. Do you know of other trading platforms where one can cheaply and easily obtain ohlcv data + indicator results for comparison?

So, the expected result is static, not calculated at run time. The output of a test execution is an automatic report with pass/fail for each row. If a failure exists and is correct, then a manual justification should be provided after analysis of the test execution. The reports are stored as test results for final acceptance and evidence of correctness. The test of a financial indicator will be executed only when its code changes during a software release; otherwise the last test result is still valid.

Sounds cool and interesting! 😎 I understand to some degree what you want to achieve, but it will be clearer when you have made some headway with some code. Since we are at the point of improving the testing process, perhaps we should also consider using a different Testing Framework?
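If a framework change is on the table, pytest parametrization is one natural candidate, since every (indicator, test vector) pair becomes its own reported test case. A hypothetical sketch, reusing the static-vector layout above (the paths and cases are assumptions, not the existing test layout):

import pandas as pd
import pandas.testing as pdt
import pandas_ta as ta
import pytest

# Hypothetical (indicator name, kwargs, vector file) cases.
CASES = [
    ("sma", {"length": 10}, "test_vectors/sma_10.csv"),
    ("ema", {"length": 10}, "test_vectors/ema_10.csv"),
]

@pytest.mark.parametrize("name,kwargs,path", CASES, ids=[c[0] for c in CASES])
def test_indicator_against_vector(name, kwargs, path):
    vector = pd.read_csv(path, index_col="date", parse_dates=True)
    result = getattr(ta, name)(vector["close"], **kwargs)
    pdt.assert_series_equal(result, vector["expected"], check_names=False)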

On another note, in terms of indicator performance (with and without TA Lib), this is my performance method that I will include with Pandas TA. Sorry there is no docstring yet, but the arguments should help. There is an example below.

from contextlib import redirect_stdout
from io import StringIO

from pandas import DataFrame


def performance(df: DataFrame,
        excluded: list = None, top: int = None, talib: bool = False,
        ascending: bool = False, sortby: str = "secs",
        gradient: bool = False, places: int = 5, stats: bool = False,
        verbose: bool = False
    ) -> DataFrame:
    if df.empty: return
    talib = bool(talib) if isinstance(talib, bool) and talib else False
    top = int(top) if isinstance(top, int) and top > 0 else None
    stats = bool(stats) if isinstance(stats, bool) and stats else False
    verbose = bool(verbose) if isinstance(verbose, bool) and verbose else False

    # Excluded from the timing run by default
    _ex = ["above", "above_value", "below", "below_value", "cross", "cross_value", "ichimoku"]
    if isinstance(excluded, list) and len(excluded) > 0:
        _ex += excluded
    indicators = df.ta.indicators(as_list=True, exclude=_ex)
    if len(indicators) == 0: return None

    def ms2secs(ms, p: int):
        return round(0.001 * ms, p)

    def indicator_time(df: DataFrame, group: list = [], index_name: str = "Indicator", p: int = 4):
        times = []
        for i in group:
            r = df.ta(i, talib=talib, timed=True)
            # r.timed reports the runtime in milliseconds; take the leading numeric token
            ms = float(r.timed.split(" ")[0])
            times.append({index_name: i, "secs": ms2secs(ms, p), "ms": ms})
        return times

    _iname = "Indicator"
    if verbose:
        print()
        data = indicator_time(df.copy(), indicators, _iname, places)
    else:
        # Capture and discard any indicator output while timing
        _this = StringIO()
        with redirect_stdout(_this):
            data = indicator_time(df.copy(), indicators, _iname, places)
        _this.close()

    tdf = DataFrame.from_dict(data)
    tdf.set_index(_iname, inplace=True)
    tdf.sort_values(by=sortby, ascending=ascending, inplace=True)

    total_timedf = DataFrame(tdf.describe().loc[['min', '50%', 'mean', 'max']]).T
    total_timedf["total"] = tdf.sum(axis=0).T
    total_timedf = total_timedf.T

    _div = "=" * 60
    _observations = f"  Observations: {df.shape[0]} {'[talib]' if talib else ''}"
    _quick_slow = "Quickest" if ascending else "Slowest"
    _title = f"  {_quick_slow} Indicators"
    _perfstats = f"Time Stats:\n{total_timedf}"
    if top:
        _title = f"  {_quick_slow} {top} Indicators [{tdf.shape[0]}]"
        tdf = tdf.head(top)
    print(f"\n{_div}\n{_title}\n{_observations}\n{_div}\n{tdf}\n\n{_div}\n{_perfstats}\n\n{_div}\n")

    if isinstance(gradient, bool) and gradient:
        return tdf.style.background_gradient("autumn_r"), total_timedf

    if stats:
        return tdf, total_timedf
    else:
        return tdf


It should be easy to run. I would be interested in your results also. Of course, the timings will depend on the number of rows/bars/... of the DataFrame, which allows us to gauge performance metrics at varying sizes. 😎

import pandas_ta as ta

df = ...  # your ohlcv DataFrame
print(df.shape)

performance(df, top=5, talib=True, ascending=False, places=4, stats=False, verbose=True)
performance(df, top=5, ascending=False, places=4, stats=False, verbose=True)

# Ideally, you want to do 3-4 runs for performance to really improve. Not sure why.
performance(df, top=5, talib=True, ascending=False, places=4, stats=False, verbose=True)
performance(df, top=5, ascending=False, places=4, stats=False, verbose=True)

I look forward to seeing what you can come up with and I appreciate the contributions. 😎 Feel free to email me if needed.

Kind Regards, KJ

twopirllc commented 2 years ago

Hey @ffhirata,

You should check out the latest development branch. New README, better documentation and typing, better performance, and more.

$ pip install -U git+https://github.com/twopirllc/pandas-ta.git@development

I am interested in your indicator performance results also. There is a Performance Check Notebook as well.

Hope all is well!

Talk to you later, KJ