skyzeio / Skyze

Skyze Trading Engine Early Prototyping .... done as a single app with messaging between components ... will convert to the services architecture where components = Services and messaging = Asynch Messaging with Pub/Sub Pattern
GNU General Public License v3.0

Pandas and Numpy floating point comparisons #17

Open mikenew opened 7 years ago

mikenew commented 7 years ago

How to compare (assert_equal) two data frames of floats?

Seems numpy.isclose(a,b) doesn't work ..

__In UnitTestSkyzeAbstract::series_assert line 103 ish__

```python
diffs["Different"] = diffs["test"] != diffs["target"]

# see PEP485 for use of isclose
# which rows are different?
diffs["Different"] = np.isclose(diffs["test"], diffs["target"], rtol=1e-05, atol=1e-08, equal_nan=False)
```

Returns the below ... Different = True even though the amount is < 1e-10

```
--- FAIL: H-PC errors: 1585 of 1586 ... 99.94
<class 'pandas.core.series.Series'> <class 'numpy.float64'>
<class 'pandas.core.series.Series'> <class 'numpy.float64'>
            target   test  Different        Amount
2013-04-29   13.28  13.28       True  1.776357e-15
2013-04-30    2.39   2.39       True  1.465494e-14
2013-05-01    0.89   0.89       True -1.365574e-14
2013-05-02    8.61   8.61       True  0.000000e+00
```

more explanation

numpy floating point

Problem with equality when the differences are very small. I have been finding this problem when comparing dataframes of time series of the output of indicators. It reports that 13.84 != 13.84. When the difference is taken it is x*10^-15 ... so very small differences. Discussed on Stack Overflow: https://stackoverflow.com/questions/33549193/pandas-dataframe-comparison-and-floating-point-precision https://stackoverflow.com/questions/5595425/what-is-the-best-way-to-compare-floats-for-almost-equality-in-python

PEP485: https://docs.python.org/3/whatsnew/3.5.html#pep-485-a-function-for-testing-approximate-equality https://www.python.org/dev/peps/pep-0485/

Documentation: Python: https://docs.python.org/3/library/math.html Numpy: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.isclose.html
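A possible explanation for the output above, offered as a sketch rather than a confirmed diagnosis: `np.isclose` returns True where two values are close, so assigning its result straight to `Different` would mark matching rows as different (including the 2013-05-02 row, whose difference is exactly 0). The frame below is made up for illustration; only the column names and tolerances come from the snippet above, and the last row is a deliberate mismatch.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first four rows differ by ~1e-14 at most, the last by 0.1.
diffs = pd.DataFrame(
    {
        "target": [13.28, 2.39, 0.89, 8.61, 5.00],
        "test": [13.28 + 1.776357e-15, 2.39 + 1.465494e-14, 0.89 - 1.365574e-14, 8.61, 5.10],
    }
)
diffs["Amount"] = diffs["test"] - diffs["target"]

# np.isclose answers "are these close?", so True means "same within tolerance".
close = np.isclose(diffs["test"], diffs["target"], rtol=1e-05, atol=1e-08, equal_nan=False)

# Negate the result to get a flag that really means "different".
diffs["Different"] = ~close

print(diffs)
print("rows flagged as different:", int(diffs["Different"].sum()))
```

If this is what is happening in `series_assert`, negating the `np.isclose` result should leave only genuine mismatches flagged. With `equal_nan=False`, rows where both sides are NaN would still count as different.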

mikenew commented 7 years ago

Some more clarity

How python works with FP numbers

CPython is built on C, and a Python float is stored as a C double, i.e. as a binary fraction. A decimal value is converted to the nearest binary fraction when it is stored and back to decimal when it is displayed. These conversions are not always exact.

Numbers like 1.1 and 2.2 do not have exact representations in binary floating point. End users typically would not expect 1.1 + 2.2 to display as 3.3000000000000003 as it does with binary floating point.
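A small sketch of the behaviour described above, using only the standard library. It shows the `1.1 + 2.2` case, the exact binary value stored for `1.1`, and the PEP 485 `math.isclose` check as an alternative to `==`. The tolerance value is an arbitrary choice for illustration.

```python
import math
from decimal import Decimal

total = 1.1 + 2.2
print(total)         # 3.3000000000000003
print(total == 3.3)  # False

# Decimal(float) exposes the exact binary value the float holds.
print(Decimal(1.1))

# PEP 485: compare with a tolerance instead of ==.
is_same = math.isclose(total, 3.3, rel_tol=1e-9)
print(is_same)       # True
```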

Solution - Decimal

https://www.youtube.com/watch?v=JOGPAduCC7c e.g. 7.6+7.9 = 15.499999999234523 not 15.5

    import decimal

    # Build the Decimals from strings so the values are exact
    a = decimal.Decimal("7.6") + decimal.Decimal("7.9")
    a.quantize(decimal.Decimal("0.00"))
    a.quantize(decimal.Decimal("0.00"), rounding=decimal.ROUND_UP)
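One detail worth illustrating with `Decimal`: constructing it from a float carries the binary error along, while constructing it from a string does not. The sketch below reuses the 1.1 and 2.2 values from the earlier comment. The numbers passed to `quantize` are made up for the example.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_UP

# From floats: the binary representation error is preserved.
from_float = Decimal(1.1) + Decimal(2.2)

# From strings: the decimal values are exact.
from_str = Decimal("1.1") + Decimal("2.2")

print(from_float == Decimal("3.3"))  # False
print(from_str == Decimal("3.3"))    # True

# quantize needs a Decimal exponent, not a plain string.
two_places = Decimal("0.01")
half_up = Decimal("15.499").quantize(two_places, rounding=ROUND_HALF_UP)
always_up = Decimal("15.491").quantize(two_places, rounding=ROUND_UP)
print(half_up, always_up)            # 15.50 15.50
```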

mikenew commented 7 years ago

Testing ... with straight dataframes (no manipulation of fp types)

            assert_series_equal (   p_test_results[p_name],
                                    p_target_results[p_name],
                                    check_exact = False,             # Whether to compare number exactly.
                                    check_less_precise = True,       # False = 5 digits, True = 3 digits
                                    obj = p_name
                               )

| Indicator | Assert | Diff |
| --- | --- | --- |
| Moving Average | Pass | not run |
| Crosses | Pass | not run |
| SuperTrend | Fail | Pass |
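For reference, pandas 1.1 and later accept explicit `rtol` and `atol` tolerances in `assert_series_equal`, and as far as I know `check_less_precise` was deprecated in their favour. A minimal sketch, assuming a recent pandas; the series values and the `H-PC` name are borrowed from the failure output earlier in this issue, and the 1e-14 offset is made up.

```python
import pandas as pd
from pandas.testing import assert_series_equal

target = pd.Series([13.28, 2.39, 0.89, 8.61], name="H-PC")
test = target + 1e-14  # simulate floating point noise

# Passes: the differences are far below the tolerances.
assert_series_equal(test, target, check_exact=False, rtol=1e-5, atol=1e-8, obj="H-PC")

# A real difference is still reported.
try:
    assert_series_equal(test + 0.01, target, check_exact=False, rtol=1e-5, atol=1e-8, obj="H-PC")
    caught = False
except AssertionError as exc:
    caught = True
    print(exc)
```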

Testing ... with .round(8) and .round(2) in the assert_series in UnitTestSkyzeAbstract

            assert_series_equal (   p_test_results[p_name].round(8),
                                    p_target_results[p_name].round(8),
                                    check_exact = False,             # Whether to compare number exactly.
                                    check_less_precise = True,       # False = 5 digits, True = 3 digits
                                    obj = p_name
                               )

| Indicator | Assert | Diff |
| --- | --- | --- |
| Moving Average | Pass | not run |
| Crosses | Pass | not run |
| SuperTrend | Fail | Pass |
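One general caveat about rounding before comparing, which may or may not be what affects SuperTrend here: two values that are nearly identical can sit on opposite sides of a rounding boundary and round to different results. The values below are invented to show that effect; a tolerance check treats them as equal.

```python
import numpy as np
import pandas as pd

# Hypothetical values: the first pair differs by 2e-10 but straddles 0.005.
target = pd.Series([0.0049999999, 13.28])
test = pd.Series([0.0050000001, 13.28])

# Rounding to 2 places sends them to 0.00 and 0.01 respectively.
rounded_equal = test.round(2).equals(target.round(2))

# A tolerance comparison sees them as the same.
close = np.isclose(test, target, rtol=1e-05, atol=1e-08)

print("equal after round(2):", rounded_equal)
print("all close:", bool(close.all()))
```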