quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0
17.56k stars 4.71k forks

Change number representations from float to decimal #56

Closed · twiecki closed this issue 11 years ago

twiecki commented 11 years ago

We are currently using Python floats for, e.g., calculating commission costs, which is a problem for some strategies since Python uses the machine (binary) representation: for example, 1.1 * 1.1 != 1.21 because Python returns 1.2100000000000002. For that reason, most financial applications represent these values as ints or as objects of a dedicated class (Decimal or cDecimal in Python).

As such, I think we should replace float computations with Decimal where possible. The outstanding concern is performance regressions.
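A minimal sketch of the issue and the proposed alternative (plain Python; the variable names are purely illustrative):

```python
from decimal import Decimal

# Binary floats cannot represent 1.1 exactly, so the product drifts:
print(1.1 * 1.1)            # 1.2100000000000002
print(1.1 * 1.1 == 1.21)    # False

# Decimal does exact base-10 arithmetic. Construct it from strings,
# not floats, or the binary representation error is carried over.
price = Decimal("1.1")
print(price * price)                     # Decimal('1.21')
print(price * price == Decimal("1.21"))  # True
```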

mcobzarenco commented 11 years ago

Were I involved in the project, I would be strongly opposed to the idea from a performance point of view. The loss in accuracy is completely immaterial, as the inherent noise always present in financial data is many, many, many orders of magnitude higher. The only places where I've encountered people using decimal arithmetic in the financial industry are compliance and risk management/reporting, where for legal reasons they cannot use IEEE 754.

But I am not involved in the project, although I do follow it with great enthusiasm.

twiecki commented 11 years ago

Thanks @mcobzarenco for the input. I think that's a valid point. I can't really predict the performance impact, but if it's significant enough, I agree that performance probably trumps floating-point errors.
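For a rough sense of the overhead in question, a microbenchmark along these lines could be run (a sketch only; absolute numbers vary by machine, and the expressions are just stand-ins for a commission-style calculation):

```python
import timeit

N = 100000

# Same arithmetic expression, once with floats and once with Decimals.
float_t = timeit.timeit("x * 1.001 + 0.05", setup="x = 1.1", number=N)
dec_t = timeit.timeit(
    "x * rate + fee",
    setup=("from decimal import Decimal; "
           "x = Decimal('1.1'); rate = Decimal('1.001'); fee = Decimal('0.05')"),
    number=N,
)

print("float:   %.4fs" % float_t)
print("Decimal: %.4fs" % dec_t)  # typically noticeably slower than the float run
```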

For reference, this came up on our zipline mailing list https://groups.google.com/forum/#!topic/zipline/vDIPWPwE_fQ

Elektra58 commented 11 years ago

I tend to agree that performance is more important than the n-th decimal place. Where it does matter is in the unit tests, when a different library is used to calculate basically the same number. It was good to see that the original authors already thought of that!

What I'm currently putting more emphasis on (while working on refactoring the rolling stdev transform) is the algorithm itself. This is essential as soon as people compute stdev on volume or on other numbers that are either very large or very small. First I will deliver the current naïve algorithm; later I might provide a more numerically sound solution for the second and higher statistical moments.
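As a sketch of the kind of instability meant here (this is not zipline's rolling stdev transform, just an assumed illustration): the naïve E[x^2] - E[x]^2 formula loses precision when values are large relative to their spread, while Welford's online update stays stable.

```python
import math

def naive_std(values):
    # E[x^2] - E[x]^2: prone to catastrophic cancellation when the
    # values are huge compared to their spread (e.g. volume data).
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    return math.sqrt(max(mean_sq - mean * mean, 0.0))

def welford_std(values):
    # Welford's online algorithm: a numerically stable single pass.
    mean, m2 = 0.0, 0.0
    for i, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / i
        m2 += delta * (x - mean)
    return math.sqrt(m2 / len(values))

huge = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
print(naive_std(huge))    # unreliable at this magnitude, possibly even 0.0
print(welford_std(huge))  # ~4.7434, the correct population std of 4, 7, 13, 16
```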

mcobzarenco commented 11 years ago

That is a common problem with testing scientific computations, and not only when a different library is involved: a slightly different way of calculating the same number suffices (e.g. reordering some additions). However, testing for "almost" equality solves it without too many problems, e.g. assertAlmostEqual(x, y) in Python's stdlib (http://docs.python.org/library/unittest.html#unittest.TestCase.assertAlmostEqual). One can make it rigorous by first working out an upper bound on the lost precision.
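A minimal example of that approach using only the stdlib (the test case itself is hypothetical):

```python
import unittest

class AlmostEqualExample(unittest.TestCase):
    def test_reordered_sum(self):
        a = 0.1 + 0.2 + 0.3   # 0.6000000000000001
        b = 0.3 + 0.2 + 0.1   # 0.6
        self.assertNotEqual(a, b)   # strict equality fails
        # Compares to 7 decimal places by default; a `places` or `delta`
        # argument can encode a worked-out bound on the lost precision.
        self.assertAlmostEqual(a, b)

if __name__ == "__main__":
    unittest.main()
```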


twiecki commented 11 years ago

OK, based on the input I will close this for now. Thanks for the feedback, everyone.

@mcobzarenco Yeah, the test_risk unit tests need to be refactored to use the numpy testing approach you suggest.
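For reference, the numpy counterpart would look roughly like this (tolerances here are illustrative, not the ones test_risk should actually use):

```python
import numpy as np
from numpy.testing import assert_allclose

# Element-wise almost-equality for arrays of risk/return metrics.
expected = np.array([0.012, -0.034, 0.051])
computed = expected + 1e-12  # e.g. the same metric via a different code path
assert_allclose(computed, expected, rtol=1e-9, atol=1e-12)
```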