numpy / numpy-financial

Standalone package of the NumPy financial functions
BSD 3-Clause "New" or "Revised" License
338 stars 80 forks source link

IRR chooses solution closest to zero not always correct choice #39

Closed geoffwright240 closed 2 years ago

geoffwright240 commented 3 years ago

For the following set of cash flows, I would expect the IRR to return 12%.

cf = np.array([-217500.0, -217500.0, 108466.80462450592, 101129.96439328062, 93793.12416205535, 86456.28393083003, 79119.44369960476, 71782.60346837944, 64445.76323715414, 57108.92300592884, 49772.08277470355, 42435.24254347826, 35098.40231225296, 27761.56208102766, 20424.721849802358, 13087.88161857707, 5751.041387351768, -1585.7988438735192, -8922.639075098821, -16259.479306324123, -23596.31953754941, -30933.159768774713, -38270.0, -45606.8402312253, -52943.680462450604, -60280.520693675906, -67617.36092490121])

However, from the code, we have the following choice being made: ` # NPV(rate) = 0 can have more than one solution so we return

only the solution closest to zero.`

`rate = 1/res - 1`
`print(rate)`

[-0.01809679 0.12 ]

As you can see, 12% is one of the solutions but it is discarded. This leads me to question whether choosing the solution that is closest to zero is indeed the right choice here and would appreciate the group's feedback.

geoffwright240 commented 3 years ago

I see that PR https://github.com/numpy/numpy-financial/pull/32 will solve this issue.

geoffwright240 commented 3 years ago

I came across an interesting paper that discusses the problem of which root to choose when evaluating an IRR equation.

http://www.utstat.utoronto.ca/~alexander/irr-soln.pdf

The authors argue that choosing the largest root is generally the correct 'interpretation' for what most users are expecting from an IRR equation. While I fully acknowledge that all roots are correct solutions to the mathematical problem, I believe it probably makes sense to adopt the Law of the Largest Root when no guess is given.

The current code simply chooses the root that is closest to zero. This is not incorrect mathematically but it appears arbitrary compared to the approach spelled out in the attached reference. (Does anyone know the reasoning underlying the current root selection choice?) irr-soln (1).pdf

Kai-Striega commented 3 years ago

@geoffwright240 I'm going to be doing some maintenance work on numpy-financial and I think this will be a good fix. Thankyou for contributing that paper, I agree that it should be the smallest positive root. Incidentally, this seems to have been the original intention of the code too:

https://github.com/numpy/numpy-financial/blob/e624d268dcf00fffd05580726819ade0edc3e4ff/numpy_financial/_financial.py#L770-L771

I don't agree with taking an initial guess as the result. I'm going to get a bit of maintenance stuff done and then I'll get back to this.

Kai-Striega commented 3 years ago

closed by #43

geoffwright240 commented 3 years ago

Thanks Kai!

Kai-Striega commented 3 years ago

@geoffwright240 no problem. It may take a while for there to be a formal release but, if you need it, you can always use the master build that will include the change from now.

jamiecook commented 3 years ago

Hey @Kai-Striega - I was just looking at the other open PR #32 which solves this via a guess. You say above that you don't agree with taking an initial guess - I've tried to read through the above paper but don't really have enough domain knowledge to say if it supports guessing or not. Reading @geoffwright240 comment - it seems his read was that when initial information (a guess) is available, it makes sense to use it.

I ask as I'd be happy to deal with the conflicts on #32 if you were open to merging it. If there is a reason that guessing isn't a good idea then no point and that PR should be closed.

One argument in favour of having it is that without it its impossible to match Excel behaviour - not saying that Excel has the right solution here, but it's an existing precendent. In my case I'm porting Excel -> Python and want to be able to do testing that lets me be 100% certain that I've got the process translated correctly - except sometimes the IRR functions give diff answers and I physically can't get them to agree (without changing my refernce data).

Kai-Striega commented 2 years ago

You say above that you don't agree with taking an initial guess - I've tried to read through the above paper but don't really have enough domain knowledge to say if it supports guessing or not.

I decided against supporting a guess option with simplicity in mind. Perhaps this is premature. At least it seems to be worth discussing further.

I ask as I'd be happy to deal with the conflicts on #32 if you were open to merging it. If there is a reason that guessing isn't a good idea then no point and that PR should be closed.

The issue with the PR wasn't that I disagree with guessing - it's that I'm not sure if it's the best implementation of a guess, or what a "good" guess implementation would look like. So when I started contributing again, I decided to leave it alone. That probably wasn't the best decision. I'll take another look this week.

One argument in favour of having it is that without it its impossible to match Excel behaviour - not saying that Excel has the right solution here, but it's an existing precendent. In my case I'm porting Excel -> Python and want to be able to do testing that lets me be 100% certain that I've got the process translated correctly - except sometimes the IRR functions give diff answers and I physically can't get them to agree (without changing my refernce data).

That raises a couple of question; without knowing how Excel implements guess how is it possible to write a guess implementation that returns exactly the same result as Excel? Even if a "good" guess implementation is developed how do we make sure it's the same as Excel? If we find edge cases where it doesn't fit can we change the behavior later (and possibly break other user's code)?

As an aside do you know if LibreOffice Calc/Google Sheets/Excel all behave the same way?

jamiecook commented 2 years ago

From the Microsoft page there is sparse detail - they start with the guess and iterate - what ever that means.

image

LibreOffice documentation seems to suggest that it mirrors Excel (down to the same assumed initial guess = 0.1 / 10%). I have found the source - but I will admit to not understanding anything about how LibreOffice works - it's in a cpp file, but the code is all in strings that seems to be evaluated by a different interpreter that is invoked at runtime

https://github.com/LibreOffice/core/blob/bfc1600c6ade6f006eb774bffe7caa9c948e8603/sc/source/core/opencl/op_financial.cxx

I think the current solution isn't actually iterative is it? It assumes that the solution can be calculated in closed form from the roots - and selecting one is essentially a crap shoot. From reading it appears that there will be as many solutions as there are sign changes in the input sequence so for complicated forecasts it might be quite a large range of solutions.

So I think you are right, picking the solution that is closest to the given guess is not necessarily going to give a solution that matches Excel 100% of the time. Short of reimplementing to use the iterative approach though, I think it's probably the best idea (that I can think of).

Kai-Striega commented 2 years ago

Thank you for taking the time to research this.

From the Microsoft page there is sparse detail - they start with the guess and iterate - what ever that means.

That means they are using some kind of numerical algorithm that takes steps towards the correct solution. If you're interested in reading up on them it would probably be something similar to the Newton-Raphson method if you'd like to read up on it.

LibreOffice documentation seems to suggest that it mirrors Excel (down to the same assumed initial guess = 0.1 / 10%).

They do look similar, what I'm worried about is the behavior that's not documented (e.g. what is the stepsize for iterations)

I think the current solution isn't actually iterative is it? It assumes that the solution can be calculated in closed form from the roots - and selecting one is essentially a crap shoot.

IRR finds the eigenvalues of the Companion matrix, which correspond to the roots of the polynomial equation. This returns all the roots (which is redundant) and slow (see #40) and leaves ambiguity in how to select the correct root.

So I think you are right, picking the solution that is closest to the given guess is not necessarily going to give a solution that matches Excel 100% of the time. Short of reimplementing to use the iterative approach though, I think it's probably the best idea (that I can think of).

I'm starting to think that reimplementing IRR with an iterative approach would be the best way of doing it. Would you be interested in doing it? If you are, here is a good paper to get started:

jamiecook commented 2 years ago

@Kai-Striega - I would be quite interested in this actually. I just haven't had a lot of time recently - but could pick it up now. I'll look into doing up an initial impl for discussion.

jamiecook commented 2 years ago

Oh - you've already picked it up in #47