Use divide-and-conquer for faster factorials

cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 14 years ago

BPO	8692
Nosy	@rhettinger, @mdickinson, @abalkin
Files	factorial4.py factorial-no-recursion.patch factorial-test.patch: Improve unit tests for math.factorial factorial-precompute-partials.patch factorial-speed.sh: Script to test factorial speed for several n factorial.py factorial.patch: Divide-and-conquer factorial algorithm factorial2.patch factorial3.patch factorial4.patch unnamed

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = 'https://github.com/mdickinson' closed_at = created_at = labels = ['extension-modules', 'performance'] title = 'Use divide-and-conquer for faster factorials' updated_at = user = 'https://bugs.python.org/stutzbach' ``` bugs.python.org fields: ```python activity = actor = 'mark.dickinson' assignee = 'mark.dickinson' closed = True closed_date = closer = 'mark.dickinson' components = ['Extension Modules'] creation = creator = 'stutzbach' dependencies = [] files = ['17320', '17321', '17326', '17328', '17329', '17343', '17345', '17348', '17351', '17352', '17354'] hgrepos = [] issue_num = 8692 keywords = ['patch', 'needs review'] message_count = 72.0 messages = ['105537', '105548', '105551', '105552', '105553', '105557', '105559', '105561', '105562', '105563', '105564', '105565', '105568', '105579', '105590', '105592', '105593', '105594', '105596', '105597', '105600', '105602', '105603', '105604', '105607', '105610', '105653', '105655', '105657', '105660', '105661', '105663', '105664', '105665', '105666', '105668', '105678', '105679', '105680', '105681', '105682', '105683', '105684', '105689', '105691', '105715', '105729', '105731', '105754', '105756', '105757', '105759', '105760', '105770', '105777', '105782', '105783', '105784', '105797', '105801', '105806', '105807', '105808', '105811', '105812', '105813', '105814', '105815', '105816', '105817', '105818', '105851'] nosy_count = 6.0 nosy_names = ['rhettinger', 'mark.dickinson', 'belopolsky', 'draghuram', 'stutzbach', 'Alexander.Belopolsky'] pr_nums = [] priority = 'normal' resolution = 'accepted' stage = 'patch review' status = 'closed' superseder = None type = 'performance' url = 'https://bugs.python.org/issue8692' versions = ['Python 3.2'] ```

mdickinson commented 14 years ago

Daniel, sorry: I spent so long writing that last message that I didn't read your update until now.

The new patch looks fine; same caveat about the overflow check as before. Let me know when you want me to apply this.

mdickinson commented 14 years ago

One other note: I find the bit numbering in find_last_set_bit peculiar: isn't the least significant bit usually bit 0? (Well, okay some people number the msb 0, but that's just weird. :) I know the ffs and fls functions also start their bit numbering at one, but this seems very unconventional to me.

Perhaps rename find_last_set_bit to bit_length? i.e., it's the *size* of the small bitfield that can contain the given value, rather than the index of the highest bit in that bitfield.

mdickinson commented 14 years ago

it's the *size* of the small bitfield

*smallest

cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 14 years ago

On Fri, May 14, 2010 at 3:25 PM, Mark Dickinson \report@bugs.python.org\ wrote:

The patch looks good. Just one issue: I think the overflow check for num_operands * last_bit is bogus. It's possible for the product to overflow and still end up being less than num_operands. How about:

You're right; I'm not sure what I was thinking. (actually, I'm pretty sure I was thinking about addition ;) )

if (num_operands \<= BITS_IN_LONG && num_operands * last_m_bit \<= BITS_IN_LONG) ...

instead? If num_operands \<= BITS_IN_LONG then there's no danger of overflow in the multiplication, and if num_operands > BITS_IN_LONG then the second test will certainly fail.

Yes, that looks good.

Marking this as accepted: Daniel, if you like I can apply the overflow check change suggested above and the minor style fixes and apply this; or if you want to go another round and produce a third patch, that's fine too. Let me know.

That would be great. :)

With regards to find_last_set_bit, I understand where you're coming from, but I'm following the convention of ffs/fls and also bits_in_digit() from longobject.c. I'd like to keep them interchangeable, in case they're consolidated at some point in the future. If you want to rename the function, that's fine with me.

FWIW, I think the MSB=0th people are all electrical engineers. If you're serializing down to bits and the MSB is the first (0th) bit off the wire, it makes sense in that context.

(N.B. If you do produce another patch, please name it something different and don't remove the earlier version---removing the earlier patch versions makes life harder for anyone trying to follow this issue.)

Good to know. I'll keep that in mind going forward.

Thanks to both of you for the rigorous review. This was a fun patch to work on. :-)

mdickinson commented 14 years ago

Okay, here's what I'm planning to commit (tomorrow), in case anyone wants to give it one last check. I've added some comments half-explaining the algorithm used (just in case the referenced URL goes out of existence). I made a couple of other minor changes to the algorithm:

(1) in factorial_partial_product, use 'k = (n + num_operands) | 1;' rather than 'k = (n + num_operands - 1) | 1;'. This saves an operation, and means that when a range including an odd number of terms is split then the bottom half gets the extra term, which makes the partial products a teensy bit more balanced, since the bottom half consists of smaller numbers.

(2) in factorial_loop, I set the initial value of 'upper' to 3 rather than 1. This avoids factorial_partial_product ever being called with a start of 1.

Apart from that, no significant changes.

cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 14 years ago

Looks good.

abalkin commented 14 years ago

Mark, It is always a pleasure to read your algorithm descriptions!

Just a few comments:

It would be nice if a pure python implementation would make it somewhere in the code base. Maybe it can be placed in comments in C file or added to the test module somehow.
Please rename arguments to factorial_partial_product() from n, m to start, stop. It is confusing enough that n has a different meaning loop() and partial_product(), but n, m coming in reverse alphabetical order makes it really hard to follow the code.
Using n as a variable in the fast loop saves one integer assignment (likely to be optimized away) at the expense of clarity. One has to realize that there is an unconditional return after the loop before concluding that n is the same in the recursive part.
"We know that m is the largest number to be multiplied." Shouldn't that be m - 2?
Is find_last_set_bit() the same as bit_length()? If so, why isn't it called the same? I don't want to bikeshed, but seeing find_last_set_bit makes me wonder if "last" means most or least significant bit.
There is a comment line / r *= inner; */ in factorial_loop() but no r variable. Shouldn't that be outer \= inner?

I'll review math_factorial() after reading Daniel's message that just arrived.

abalkin commented 14 years ago

Some more ...

Mark's notes talk about "odd-part-of-factorial". It would be clearer if r variable in math_factorial() was called odd_part.
I would also rename nminusnumbits to ntwos or something else that explains what it is rather than how it is computed.

mdickinson commented 14 years ago

New patch (factorial3.patch) addressing all of Alexander's points except the one about including Python source somewhere.

I also expanded the lookup table to 20 entries on LP64 systems.

mdickinson commented 14 years ago

And the same patch, but with a (deliberately simple) pure Python version of the algorithm in test_math.py.

abalkin commented 14 years ago

On Sat, May 15, 2010 at 6:32 AM, Mark Dickinson \report@bugs.python.org\ wrote:

New patch (factorial3.patch) addressing all of Alexander's points except the one about including Python source somewhere.

Thanks for making the changes. I think we converged to a really neat implementation.

I also expanded the lookup table to 20 entries on LP64 systems.

It's a matter of taste, but I was taught that C allows trailing commas in initializers specifically for the cases like this to avoid using a leading comma.

mdickinson commented 14 years ago

It's a matter of taste, but I was taught that C allows trailing commas in initializers specifically for the cases like this to avoid using a leading comma.

Unfortunately, I think C89 doesn't allow for this, and there are tracker issues about Python compiles failing on some platforms (AIX?) thanks to extra trailing commas in the source. I prefer the extra comma too, but it's not legal C89.

mdickinson commented 14 years ago

Ah; Alexander's right: I was misremembering. An extra comma in an *enum* list isn't allowed (cf. bpo-5889); an extra comma in an array initializer is. I'll rewrite that bit.

5579dc13-9f48-42d1-bb17-9c003ef6fa70 commented 14 years ago

There is one place in the notes still referring to
factorial_part_product.

cfc9f3e0-e33f-4ecd-9ddd-4123842d6c1e commented 14 years ago

The comment for bit_length is missing a space or two: "Objects/longobject.c.Someday"

mdickinson commented 14 years ago

There is one place in the notes still referring to factorial_part_product.

Hmm. I can't find it. Can you be more specific?

I'll fix the spaces before 'Someday'.

5579dc13-9f48-42d1-bb17-9c003ef6fa70 commented 14 years ago

Sorry for terseness. Sending it from my phone.

The line was in factorial4.patch:

+ * The factorial_partial_product function computes the product of all
odd j in

>

Hmm. I can't find it. Can you be more specific?

mdickinson commented 14 years ago

Okay, thanks. I'm still not seeing what's wrong with this, though (sorry for being slow :( )

5579dc13-9f48-42d1-bb17-9c003ef6fa70 commented 14 years ago

s/partial_product/odd_part/

It looks like you made this change in some places but not all.

On May 15, 2010, at 11:16 AM, Mark Dickinson \report@bugs.python.org\
wrote:

Mark Dickinson \dickinsm@gmail.com\ added the comment:

Okay, thanks. I'm still not seeing what's wrong with this, though
(sorry for being slow :( )

----------

Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue8692\

mdickinson commented 14 years ago

But factorial_partial_product and factorial_odd_part both exist: the former is just computing the product of all odd integers in the given interval, while the latter computes the odd part of factorial(n).

I've double checked the comments and they still look okay to me. That particular reference really is to factorial_partial_product.

5579dc13-9f48-42d1-bb17-9c003ef6fa70 commented 14 years ago

Sorry I didn't realize that ...

mdickinson commented 14 years ago

Committed in r81196. Thanks, everyone!

python / cpython

Use divide-and-conquer for faster factorials #52938