sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.41k stars 474 forks source link

fast vector partition algorithm #29221

Closed aa9ad435-3d3c-43f1-9b49-28cce5b14046 closed 4 years ago

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago

New Python implementation of Brent Yorgey's fast vector partition algorithm

Some time ago Brent Yorgey published a very efficient algorithm for vector partitions in Haskell, see Brent Yorgey, Generating Multiset Partitions, The Monad Reader, Issue 8, September 2007, p. 5. at https://wiki.haskell.org/The_Monad.Reader/Previous_issues.

I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, got rid of some bugs (mine, not Yorgey's), and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code published by Yorgey. As far as I can see it works as it should, usual disclaimers apply.

I will commit as soon as I can. Any comments are welcome.

Cheers,

Denis

Examples:

            # the older the computer, the more impressive the example:
            sage: %time fastvparts = fast_vector_partitions([6,6,6])
            CPU times: user 17 s, sys: 136 ms, total: 17.1 s
            Wall time: 17.1 s
            sage: %time vparts = list(VectorPartitions([6,6,6]))
            CPU times: user 4min 16s, sys: 319 ms, total: 4min 16s
            Wall time: 4min 16s
            sage: vparts == fastvparts[::-1]
            True
            sage: fast_vector_partitions([1,2,3], min = [0,1,1])
            [[[1, 2, 3]],
             [[0, 2, 3], [1, 0, 0]],
             [[0, 2, 2], [1, 0, 1]],
             [[0, 2, 1], [1, 0, 2]],
             [[0, 2, 0], [1, 0, 3]],
             [[0, 1, 3], [1, 1, 0]],
             [[0, 1, 2], [1, 1, 1]],
             [[0, 1, 1], [1, 1, 2]],
             [[0, 1, 1], [0, 1, 2], [1, 0, 0]],
             [[0, 1, 1], [0, 1, 1], [1, 0, 1]]]
            sage: fast_vector_partitions([5,7,6], min = [1,3,2])==
            ....: list(VectorPartitions([5,7,6], min = [1,3,2]))[::-1]
            True

CC: @tscrim

Component: combinatorics

Keywords: vector partitions

Author: Denis Sunko

Branch/Commit: ebfa5e5

Reviewer: Frédéric Chapoton, Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/29221

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago

Description changed:

--- 
+++ 
@@ -4,6 +4,11 @@

 I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, also found some bugs, and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code by Yorgey. As far as I can see it works as it should, usual disclaimers apply.

+I will commit as soon as I can. Any comments are welcome.
+
+Cheers,
+
+Denis
 Examples:

@@ -32,8 +37,3 @@ True


-I will commit as soon as I can.
-
-Cheers,
-
-Denis
aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago

Description changed:

--- 
+++ 
@@ -9,6 +9,7 @@
 Cheers,

 Denis
+
 Examples:
aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago

Description changed:

--- 
+++ 
@@ -2,7 +2,7 @@

 Some time ago Brent Yorgey published a very efficient algorithm for vector partitions in Haskell, see Brent Yorgey, Generating Multiset Partitions,        The Monad Reader, Issue 8, September 2007, p. 5. at https://wiki.haskell.org/The_Monad.Reader/Previous_issues.

-I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, also found some bugs, and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code by Yorgey. As far as I can see it works as it should, usual disclaimers apply.
+I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, got rid of some bugs (mine, not Yorgey's), and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code published by Yorgey. As far as I can see it works as it should, usual disclaimers apply.

 I will commit as soon as I can. Any comments are welcome.
aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago

Branch: u/gh-denissunko/fast_vector_partition_algorithm

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Commit: aa2bfb0

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

aa2bfb0New Python implementation of Brent Yorgey's fast vector partition algorithm
aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:7

The code works as advertised, but for some reason I cannot get my local version of Sage to see it. No matter how much I run ./sage -br and make, it keeps complaining that the name "fast_vector_partitions" is undefined. I can see that something has happened in the build, because the file fast_vector_partitions.py has migrated from ./src/sage/combinat where I put it to the other locations, and a .pyc file has appeared:

$ find . -name "*fast_vector_partitions.*"
./src/build/lib.linux-x86_64-3.7/sage/combinat/fast_vector_partitions.py
./src/sage/combinat/fast_vector_partitions.py
./local/lib/python3.7/site-packages/sage/combinat/fast_vector_partitions.py
./local/lib/python3.7/site-packages/sage/combinat/__pycache__/fast_vector_partitions.cpython-37.pyc

However, no luck with the interpreter. I hope this can be sorted out quickly.

Please help,

Denis

fchapoton commented 4 years ago
comment:9

(1) You need to add

sage: from sage.combinat.fast_vector_partitions import fast_vector_partitions

as the first line of your doctests.

(2) every function in your file must have doctests

(3) Besides, your documentation is not formatted correctly. Please read

https://doc.sagemath.org/html/en/developer/coding_basics.html#documentation-strings

carefully and fix your file according to our documentation style.

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:10

Thank you, everything works now.

Here comes the next attempt to get past the patchbots.

Denis

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from aa2bfb0 to 9b2b7ed

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

9b2b7edBugfix iteration
fchapoton commented 4 years ago
comment:12

(0) at the top of the file, you do not need to indent inside ALGORITHM: and AUTHORS:

(1) EXAMPLES: should be EXAMPLES::

and then the doctest inside the EXAMPLES block should be indented by 4 spaces

(general rule is that something ending with :: should contain indented text)

(2) .. NOTE:: and .. WARNING:: should be at the same indentation as the text before

(3) remove the "uncomment if..." lines

fchapoton commented 4 years ago
comment:13

your dickson_le can be written in one line

all(x <= y for x, y in zip(v, w))

similarly, vector_minus is [x - y for x, y in zip(v,w)]

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:14

Thank you very much, this was helpful. It is difficult to get the fine points right.

With your one-liner, I don't need to import the operator module any more.

Here comes the hopefully-last commit before review.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

60063f72nd bugfix iteration
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from 9b2b7ed to 60063f7

fchapoton commented 4 years ago
comment:16

It is going to take more rounds, I think. I hope you will not be discouraged.

you should just remove dickson_le which is used only once (replace its only use by the one-line code)

and the same for vector_minus

same for vector_clip, which is [min(x,y) for x,y in zip(v, w)]

same for vector_unit, also used only once

also remove all the for-debugging-only code.

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:17

ok here comes

last before I turn in

not discouraged

poor patchbots

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

c49a8c13rd bugfix iteration
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from 60063f7 to c49a8c1

fchapoton commented 4 years ago
comment:19

(0) INPUT: and OUTPUT: should be followed by an empty line (as always for lines ending with : or ::)

(1) the content of NOTE:: and WARNING:: should be indented by 4

(2) Do not use # inside the documentation, you can write

+    EXAMPLES:
+
+    The older the computer, the more impressive the comparison::
+
+        sage: from sage.combinat.fast_vector_partitions import fast_vector_partitions

(3) I would suggest to make your procedure return an iterator (using "yield")

(4) you should insert your new file in the documentation (for example by adding a line in src/doc/en/reference/combinat/module_list.rst. And then compile the doc (make doc) and look at the generated html file.

tscrim commented 4 years ago
comment:20

Is there some easy way to flip the ordering and have it iterate starting with the largest in lex? Although I guess in many ways it is nice to have another algorithm that iterates starting from the lex smallest...

fchapoton commented 4 years ago
comment:21

Typo in "decompostion"

In vector_halve, you could use

for i, vv in enumerate(v):

to take care of the indexing variable i

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:22

Documentation builds correctly, so I am ready for the next iteration. Thanks again for all the help, I don't think I could have done without it.

I have decided against enumerate() because I think it is better to be explicit with i += 1 for such a small snippet.

I cannot do yield, sorry. My time for real-time work on this is almost up, so what's done this weekend is done, as far as I am concerned, until June at least. Having said that, I would be happy to play with somebody else's contribution. The author list is open, after all.

From a more general point of view, I think that the time may come when this code can be merged with the VectorPartitions() code, which has some very classy (ha!) features, e.g. vparts = VectorPartitions(...) returns an iterator which can be subscripted: vparts[0], vparts[1], and then reused, vparts[0] again. I don't know how to do that. However, at this time, I think it would be prudent to release this code in isolation for some reasonable period first.

I don't think flipping the order is at all easy with this algorithm, but feel free to look up Yorgey's paper and contradict me. As far as I can see, his whole idea is to produce a partition on each and every cycle of the recursion, so it is natural to start with the one sure thing, v=v. Splitting and re-splitting it one gets them all, but obviously the natural order here is downwards.

There is a deep reason why "sniffing beyond the edges" is disastrously inefficient in vector spaces. Say we have a sphere of radius R in d-dimensional space, and another with a smaller radius r=R-D inside it. How much volume is there in the surface shell of thickness D? Evidently it is

~ Rd - (R-D)d = Rd [1 - (1-D/R)d] --> Rd (!!!) when d>>1, because (1-D/R)<1.

In other words, the shell of thickness D=1 (integer point case) dominates the volume of the sphere, no matter how big R is, when the dimension d of the space is large. Any inefficiency, no matter how marginal along a direction, D/R << 1, will end up dominating the calculation.

(deep breath) here comes the push...

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

69176121st build with documentation
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from c49a8c1 to 6917612

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:24

Some patchbots are failing because of reasons unrelated to my code, at least it seems so to me.

The failed reports are the same as the green ones where they refer to fast_vector_partitions.py.

In these reports doctests are timing out on unrelated packages, but all checkmarks are green.

This being the first time I am contributing, I don't know what to expect now.

Please advise,

Denis

fchapoton commented 4 years ago
comment:25

yes, some patchbots are known to be broken. If one patchbot is green and the other fails for obviously unrelated reasons, then that's good enough.

This ticket is starting to look very good, but still is not perfect. If you stop working on it, the risk is that nobody else will care, and you will find the ticket in the same state when you come back in 6 months.

(0) the lines in the INPUT: and OUTPUT: blocks should not have final dots.

(1) making iterators would be really better.

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:26

You don't have to tell me, look at ticket #27139 - 13 months for one line.

(0) no problem

(1) the problem is, I really don't know how to do yield. I've tried a few times and only got grief. I understand it may be better to defer evaluation, especially with combinatorially exploding outputs, but unless it can be done quickly (read: you tell me how) I am afraid it will go to the back burner starting Monday. It's not that I don't want to do it, it's just how it is with my time.

I did try to replace return with yield just to see what will happen, and I just get errors. It is recursive twice over, so I am really reluctant to go up yield creek with no paddle, as it were.

Any pointer appreciated. I will do the dotless push a little later, have to run now.

fchapoton commented 4 years ago
comment:27

I propose to make myself a review commit, trying to convert to iterator, if you agree.

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:28

Please do, I look forward to learning a bit about yield, only in that case you must put yourself on the authors' list, I insist.

Does that mean I do the dotless iteration now, or wait?

fchapoton commented 4 years ago
comment:29

as you want, tell me when I can

fchapoton commented 4 years ago
comment:30

I will do both the "dot removal" and the "yield conversion"

fchapoton commented 4 years ago

Changed commit from 6917612 to a387f17

fchapoton commented 4 years ago

Changed branch from u/gh-denissunko/fast_vector_partition_algorithm to public/ticket/29221

fchapoton commented 4 years ago
comment:31

Here is the new branch, with iterators.

Beware that this is now on top of the latest develop release of sage, 9.1.beta5. If you pull the branch, it may trigger a large scale recompilation of sage, if you were using 9.0 for example.

I have changed the doctests to much smaller cases. Our doctests are not here to display the power of the methods, but should be short and quick cases to check that the code works. Each doctests is not supposed to last more than 1s.

@tscrim, do you see something else to be done ?


New commits:

2126552Merge branch 'u/gh-denissunko/fast_vector_partition_algorithm' in 9.1.b5
a387f17trac 29221 conversion to iterators and shorter doctests and doc tweaks
aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:32

I have looked at the code and it is beautiful, thank you so much.

I can't do anything with it right now because I want to pull/recompile first as you said, with my more-typewriter-than-computer it will take some time I suppose. Then I will put your name in the authors' list ("conversion to iterators and shorter doctests and doc tweaks") if you won't.

Cheers,

Denis

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:33

This was quicker than I thought.

Yes, it works (fantastic).

Here comes the push.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from a387f17 to 4b2f4ec

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

4b2f4ecUpdated author list, fixed module ordering in .rst file
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ca5e2a2added coding line for non-ascii characters
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from 4b2f4ec to ca5e2a2

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from ca5e2a2 to 1b0ecc7

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1b0ecc7fixed indentation bug in docstring
tscrim commented 4 years ago
comment:37

It is better to do enumerate that the i += 1 as it is both faster and more explicit in linking i to the position of vv.

Also, I think I will also push a reviewer change. What would you think if I port this over to Cython? Since this is suppose to be fast and it seems well-poised to be done at a lower level, I feel like that is more natural.

aa9ad435-3d3c-43f1-9b49-28cce5b14046 commented 4 years ago
comment:38

I agree about enumerate, here is the push. The [6,6,6] case is 5-10% faster, amazing. Also Cython is a great idea.

Sorry about the docstring, I did not know I could mess up a single line in so many ways. Still not sure the utf-8 line will work as advertised.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

88d24f9changed to enumerate() in vector_halve()
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from 1b0ecc7 to 88d24f9

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

728cc18trac 29221 switching to cython (just changed to pyx file)
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from 88d24f9 to 728cc18