Closed 4 years ago
Description changed:
---
+++
@@ -4,6 +4,11 @@
I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, also found some bugs, and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code by Yorgey. As far as I can see it works as it should, usual disclaimers apply.
+I will commit as soon as I can. Any comments are welcome.
+
+Cheers,
+
+Denis
Examples:
@@ -32,8 +37,3 @@ True
-I will commit as soon as I can.
-
-Cheers,
-
-Denis
Description changed:
---
+++
@@ -9,6 +9,7 @@
Cheers,
Denis
+
Examples:
Description changed:
---
+++
@@ -2,7 +2,7 @@
Some time ago Brent Yorgey published a very efficient algorithm for vector partitions in Haskell, see Brent Yorgey, Generating Multiset Partitions, The Monad Reader, Issue 8, September 2007, p. 5. at https://wiki.haskell.org/The_Monad.Reader/Previous_issues.
-I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, also found some bugs, and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code by Yorgey. As far as I can see it works as it should, usual disclaimers apply.
+I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, got rid of some bugs (mine, not Yorgey's), and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code published by Yorgey. As far as I can see it works as it should, usual disclaimers apply.
I will commit as soon as I can. Any comments are welcome.
Commit: aa2bfb0
Branch pushed to git repo; I updated commit sha1. New commits:
aa2bfb0 | New Python implementation of Brent Yorgey's fast vector partition algorithm |
The code works as advertised, but for some reason I cannot get my local version of Sage to see it. No matter how much I run ./sage -br and make, it keeps complaining that the name "fast_vector_partitions" is undefined. I can see that something has happened in the build, because the file fast_vector_partitions.py has migrated from ./src/sage/combinat, where I put it, to the other locations, and a .pyc file has appeared:
$ find . -name "*fast_vector_partitions.*"
./src/build/lib.linux-x86_64-3.7/sage/combinat/fast_vector_partitions.py
./src/sage/combinat/fast_vector_partitions.py
./local/lib/python3.7/site-packages/sage/combinat/fast_vector_partitions.py
./local/lib/python3.7/site-packages/sage/combinat/__pycache__/fast_vector_partitions.cpython-37.pyc
However, no luck with the interpreter. I hope this can be sorted out quickly.
Please help,
Denis
(1) You need to add
sage: from sage.combinat.fast_vector_partitions import fast_vector_partitions
as the first line of your doctests.
(2) every function in your file must have doctests
(3) Besides, your documentation is not formatted correctly. Please read
https://doc.sagemath.org/html/en/developer/coding_basics.html#documentation-strings
carefully and fix your file according to our documentation style.
Thank you, everything works now.
Here comes the next attempt to get past the patchbots.
Denis
Branch pushed to git repo; I updated commit sha1. New commits:
9b2b7ed | Bugfix iteration |
(0) at the top of the file, you do not need to indent inside ALGORITHM: and AUTHORS:
(1) EXAMPLES:
should be EXAMPLES::
and then the doctest inside the EXAMPLES block should be indented by 4 spaces
(general rule is that something ending with :: should contain indented text)
(2) .. NOTE::
and .. WARNING::
should be at the same indentation as the text before
(3) remove the "uncomment if..." lines
Your dickson_le can be written in one line:
all(x <= y for x, y in zip(v, w))
Similarly, vector_minus is:
[x - y for x, y in zip(v, w)]
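To illustrate the docstring conventions discussed above, here is a hedged sketch of a function documented the Sage way (EXAMPLES:: followed by a blank line and a doctest indented by 4 spaces, .. NOTE:: at the same indentation as the surrounding text). The function body uses the suggested one-liner; the exact docstring wording is hypothetical, not taken from the actual file.

```python
def vector_minus(v, w):
    r"""
    Return the componentwise difference of two vectors.

    EXAMPLES::

        sage: from sage.combinat.fast_vector_partitions import vector_minus
        sage: vector_minus([3, 3], [1, 2])
        [2, 1]

    .. NOTE::

        This assumes ``v`` and ``w`` have the same length.
    """
    # suggested one-line replacement for the original loop
    return [x - y for x, y in zip(v, w)]
```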
Thank you very much, this was helpful. It is difficult to get the fine points right.
With your one-liner, I don't need to import the operator module any more.
Here comes the hopefully-last commit before review.
Branch pushed to git repo; I updated commit sha1. New commits:
60063f7 | 2nd bugfix iteration |
It is going to take more rounds, I think. I hope you will not be discouraged.
you should just remove dickson_le
which is used only once (replace its only use by the one-line code)
and the same for vector_minus
same for vector_clip, which is [min(x,y) for x,y in zip(v, w)]
same for vector_unit, also used only once
also remove all the for-debugging-only code.
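For reference, the suggested one-liners for the remaining helpers look like this. This is a hedged sketch: the componentwise semantics are inferred from the function names in the discussion, not copied from the actual Sage source.

```python
def dickson_le(v, w):
    # True iff v <= w componentwise (the Dickson, or product, order)
    return all(x <= y for x, y in zip(v, w))

def vector_clip(v, w):
    # componentwise minimum of v and w
    return [min(x, y) for x, y in zip(v, w)]

print(dickson_le([1, 2], [2, 3]))   # True
print(vector_clip([3, 1], [2, 2]))  # [2, 1]
```

Since each is used only once, the reviewer's advice is to inline these expressions at their single call sites rather than keep them as named functions.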
ok here comes
last before I turn in
not discouraged
poor patchbots
Branch pushed to git repo; I updated commit sha1. New commits:
c49a8c1 | 3rd bugfix iteration |
(0) INPUT: and OUTPUT: should be followed by an empty line (as always for lines ending with : or ::)
(1) the content of NOTE:: and WARNING:: should be indented by 4
(2) Do not use # inside the documentation, you can write
+ EXAMPLES:
+
+ The older the computer, the more impressive the comparison::
+
+ sage: from sage.combinat.fast_vector_partitions import fast_vector_partitions
(3) I would suggest to make your procedure return an iterator (using "yield")
(4) you should insert your new file in the documentation (for example by adding a line in src/doc/en/reference/combinat/module_list.rst
. And then compile the doc (make doc) and look at the generated html file.
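As a generic illustration of suggestion (3), converting a list-returning function to an iterator with "yield" (this is a toy sketch, not the actual fast_vector_partitions code):

```python
def first_squares_list(n):
    # eager version: builds the whole list before returning anything
    out = []
    for k in range(n):
        out.append(k * k)
    return out

def first_squares_iter(n):
    # lazy version: produces one value at a time, on demand
    for k in range(n):
        yield k * k

# callers iterate over either one the same way
assert list(first_squares_iter(4)) == first_squares_list(4) == [0, 1, 4, 9]
```

The lazy version matters here because vector partitions explode combinatorially: an iterator lets the caller stop early without paying for the whole output.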
Is there some easy way to flip the ordering and have it iterate starting with the largest in lex? Although I guess in many ways it is nice to have another algorithm that iterates starting from the lex smallest...
Typo in "decompostion"
In vector_halve, you could use
for i, vv in enumerate(v):
to take care of the indexing variable i
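A generic before/after sketch of this suggestion (not the actual vector_halve code):

```python
v = [4, 7, 2]

# manual index bookkeeping
pairs_manual = []
i = 0
for vv in v:
    pairs_manual.append((i, vv))
    i += 1

# enumerate() supplies the index automatically
pairs_enum = [(i, vv) for i, vv in enumerate(v)]

assert pairs_manual == pairs_enum == [(0, 4), (1, 7), (2, 2)]
```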
Documentation builds correctly, so I am ready for the next iteration. Thanks again for all the help, I don't think I could have done without it.
I have decided against enumerate() because I think it is better to be explicit with i += 1 for such a small snippet.
I cannot do yield, sorry. My time for real-time work on this is almost up, so what's done this weekend is done, as far as I am concerned, until June at least. Having said that, I would be happy to play with somebody else's contribution. The author list is open, after all.
From a more general point of view, I think that the time may come when this code can be merged with the VectorPartitions() code, which has some very classy (ha!) features, e.g. vparts = VectorPartitions(...) returns an iterator which can be subscripted: vparts[0], vparts[1], and then reused, vparts[0] again. I don't know how to do that. However, at this time, I think it would be prudent to release this code in isolation for some reasonable period first.
I don't think flipping the order is at all easy with this algorithm, but feel free to look up Yorgey's paper and contradict me. As far as I can see, his whole idea is to produce a partition on each and every cycle of the recursion, so it is natural to start with the one sure thing, v=v. Splitting and re-splitting it one gets them all, but obviously the natural order here is downwards.
There is a deep reason why "sniffing beyond the edges" is disastrously inefficient in vector spaces. Say we have a sphere of radius R in d-dimensional space, and another with a smaller radius r=R-D inside it. How much volume is there in the surface shell of thickness D? Evidently it is
~ R^d - (R-D)^d = R^d [1 - (1-D/R)^d] --> R^d (!!!) when d >> 1, because (1-D/R) < 1.
In other words, the shell of thickness D=1 (integer point case) dominates the volume of the sphere, no matter how big R is, when the dimension d of the space is large. Any inefficiency, no matter how marginal along a direction, D/R << 1, will end up dominating the calculation.
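A quick numeric check of this claim: the fraction of the ball's volume lying in the shell is 1 - (1 - D/R)^d, which approaches 1 as d grows, for any fixed D/R < 1.

```python
# Fraction of a d-dimensional ball of radius R occupied by a
# surface shell of thickness D: 1 - (1 - D/R)**d
R, D = 100.0, 1.0
for d in (3, 100, 1000):
    print(d, 1 - (1 - D / R) ** d)
# the fraction climbs from a few percent at d=3 toward 1 at d=1000
```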
(deep breath) here comes the push...
Branch pushed to git repo; I updated commit sha1. New commits:
6917612 | 1st build with documentation |
Some patchbots are failing because of reasons unrelated to my code, at least it seems so to me.
Where they refer to fast_vector_partitions.py, the failed reports are the same as the green ones. In these reports, doctests are timing out on unrelated packages, but all checkmarks are green.
This being the first time I am contributing, I don't know what to expect now.
Please advise,
Denis
yes, some patchbots are known to be broken. If one patchbot is green and the other fails for obviously unrelated reasons, then that's good enough.
This ticket is starting to look very good, but still is not perfect. If you stop working on it, the risk is that nobody else will care, and you will find the ticket in the same state when you come back in 6 months.
(0) the lines in the INPUT: and OUTPUT: blocks should not have final dots.
(1) making iterators would be really better.
You don't have to tell me, look at ticket #27139 - 13 months for one line.
(0) no problem
(1) the problem is, I really don't know how to do yield. I've tried a few times and only got grief. I understand it may be better to defer evaluation, especially with combinatorially exploding outputs, but unless it can be done quickly (read: you tell me how) I am afraid it will go to the back burner starting Monday. It's not that I don't want to do it, it's just how it is with my time.
I did try to replace return with yield just to see what will happen, and I just get errors. It is recursive twice over, so I am really reluctant to go up yield creek with no paddle, as it were.
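For what it's worth, the usual pattern for a doubly-recursive list-builder is to replace each "result.append(x)" with "yield x" and each "result.extend(recursive_call(...))" with "yield from recursive_call(...)". A toy sketch of the transformation (subsets, not vector partitions, purely for illustration):

```python
def subsets_list(items):
    # eager doubly-recursive version: builds the full list
    if not items:
        return [[]]
    head, rest = items[0], items[1:]
    out = []
    for s in subsets_list(rest):          # first recursive branch
        out.append(s)
    for s in subsets_list(rest):          # second recursive branch
        out.append([head] + s)
    return out

def subsets_iter(items):
    # same recursion, but yielding lazily
    if not items:
        yield []
        return
    head, rest = items[0], items[1:]
    yield from subsets_iter(rest)                        # first branch
    yield from ([head] + s for s in subsets_iter(rest))  # second branch

assert list(subsets_iter([1, 2])) == subsets_list([1, 2])
```

The key point is that a plain "return" of a list cannot simply be swapped for "yield": every recursive call site has to be rewritten to re-yield what the recursion produces, which is why a mechanical replacement produces errors.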
Any pointer appreciated. I will do the dotless push a little later, have to run now.
I propose to make myself a review commit, trying to convert to iterator, if you agree.
Please do, I look forward to learning a bit about yield, only in that case you must put yourself on the authors' list, I insist.
Does that mean I do the dotless iteration now, or wait?
as you want, tell me when I can
I will do both the "dot removal" and the "yield conversion"
Changed branch from u/gh-denissunko/fast_vector_partition_algorithm to public/ticket/29221
Here is the new branch, with iterators.
Beware that this is now on top of the latest develop release of sage, 9.1.beta5. If you pull the branch, it may trigger a large scale recompilation of sage, if you were using 9.0 for example.
I have changed the doctests to much smaller cases. Our doctests are not here to display the power of the methods, but should be short and quick cases to check that the code works. Each doctest is not supposed to last more than 1s.
@tscrim, do you see something else to be done?
New commits:
2126552 | Merge branch 'u/gh-denissunko/fast_vector_partition_algorithm' in 9.1.b5 |
a387f17 | trac 29221 conversion to iterators and shorter doctests and doc tweaks |
I have looked at the code and it is beautiful, thank you so much.
I can't do anything with it right now because I want to pull/recompile first as you said, with my more-typewriter-than-computer it will take some time I suppose. Then I will put your name in the authors' list ("conversion to iterators and shorter doctests and doc tweaks") if you won't.
Cheers,
Denis
This was quicker than I thought.
Yes, it works (fantastic).
Here comes the push.
Branch pushed to git repo; I updated commit sha1. New commits:
4b2f4ec | Updated author list, fixed module ordering in .rst file |
Branch pushed to git repo; I updated commit sha1. New commits:
ca5e2a2 | added coding line for non-ascii characters |
Branch pushed to git repo; I updated commit sha1. New commits:
1b0ecc7 | fixed indentation bug in docstring |
It is better to use enumerate than the i += 1, as it is both faster and more explicit in linking i to the position of vv.
Also, I think I will push a reviewer change. What would you think if I port this over to Cython? Since this is supposed to be fast and it seems well-poised to be done at a lower level, I feel like that is more natural.
I agree about enumerate, here is the push. The [6,6,6] case is 5-10% faster, amazing. Also Cython is a great idea.
Sorry about the docstring, I did not know I could mess up a single line in so many ways. Still not sure the utf-8 line will work as advertised.
Branch pushed to git repo; I updated commit sha1. New commits:
88d24f9 | changed to enumerate() in vector_halve() |
Branch pushed to git repo; I updated commit sha1. New commits:
728cc18 | trac 29221 switching to cython (just changed to pyx file) |
New Python implementation of Brent Yorgey's fast vector partition algorithm
Some time ago Brent Yorgey published a very efficient algorithm for vector partitions in Haskell, see Brent Yorgey, Generating Multiset Partitions, The Monad Reader, Issue 8, September 2007, p. 5. at https://wiki.haskell.org/The_Monad.Reader/Previous_issues.
I have translated it into Python for my own use, also some time ago. Now I have decided to contribute it to Sage if possible. I have rewritten the source completely following Sage guidelines as best I could, got rid of some bugs (mine, not Yorgey's), and ran a number of examples against both the Sage VectorPartitions() implementation and the original Haskell code published by Yorgey. As far as I can see it works as it should, usual disclaimers apply.
I will commit as soon as I can. Any comments are welcome.
Cheers,
Denis
Examples:
CC: @tscrim
Component: combinatorics
Keywords: vector partitions
Author: Denis Sunko
Branch/Commit:
ebfa5e5
Reviewer: Frédéric Chapoton, Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/29221