Closed simon-king-jena closed 6 years ago
Commit: 827f579
Description changed:
---
+++
@@ -2,6 +2,7 @@
sage: n=4;m=11;P = PolynomialRing(QQ,n*m,"x"); x = P.gens(); M = Matrix(n,x)
sage: I = P.ideal(M.minors(2))
sage: J = P*[m.lm() for m in I.groebner_basis()]
sage: J.hilbert_numerator()
Traceback (most recent call last):
@@ -14,7 +15,6 @@
sage: from sage.rings.polynomial.hilbert import first_hilbert_series
The residue class ring of the polynomial ring in 11 x 4 variables modulo the 2x2 minors is a normal monoid ring. You can use Normaliz, but there is also an explicit formula. See MR1213858
A real-world example for the computation of Hilbert series
Attachment: hilbert_example.sobj.gz
Replying to @w-bruns:
The residue class ring of the polynomial ring in 11 x 4 variables modulo the 2x2 minors is a normal monoid ring. You can use Normaliz, but there is also an explicit formula. See MR1213858
On #25091, you said that normaliz can only compute the Hilbert series of the integral closure of a monomial ideal. That may not be what is required here.
Anyway: Clearly I am not interested in that particular example. The example is just that: An example, that I borrowed from #20145. It is suitable as a doc test, as it can be set up in short time; and it is instructive as a doc example, as it exposes limitations of Singular that make Singular unusable in "real world examples" for which there are no explicit formulae in the literature.
Or would you know a formula for the Hilbert series of the mod-2 cohomology ring of group number 299 of order 256 (numbering according to the small groups library)? That cohomology ring is the smallest potential counter example to a conjecture of Dave Benson. A small part of the attempt to verify that conjecture is the computation of the Hilbert series of several quotients of that ring. Unfortunately, that "small part" is too big for Singular or Frobby.
The leading ideal of the relation ideal of the above-mentioned cohomology ring is in attachment: hilbert_example.sobj. The computation of its Hilbert series actually shouldn't take very long:
sage: from sage.rings.polynomial.hilbert import hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %time hilbert_poincare_series(I,grading)
CPU times: user 270 ms, sys: 2.85 ms, total: 273 ms
Wall time: 324 ms
(-8*t^26 + 24*t^25 - 16*t^24 - 16*t^23 + 24*t^22 - 8*t^21 - t^7 - t^6 + t^5 - t^4 - t^3 - t^2 + t - 1)/(t^13 - 3*t^12 + 2*t^11 + 2*t^10 - 3*t^9 + t^8 - t^5 + 3*t^4 - 2*t^3 - 2*t^2 + 3*t - 1)
However, Singular immediately reports that the example is too big, and Frobby needs to be killed after several minutes. There is no interface to CoCoA, and I don't know how to utilise the interface to normaliz. I also don't know how to install Macaulay2, so I can't test that either.
All of these comments are about sum_from_list:
Something that should give you better speed would be having it take another argument for the starting point of the sum. That way you would not have to create all of the intermediate lists (e.g., L[:l2] and L[l2:]). It will be slightly trickier to code, but not by much, and I would imagine it gives a decent speedup.
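For illustration, a minimal sketch (not the ticket's actual code) of such a start/length variant, assuming the sum_from_list signature that comes up further down and ETuple's eadd method:

from sage.rings.polynomial.polydict cimport ETuple

cdef ETuple sum_from_list(list L, size_t s, size_t l):
    # Sum the l entries L[s], ..., L[s+l-1] without creating intermediate sublists.
    cdef ETuple m
    if l == 1:
        return <ETuple> L[s]
    if l == 2:
        return (<ETuple> L[s]).eadd(<ETuple> L[s+1])
    m = sum_from_list(L, s, l//2)
    return m.eadd(sum_from_list(L, s + l//2, l - l//2))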
I think this might also give you better C code (and hence, be faster):
if l == 1:
    return <ETuple> L[0]
if l == 2:
    return (<ETuple> L[0]).eadd(<ETuple> L[1])
Some other thoughts:
Does L need to be a list (i.e., not an array of ETuple's)? This would mean less type-casting. It also seems like you are not using list manipulations other than one sort (which you can most likely just use C qsort on).
It is faster to do if not Id: than if len(Id)==0:, even in Cython.
If you replace Id[-1] with Id[len(Id)-1], you can then remove the bounds check and wrap-around from HilbertBaseCase. You probably should remove those checks across the file to get some extra speed.
You are probably going to be better off in terms of speed doing the low-level polynomial manipulation directly via flint; that way you have fewer intermediate (Sage) objects and less indirection.
Should those ETuple functions be a part of the ETuple class?
The for i from 0 <= i < k by m is discouraged (if not semi-deprecated) in favor of the more standard Python for i in range(0, k, m), which gets turned into the same C code AFAIK.
You can avoid a few additions here:
- for i from 0 <= i < 2*m._nonzero by 2:
- degree += m._data[i+1]
+ for i in range(1, 2*m._nonzero, 2):
+ degree += m._data[i]
You should have two functions quotient_degree and quotient_degree_weighted, where the former does not take w as an input. There is almost no code shared between the two and it makes it easier to understand what is going on where they are called. Same for degree.
quotient(by_var) knows its return type is a list since interred returns a list.
I don't see why AN should be a dict (and hence the first arguments D to a number of your functions). It seems more like you are mimicking a struct with a dict. Using a struct will be much faster because it doesn't have to do string hashing and equality checking; it would be a direct C lookup. For those things you are manipulating in subfunction calls, you can pass it by pointer like you would normally in C (I forget if Cython passes everything by copy or reference by default for C objects).
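For illustration, a rough sketch (with hypothetical field names, not a concrete proposal) of what such a struct-based node could look like:

from cpython.object cimport PyObject

cdef struct Node:
    Node* back
    Node* left
    Node* right
    PyObject* data   # e.g. a polynomial attached to the node; with raw
                     # PyObject pointers, reference counting is manual

cdef void visit(Node* nd):
    # field access is a direct C offset, not a string-keyed dict lookup,
    # and the node can be passed to subfunctions by pointer
    if nd.left != NULL:
        visit(nd.left)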
Dear Travis,
first of all: Thank you for your thoughtful comments!
Replying to @tscrim:
Something that should give you better speed would be having it take another argument for the starting point of the sum. That way you would not have to create all of the intermediate lists (e.g., L[:l2] and L[l2:]). It will be slightly trickier to code, but not by much, and I would imagine it gives a decent speedup.
Right, I should try that.
I think this might also give you better C code (and hence, be faster):
if l == 1:
    return <ETuple> L[0]
if l == 2:
    return (<ETuple> L[0]).eadd(<ETuple> L[1])
Right, actually the first "if" is nonsense in my code: I assign to m1, but do not return it. Anyway, here is the resulting C code:
/* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":268
* if l==1:
* m1 = L[0]
* return L[0] # <<<<<<<<<<<<<<
* if l==2:
* m1,m2=L
*/
__Pyx_XDECREF(((PyObject *)__pyx_r));
if (unlikely(__pyx_v_L == Py_None)) {
PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
__PYX_ERR(0, 268, __pyx_L1_error)
}
__pyx_t_2 = __Pyx_GetItemInt_List(__pyx_v_L, 0, long, 1, __Pyx_PyInt_From_long, 1, 0, 1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 268, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
if (!(likely(((__pyx_t_2) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_2, __pyx_ptype_4sage_5rings_10polynomial_8polydict_ETuple))))) __PYX_ERR(0, 268, __pyx_L1_error)
__pyx_r = ((struct __pyx_obj_4sage_5rings_10polynomial_8polydict_ETuple *)__pyx_t_2);
__pyx_t_2 = 0;
goto __pyx_L0;
versus
/* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":268
* if l==1:
* m1 = L[0]
* return <ETuple>L[0] # <<<<<<<<<<<<<<
* if l==2:
* m1,m2=L
*/
__Pyx_XDECREF(((PyObject *)__pyx_r));
if (unlikely(__pyx_v_L == Py_None)) {
PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
__PYX_ERR(0, 268, __pyx_L1_error)
}
__pyx_t_2 = __Pyx_GetItemInt_List(__pyx_v_L, 0, long, 1, __Pyx_PyInt_From_long, 1, 0, 1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 268, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_INCREF(((PyObject *)((struct __pyx_obj_4sage_5rings_10polynomial_8polydict_ETuple *)__pyx_t_2)));
__pyx_r = ((struct __pyx_obj_4sage_5rings_10polynomial_8polydict_ETuple *)__pyx_t_2);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
goto __pyx_L0;
Is the second one better?
Does L need to be a list (i.e., not an array of ETuple's)? This would mean less type-casting.
Can you give me a pointer on how to use arrays of Cython types?
It also seems like you are not using list manipulations other than one sort (which you can most likely just use C qsort on).
I also use append and pop. How to do these with arrays?
It is faster to do if not Id: than if len(Id)==0:, even in Cython.
Cool! I thought that, again, it is equivalent. But the code becomes a lot shorter:
/* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":292
* cdef int e
* # First, the easiest cases:
* if len(Id)==0: # <<<<<<<<<<<<<<
* return PR(1)
* cdef ETuple m = Id[-1]
*/
if (unlikely(__pyx_v_Id == Py_None)) {
PyErr_SetString(PyExc_TypeError, "object of type 'NoneType' has no len()");
__PYX_ERR(0, 292, __pyx_L1_error)
}
__pyx_t_2 = PyList_GET_SIZE(__pyx_v_Id); if (unlikely(__pyx_t_2 == ((Py_ssize_t)-1))) __PYX_ERR(0, 292, __pyx_L1_error)
__pyx_t_3 = ((__pyx_t_2 == 0) != 0);
if (__pyx_t_3) {
versus
/* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":292
* cdef int e
* # First, the easiest cases:
* if not Id: # <<<<<<<<<<<<<<
* return PR(1)
* cdef ETuple m = Id[-1]
*/
__pyx_t_2 = (__pyx_v_Id != Py_None)&&(PyList_GET_SIZE(__pyx_v_Id) != 0);
__pyx_t_3 = ((!__pyx_t_2) != 0);
if (__pyx_t_3) {
If you replace Id[-1] with Id[len(Id)-1], you can then remove the bounds check and wrap-around from HilbertBaseCase. You probably should remove those checks across the file to get some extra speed.
I don't understand that. What bounds check are you talking about? Are you saying that Id[-1] does a bounds check, whereas Id[len(Id)-1] doesn't?
You are probably going to be better off in terms of speed doing the low-level polynomial manipulation directly via flint; that way you have fewer intermediate (Sage) objects and less indirection.
So, I should allocate the underlying flint C types and manipulate these, and only at the very end create a Sage polynomial? I guess that's related to another point below.
Should those ETuple functions be a part of the ETuple class?
Probably. Since ETuple is designed for monomials anyway, it makes sense to have monomial divisibility checks there and not here.
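For illustration, a rough sketch of such a divisibility check, written as a standalone cdef helper and assuming ETuple's sparse (position, exponent) layout in _data; this is not necessarily what will end up in the branch:

from sage.rings.polynomial.polydict cimport ETuple

cdef bint monomial_divides(ETuple m1, ETuple m2):
    # True if every exponent of m1 is at most the corresponding exponent of m2
    cdef size_t ind1 = 0, ind2 = 0
    while ind1 < 2*m1._nonzero:
        # advance in m2 until we reach the variable index m1._data[ind1]
        while ind2 < 2*m2._nonzero and m2._data[ind2] < m1._data[ind1]:
            ind2 += 2
        if ind2 >= 2*m2._nonzero or m2._data[ind2] != m1._data[ind1]:
            return False   # the variable does not occur in m2
        if m2._data[ind2+1] < m1._data[ind1+1]:
            return False   # the exponent in m2 is too small
        ind1 += 2
    return True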
The for i from 0 <= i < k by m is discouraged (if not semi-deprecated) in favor of the more standard Python for i in range(0, k, m), which gets turned into the same C code AFAIK.
Almost. I get
/* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":110
* cdef int exp1
* cdef ETuple result
* for i from 0 <= i < 2*m1._nonzero by 2: # <<<<<<<<<<<<<<
* if m1._data[i] == index:
* result = <ETuple>m1._new()
*/
__pyx_t_1 = (2 * __pyx_v_m1->_nonzero);
for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_1; __pyx_v_i+=2) {
...
}
versus
/* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":110
* cdef int exp1
* cdef ETuple result
* for i in range(0,2*m1._nonzero,2): # <<<<<<<<<<<<<<
* if m1._data[i] == index:
* result = <ETuple>m1._new()
*/
__pyx_t_1 = (2 * __pyx_v_m1->_nonzero);
__pyx_t_2 = __pyx_t_1;
for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=2) {
__pyx_v_i = __pyx_t_3;
...
}
So, in the non-deprecated version there is an additional variable assignment. Actually, this is why I changed my original code (which did use in range(...)) into from ... by.
You can avoid a few additions here:
- for i from 0 <= i < 2*m._nonzero by 2:
- degree += m._data[i+1]
+ for i in range(1, 2*m._nonzero, 2):
+ degree += m._data[i]
Right, good point!
You should have two functions quotient_degree and quotient_degree_weighted, where the former does not take w as an input. There is almost no code shared between the two and it makes it easier to understand what is going on where they are called. Same for degree.
Really? I thought it was clearer the other way around: In all cases, we have a computation that involves degree weights, and it is the job of the function to find out if the weights are trivial or not.
One thing, though: At some point, one needs to check whether w is None or not. By having a massive code duplication, I could achieve that this check is only done once, namely in first_hilbert_series. Currently, it is checked frequently, and the frequency wouldn't be reduced with your suggestion of a small code duplication.
quotient(by_var) knows its return type is a list since interred returns a list.
Right.
I don't see why AN should be a dict (and hence the first arguments D to a number of your functions). It seems more like you are mimicking a struct with a dict. Using a struct will be much faster because it doesn't have to do string hashing and equality checking; it would be a direct C lookup. For those things you are manipulating in subfunction calls, you can pass it by pointer like you would normally in C (I forget if Cython passes everything by copy or reference by default for C objects).
This is related to the point above: In a struct, I could also more easily use the underlying C data type of flint polynomials.
But just to be clear: I would need to do the memory management myself, wouldn't I? Basically, it is a binary search tree; is there code in Sage (I am sure there is!) where such a data structure is used and from which I can get inspiration?
Just to be sure: If I turn the ETuple functions into methods, they should probably be cpdef, not cdef, right? As long as all instances of ETuple in my code are cdefined, a cpdef method wouldn't be less efficient than a cdef method, right?
1) One of the bad aspects of Singular is that it is the opposite of being careful in dealing with classes of objects. What is called the Hilbert series of an ideal I there, is the Hilbert series of R/I where R is the ambient polynomial ring. But I is a graded module and has its own Hilbert series. If one wants the Hilbert series of R/I, it should NOT be called the Hilbert series of I.
2) I don't think one can relate the Hilbert series of a (monomial) ideal and the Hilbert series of its integral closure in a useful way.
Replying to @w-bruns:
1) One of the bad aspects of Singular is that it is the opposite of being careful in dealing with classes of objects. What is called the Hilbert series of an ideal I there, is the Hilbert series of R/I where R is the ambient polynomial ring. But I is a graded module and has its own Hilbert series. If one wants the Hilbert series of R/I, it should NOT be called the Hilbert series of I.
Right. In most cases, I actually use the wording "Hilbert series of I" and "Poincaré series of R/I", which I guess isn't standard, but at least it makes the difference clear. But I think in my comments above I haven't been sufficiently careful with these wordings.
EDIT: Note that Bigatti uses the notation "", which coincides with the Hilbert-Poincaré series of R/I. But that's just a notation, not a name. It seems to me that in practical applications, one most commonly has an ideal I and wants to compute the Hilbert-Poincaré series of R/I, not of "I as a module". Hence, from a practical aspect, it actually makes sense to use a function name such as "hilb" or "hilbert_numerator" and so on for a function that tells something about R/I but takes I as an input.
2) I don't think one can relate the Hilbert series of a (monomial) ideal and the Hilbert series of its integral closure in a useful way.
That's unfortunate. I really need the Hilbert/Poincaré series of R/I.
Replying to @simon-king-jena:
first of all: Thank you for your thoughtful comments!
No problem. Sorry I am not always able to find the time to review your code.
Replying to @tscrim:
I think this might also give you better C code (and hence, be faster):
if l == 1:
    return <ETuple> L[0]
if l == 2:
    return (<ETuple> L[0]).eadd(<ETuple> L[1])
Right, actually the first "if" is nonsense in my code: I assign to m1, but do not return it. Anyway, here is the resulting C code: [snip] Is the second one better?
Yes, you can see that it does not do the type check (which could result in a segfault if the result is not the correct type).
Does L need to be a list (i.e., not an array of ETuple's)? This would mean less type-casting.
Can you give me a pointer on how to use arrays of Cython types?
Right, you can't have an array of cdef classes. Forgot about that. There is a way around that by having an array of pointers to the objects:
https://stackoverflow.com/questions/33851333/cython-how-do-you-create-an-array-of-cdef-class
It is just really annoying to have to do all of the type-casting.
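For the record, a small sketch of that workaround (hypothetical helper, probably not worth it here): the list keeps owning the references, and raw borrowed pointers are stored for typed access:

from cpython.object cimport PyObject
from libc.stdlib cimport malloc, free
from sage.rings.polynomial.polydict cimport ETuple

def pointer_array_demo(list L):
    cdef Py_ssize_t i, n = len(L)
    cdef ETuple m
    cdef PyObject** arr = <PyObject**> malloc(n * sizeof(PyObject*))
    if arr == NULL:
        raise MemoryError
    # borrowed pointers into L; L must stay alive while arr is in use
    for i in range(n):
        arr[i] = <PyObject*> L[i]
    # typed access is then just a cast, without a Python-level type check
    for i in range(n):
        m = <ETuple> arr[i]
    free(arr)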
It also seems like you are not using list manipulations other than one sort (which you can most likely just use C qsort on).
I also use append and pop. How to do these with arrays?
I missed that when going through the code. So then it is just better to use lists. Add lots of <ETuple>'s to avoid as much extra type checking as possible.
If you replace Id[-1] with Id[len(Id)-1], you can then remove the bounds check and wrap-around from HilbertBaseCase. You probably should remove those checks across the file to get some extra speed.
I don't understand that. What bounds check are you talking about? Are you saying that Id[-1] does a bounds check, whereas Id[len(Id)-1] doesn't?
Id[-1] is a wrap-around: normal C would just do a pointer offset. You can tell Cython to assume that indices passed in behave like C arrays and do not check if the result is negative (wrap-around check) or outside the list (bounds check).
You are probably going to be better off in terms of speed doing the low-level polynomial manipulation directly via flint; that way you have fewer intermediate (Sage) objects and less indirection.
So, I should allocate the underlying flint C types and manipulate these, and only at the very end create a Sage polynomial? I guess that's related to another point below.
Yes, if you want to squeeze out as much speed as possible. It is more work and makes the code a bit more obscure, but it should give you some more speed.
The for i from 0 <= i < k by m is discouraged (if not semi-deprecated) in favor of the more standard Python for i in range(0, k, m), which gets turned into the same C code AFAIK.
Almost. I get [snip] So, in the non-deprecated version there is an additional variable assignment. Actually, this is why I changed my original code (which did use in range(...)) into from ... by.
Hmm...curious. Although the compiler will almost certainly optimize those extra variables out anyways.
You should have two functions quotient_degree and quotient_degree_weighted, where the former does not take w as an input. There is almost no code shared between the two and it makes it easier to understand what is going on where they are called. Same for degree.
Really? I thought it was clearer the other way around: In all cases, we have a computation that involves degree weights, and it is the job of the function to find out if the weights are trivial or not.
It seems to me like they want to be separate functions. Plus you could then require w to be a list for the signature.
One thing, though: At some point, one needs to check whether w is None or not. By having a massive code duplication, I could achieve that this check is only done once, namely in first_hilbert_series. Currently, it is checked frequently, and the frequency wouldn't be reduced with your suggestion of a small code duplication.
I see; I didn't think it was going to be that bad overall. However, it's not a big issue for me the way things are now.
I don't see why AN should be a dict (and hence the first arguments D to a number of your functions). It seems more like you are mimicking a struct with a dict. Using a struct will be much faster because it doesn't have to do string hashing and equality checking; it would be a direct C lookup. For those things you are manipulating in subfunction calls, you can pass it by pointer like you would normally in C (I forget if Cython passes everything by copy or reference by default for C objects).
This is related to the point above: In a struct, I could also more easily use the underlying C data type of flint polynomials.
Yes, that is correct, but the struct could hold onto a Sage polynomial just as well. The reason for doing this is both cleaner and faster code by avoiding the dict.
But just to be clear: I would need to do the memory management myself, wouldn't I? Basically, it is a binary search tree; is there code in Sage (I am sure there is!) where such data structure is used and from where I can get inspiration?
Yes, you would have to do some memory management. I'm not sure exactly how much from my relatively quick look, but it should not be too bad.
Replying to @simon-king-jena:
Just to be sure: If I turn the ETuple functions into methods, they should probably be cpdef, not cdef, right? As long as all instances of ETuple in my code are cdefined, a cpdef method wouldn't be less efficient than a cdef method, right?
No, a cpdef called from a Cython file is just a C function call IIRC. So it should not be any less efficient from within Cython code (although I think it can introduce a bit of extra overhead when doing a Python function call compared to a plain def).
sage: %%cython
....: def foo():
....: pass
....: cpdef cpfoo():
....: pass
....: cdef cfoo():
....: pass
....: def test():
....: foo()
....: def ctest():
....: cfoo()
....: def cptest():
....: cpfoo()
....: cpdef cptestcp():
....: cpfoo()
....:
sage: %timeit test()
10000000 loops, best of 3: 34.9 ns per loop
sage: %timeit ctest()
10000000 loops, best of 3: 22.9 ns per loop
sage: %timeit cptest()
10000000 loops, best of 3: 22.9 ns per loop
sage: %timeit cptestcp()
10000000 loops, best of 3: 23.3 ns per loop
(These timings are relatively consistent across multiple runs for me.)
Just some questions to make sure I understand your suggestions:
Replying to @tscrim:
I missed that when going through the code. So then it is just better to use lists. Add lots of <ETuple>'s to avoid as much extra type checking as possible.
So, <ETuple> silences the type check, which is faster but can theoretically result in a segfault, right?
Id[-1] is a wrap-around: normal C would just do a pointer offset. You can tell Cython to assume that indices passed in behave like C arrays and do not check if the result is negative (wrap-around check) or outside the list (bounds check).
So, when I do L[len(L)-1], Cython will automatically know that a bounds check is not needed?
How to avoid a bounds check in sum_from_list(list L, size_t s, size_t l), where I access L[0], L[1], L[s], L[s+l//2], etc.?
Hmm...curious. Although the compiler will almost certainly optimize those extra variables out anyways.
In that case I should change it.
It seems to me like they want to be separate functions. Plus you could then require w to be a list for the signature.
I do require tuple. Would list be better?
Anyway, do you think that I should duplicate the loop in first_hilbert_series, once for the case that w is None and once for the case that it isn't?
Yes, you would have to do some memory management. I'm not sure exactly how much from my relatively quick look, but it should not be too bad.
Maybe go an easy middle way, and have some
cdef class Node:
    cdef Node Back, Left, Right
    cdef object LMult # or some flint boilerplate instead of object
    cdef object LeftFHS # or some flint boilerplate instead of object
The overhead would be that Python objects will be allocated (unless I use a pool, similar to what Sage does with Integers...), but I guess the allocation is not more than the allocation of a dict, and moreover I wouldn't need to care about memory management. And accessing the cdef attributes of the node will be faster than looking up dictionary items, wouldn't it?
PS, concerning "pool":
I recall that when I was young, there used to be a utility that automatically equipped a given cdef class with a pool/kill list, similar to what is hard coded for Integer. But I cannot find it now. Has it gone?
It may actually be an idea to equip ETuple with a pool/kill list. In that way, MPolynomial_polydict would probably suck slightly less, and the code here might also be a little faster.
Replying to @simon-king-jena:
PS, concerning "pool":
I recall that when I was young, there used to be a utility that automatically equipped a given cdef class with a pool/kill list, similar to what is hard coded for Integer. But I cannot find it now. Has it gone?
Found it: sage.misc.allocator
Replying to @simon-king-jena:
How to avoid a bounds check in sum_from_list(list L, size_t s, size_t l), where I access L[0], L[1], L[s], L[s+l//2], etc.?
Since I am sure about bounds being correct, I guess I can use <ETuple>PyList_GET_ITEM(...).
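For instance (a minimal sketch, assuming the caller guarantees that the index is in range):

from cpython.list cimport PyList_GET_ITEM
from sage.rings.polynomial.polydict cimport ETuple

cdef ETuple entry(list L, Py_ssize_t i):
    # no bounds, wraparound or type check; the caller guarantees 0 <= i < len(L)
    return <ETuple> PyList_GET_ITEM(L, i)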
Replying to @simon-king-jena:
Just some questions to make sure I understand your suggestions:
Replying to @tscrim:
I missed that when going through the code. So then it is just better to use lists. Add lots of <ETuple>'s to avoid as much extra type checking as possible.
So, <ETuple> silences the type check, which is faster but can theoretically result in a segfault, right?
Yes, that is correct. If you want to do an explicit check in there that can raise an error, you can do <ETuple?>. (I believe this is supposed to be different from letting Cython raise an error when not given the correct type, but I'm not sure about that, Jeroen?)
Id[-1] is a wrap-around: normal C would just do a pointer offset. You can tell Cython to assume that indices passed in behave like C arrays and do not check if the result is negative (wrap-around check) or outside the list (bounds check).
So, when I do L[len(L)-1], Cython will automatically know that a bounds check is not needed?
You have to also tell Cython not to do those checks. You can do this at the level of the file by adding # cython: boundscheck=False, wraparound=False as the first line(s) of the file (see, e.g., matrix/args.pyx) or with the decorators @cython.boundscheck(False) and @cython.wraparound(False) on the appropriate functions (see, e.g., combinat/combinat_cython.pyx).
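For illustration, a tiny sketch of the decorator form on a hypothetical helper:

cimport cython
from sage.rings.polynomial.polydict cimport ETuple

@cython.boundscheck(False)
@cython.wraparound(False)
cdef ETuple last_entry(list Id):
    # with both checks off, Id[len(Id) - 1] compiles to a plain item access,
    # without negative-index or out-of-range handling
    return <ETuple> Id[len(Id) - 1]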
How to avoid a bounds check in sum_from_list(list L, size_t s, size_t l), where I access L[0], L[1], L[s], L[s+l//2], etc.?
See above.
Hmm...curious. Although the compiler will almost certainly optimize those extra variables out anyways.
In that case I should change it.
I bet that will make Jeroen happy. :)
It seems to me like they want to be separate functions. Plus you could then require w to be a list for the signature.
I do require tuple. Would list be better?
I couldn't remember and was guessing. tuple is perfectly fine.
Anyway, do you think that I should duplicate the loop in first_hilbert_series, once for the case that w is None and once for the case that it isn't?
I don't see why that one has to be split. It is only degree and quotient_degree that should be split IMO. For example, in HilbertBaseCase you would have
- for m2 in Id:
- if m is not m2:
- Factor *= (1-t**quotient_degree(m2,m,w))
+ if w is None:
+ for m2 in Id:
+ if m is not m2:
+ Factor *= (1-t**quotient_degree(m2,m))
+ else:
+ for m2 in Id:
+ if m is not m2:
+ Factor *= (1-t**quotient_degree_graded(m2,m,w))
In some cases, this will result in fewer if checks (unless the compiler is optimizing those out). In others, this will be no worse than before.
Yes, you would have to do some memory management. I'm not sure exactly how much from my relatively quick look, but it should not be too bad.
Maybe go an easy middle way, and have some
cdef class Node:
    cdef Node Back, Left, Right
    cdef object LMult # or some flint boilerplate instead of object
    cdef object LeftFHS # or some flint boilerplate instead of object
The overhead would be that Python objects will be allocated (unless I use a pool, similar to what Sage does with Integers...), but I guess the allocation is not more than the allocation of a dict, and moreover I wouldn't need to care about memory management. And accessing the cdef attributes of the node will be faster than looking up dictionary items, wouldn't it?
Yes, it should be much faster (at that scale) and require less memory than a dict (hash tables are generally at most 80% full IIRC, but you have to store both the key (which would be a string) and the value). I think a pool is overkill for this as you are building out a tree rather than frequently creating and removing nodes, right? The memory management really only comes into play when you start dealing with the flint objects directly.
Yes, you can use PyList_GET_ITEM, which I think is even a bit faster.
Replying to @tscrim:
I think a pool is overkill for this as you are building out a tree rather than frequently creating and removing nodes, right? The memory management really only comes into play when you start dealing with the flint objects directly.
I wasn't talking about a pool for Nodes (which just build a tree), but about a pool for ETuple, which I guess will be created and killed (during interreduction) more frequently.
Replying to @simon-king-jena:
Replying to @tscrim:
I think a pool is overkill for this as you are building out a tree rather than frequently creating and removing nodes, right? The memory management really only comes into play when you start dealing with the flint objects directly.
I wasn't talking about a pool for Nodes (which just build a tree), but about a pool for ETuple, which I guess will be created and killed (during interreduction) more frequently.
Ah, I see. Yea, that might be good to do. Although the use of the pool could end up having a fair bit of overhead when not having a lot of ETuple's that would be GC'ed. I haven't actually looked, though.
It seems to me that for i in range(s,e,b) really is a bit slower than for i from s<=i<e by b. My main example becomes about 2% slower. See also this little benchmark:
sage: cython("""
....: def test1(size_t m):
....: cdef size_t i,j
....: j = 0
....: for i in range(0,m,2):
....: j += i
....: return j
....: def test2(size_t m):
....: cdef size_t i,j
....: j = 0
....: for i from 0<=i<m by 2:
....: j += 1
....: return j
....: """)
sage: %timeit test1(10000)
The slowest run took 4.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.18 µs per loop
sage: %timeit test1(10000)
100000 loops, best of 3: 3.1 µs per loop
sage: %timeit test2(10000)
The slowest run took 4.26 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop
sage: %timeit test2(10000)
1000000 loops, best of 3: 1.86 µs per loop
So, if the loop itself is timed, one sees that for ... from ... by is double the speed of for ... in range(...).
I guess I should use the deprecated syntax and post the above benchmark on cython-users, shouldn't I?
Ouch. I had a typo in my benchmark (j+=i versus j+=1). Without the typo, the timings of the test functions are basically equal. So, probably you are right that the additional assignments are optimized out by the compiler.
I don't know if the 2% difference in the real world example is significant.
Branch pushed to git repo; I updated commit sha1. New commits:
7698bb6 | Some code optimizations in Hilbert series computation |
The last commit did some optimizations. What I want to try next:
I wonder about
FLINT_DLL void fmpz_poly_mul(fmpz_poly_t res,
const fmpz_poly_t poly1, const fmpz_poly_t poly2);
Is it legal that res coincides with poly1 or with poly2?
Replying to @simon-king-jena:
I wonder about
FLINT_DLL void fmpz_poly_mul(fmpz_poly_t res, const fmpz_poly_t poly1, const fmpz_poly_t poly2);
Is it legal that res coincides with poly1 or with poly2?
Probably yes. In the sources, I found checks like
if (res == poly1 || res == poly2)...
which indicate that res==poly1 is allowed.
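A minimal sketch of relying on that; the cimport paths below are assumed to be the usual Sage flint declarations, and whether aliasing is officially supported remains the question above:

from sage.libs.flint.types cimport fmpz_poly_t
from sage.libs.flint.fmpz_poly cimport (fmpz_poly_init, fmpz_poly_clear,
                                        fmpz_poly_set_coeff_si, fmpz_poly_mul)

def square_in_place_demo():
    # square a polynomial in place, with res aliased to both input arguments
    cdef fmpz_poly_t p
    fmpz_poly_init(p)
    fmpz_poly_set_coeff_si(p, 1, 1)   # p = t
    fmpz_poly_mul(p, p, p)            # now p = t^2
    fmpz_poly_clear(p)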
Branch pushed to git repo; I updated commit sha1. New commits:
8ae070b | Use flint polynomial boilerplate for Hilbert series computation |
In the latest commit, I am using boilerplate for flint polynomials. It brings a drastic improvement, apparently a factor of two compared with the previous commit, and there seems to be no memory leak:
sage: from sage.rings.polynomial.hilbert import first_hilbert_series, hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: get_memory_usage()
7366.8203125
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 144 ms per loop
sage: get_memory_usage()
7438.82421875
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 145 ms per loop
sage: get_memory_usage()
7438.82421875
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 144 ms per loop
sage: get_memory_usage()
7438.82421875
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 144 ms per loop
sage: get_memory_usage()
7438.82421875
It might be interesting to compare with Singular in examples that Singular can manage:
sage: R = PolynomialRing(QQ,'x',25)
sage: M = matrix(R,5,R.gens())
sage: I = R*[m.lm() for m in (R*(M^2).list()).groebner_basis()]
sage: I.hilbert_numerator()
-t^30 + 24*t^29 - 275*t^28 + 2000*t^27 - 10350*t^26 + 40480*t^25 - 123971*t^24 + 303600*t^23 - 600774*t^22 + 960600*t^21 - 1222811*t^20 + 1182944*t^19 - 740749*t^18 + 18472*t^17 + 687275*t^16 - 1105688*t^15 + 1161279*t^14 - 963728*t^13 + 671803*t^12 - 393000*t^11 + 179220*t^10 - 51176*t^9 + 503*t^8 + 6024*t^7 - 1400*t^6 - 576*t^5 + 275*t^4 + 24*t^3 - 25*t^2 + 1
sage: first_hilbert_series(I)
-t^30 + 24*t^29 - 275*t^28 + 2000*t^27 - 10350*t^26 + 40480*t^25 - 123971*t^24 + 303600*t^23 - 600774*t^22 + 960600*t^21 - 1222811*t^20 + 1182944*t^19 - 740749*t^18 + 18472*t^17 + 687275*t^16 - 1105688*t^15 + 1161279*t^14 - 963728*t^13 + 671803*t^12 - 393000*t^11 + 179220*t^10 - 51176*t^9 + 503*t^8 + 6024*t^7 - 1400*t^6 - 576*t^5 + 275*t^4 + 24*t^3 - 25*t^2 + 1
so, the result is still correct. However, comparing with Singular-via-pexpect, I get
sage: %timeit first_hilbert_series(I)
10 loops, best of 3: 120 ms per loop
versus
sage: Is = singular(I)
sage: singular.eval('attrib({},"isSB",1)'.format(Is.name()))
''
sage: %timeit singular.eval(hv.name()+'=hilb({},1)'.format(Is.name()))
10 loops, best of 3: 163 ms per loop
Conclusion: The new implementation not only avoids the 32 bit limitation of Singular but also seems to be faster than Singular. So, by now the implementation is competitive.
Next, I will see if a pool for ETuple makes sense, and in any case I will move some cdef functions to cpdef methods of ETuple.
I just tested: In the main example (the one in the attachment to this ticket), there are 93382 allocations and the same number of deallocations of ETuples. That sounds like a lot; however, it doesn't contribute much time:
sage: cython("""
....: from sage.rings.polynomial.polydict cimport ETuple
....: def test():
....: cdef ETuple m
....: cdef int i
....: for i in range(93000):
....: m = ETuple.__new__(ETuple)
....: """
....: )
sage: %timeit test()
100 loops, best of 3: 3.03 ms per loop
So, the prospect of saving less than 3ms in an example that takes 145 ms isn't enough to justify the implementation of a pool. Do you agree?
You probably are not doing enough creations and deletions with the same amount in memory. Probably the bigger time is spent in allocating the data list in the ETuple, which cannot reasonably [1] be put in a pool since those lengths almost certainly change with each (re)allocation. So you are not really gaining much from that. Perhaps it is worthwhile to try a bigger example, say something that takes 1-2 seconds, and see if you still get a 2% savings or if there is just little change.
[1] Of course, it is possible, but I am pretty sure you have to write your own memory manager.
One other general comment: IMO it is still useful to add a little bit of documentation via a docstring explaining what functions do even when they are cdef (you do not need to doctest them, however).
Just a general question: do you think it would be easy and worthwhile to pull the main Hilbert series computation code out into a standalone Cython library (that would depend only on flint)?
One last comment: Please remove this global definition:
# Global definition
PR = PolynomialRing(ZZ,'t')
t = PR('t')
I see no reason to nail it in memory.
Branch pushed to git repo; I updated commit sha1. New commits:
d997abe | Fix a corner case in Hilbert series computation |
Replying to @tscrim:
One other general comment: IMO it is still useful to add a little bit of documentation via a docstring explaining what functions do even when they are cdef (you do not need to doctest them, however).
Yes, documentation should be added. And another test, for the corner case that is fixed in the previous commit.
Replying to @tscrim:
Just a general question: do you think it would be easy and worthwhile to pull the main Hilbert series computation code out into a standalone Cython library (that would depend only on flint)?
Not totally sure. What I need is ETuple, which is defined in Sage. So, unless ETuple is pulled out as well, it makes no sense. Apart from ETuple, though, I think it would make sense: If I see that correctly, the only place where I use more than flint is in hilbert_poincare_series, in which I currently just do a division of two flint polynomials in Sage.
One last comment: Please remove this global definition:
# Global definition
PR = PolynomialRing(ZZ,'t')
t = PR('t')
I see no reason to nail it in memory.
Done in the previous commit.
Ouch. The previous commit was supposed to fix the corner case of the zero ideal, but I made a mistake, so that now the corner case of the unit ideal became wrong. This will be fixed next, together with more documentation.
Branch pushed to git repo; I updated commit sha1. New commits:
5af3439 | Fix the erroneous fix of some corner cases |
Branch pushed to git repo; I updated commit sha1. New commits:
138f85b | Turn some functions of polynomial.hilbert to methods of ETuple |
The latest commit turns some cdef functions into cpdef methods.
The effect is not good: the computation time went up from about 145 ms to about 190 ms:
sage: %time hilbert_poincare_series(I,grading)
CPU times: user 192 ms, sys: 0 ns, total: 192 ms
Wall time: 191 ms
(-8*t^26 + 24*t^25 - 16*t^24 - 16*t^23 + 24*t^22 - 8*t^21 - t^7 - t^6 + t^5 - t^4 - t^3 - t^2 + t - 1)/(t^13 - 3*t^12 + 2*t^11 + 2*t^10 - 3*t^9 + t^8 - t^5 + 3*t^4 - 2*t^3 - 2*t^2 + 3*t - 1)
sage: %crun hilbert_poincare_series(I,grading)
PROFILE: interrupts/evictions/bytes = 20/0/8288
Using local file /home/king/Sage/git/sage/local/bin/python2.
Using local file /home/king/.sage/temp/klap/5581/tmp_fZKweM.perf.
Total: 20 samples
0 0.0% 0.0% 19 95.0% PyEval_EvalCode
0 0.0% 0.0% 19 95.0% PyEval_EvalCodeEx
0 0.0% 0.0% 19 95.0% PyEval_EvalFrameEx
0 0.0% 0.0% 19 95.0% PyObject_Call
0 0.0% 0.0% 19 95.0% PyRun_FileExFlags
0 0.0% 0.0% 19 95.0% PyRun_SimpleFileExFlags
0 0.0% 0.0% 19 95.0% PyRun_StringFlags
0 0.0% 0.0% 19 95.0% Py_Main
0 0.0% 0.0% 19 95.0% __Pyx_PyObject_Call
0 0.0% 0.0% 19 95.0% __libc_start_main
0 0.0% 0.0% 19 95.0% __pyx_pf_4sage_5rings_10polynomial_7hilbert_2hilbert_poincare_series (inline)
0 0.0% 0.0% 19 95.0% __pyx_pf_4sage_5rings_10polynomial_7hilbert_first_hilbert_series (inline)
0 0.0% 0.0% 19 95.0% __pyx_pw_4sage_5rings_10polynomial_7hilbert_1first_hilbert_series
0 0.0% 0.0% 19 95.0% __pyx_pw_4sage_5rings_10polynomial_7hilbert_3hilbert_poincare_series
0 0.0% 0.0% 19 95.0% _start
0 0.0% 0.0% 19 95.0% call_function (inline)
0 0.0% 0.0% 19 95.0% exec_statement (inline)
0 0.0% 0.0% 19 95.0% ext_do_call (inline)
0 0.0% 0.0% 19 95.0% fast_function (inline)
0 0.0% 0.0% 19 95.0% function_call
0 0.0% 0.0% 19 95.0% run_mod (inline)
0 0.0% 0.0% 11 55.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_make_children (inline)
0 0.0% 0.0% 9 45.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_interred
0 0.0% 0.0% 8 40.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_indivisible_in_list (inline)
8 40.0% 40.0% 8 40.0% __pyx_f_4sage_5rings_10polynomial_8polydict_6ETuple_divides
0 0.0% 40.0% 5 25.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_quotient_by_var (inline)
0 0.0% 40.0% 4 20.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_HilbertBaseCase.isra.41
0 0.0% 40.0% 3 15.0% PyObject_RichCompare
1 5.0% 45.0% 3 15.0% __pyx_pw_4sage_5rings_10polynomial_8polydict_6ETuple_11__getitem__
0 0.0% 45.0% 3 15.0% __pyx_sq_item_4sage_5rings_10polynomial_8polydict_ETuple
0 0.0% 45.0% 3 15.0% fmpz_poly_mul
0 0.0% 45.0% 2 10.0% __Pyx_PyObject_Call (inline)
0 0.0% 45.0% 2 10.0% __Pyx_PyObject_CallOneArg
0 0.0% 45.0% 2 10.0% __pyx_pf_4sage_5rings_10polynomial_8polydict_6ETuple_10__getitem__ (inline)
1 5.0% 50.0% 2 10.0% _fmpz_poly_mul_tiny1
2 10.0% 60.0% 2 10.0% convert_3way_to_object (inline)
0 0.0% 60.0% 2 10.0% do_richcmp (inline)
0 0.0% 60.0% 2 10.0% try_3way_to_rich_compare (inline)
0 0.0% 60.0% 1 5.0% 0x00007fe83c01f837
1 5.0% 65.0% 1 5.0% PyInt_FromLong
0 0.0% 65.0% 1 5.0% PyNumber_Remainder
0 0.0% 65.0% 1 5.0% PyObject_RichCompareBool
1 5.0% 70.0% 1 5.0% _PyObject_New
0 0.0% 70.0% 1 5.0% __Pyx_PyFunction_FastCallDict.constprop.73
0 0.0% 70.0% 1 5.0% __Pyx_PyFunction_FastCallNoKw (inline)
0 0.0% 70.0% 1 5.0% __Pyx_PyInt_As_size_t.part.41
0 0.0% 70.0% 1 5.0% __Pyx_PyInt_From_int (inline)
0 0.0% 70.0% 1 5.0% __Pyx_PyNumber_IntOrLong (inline)
1 5.0% 75.0% 1 5.0% __Pyx_TypeTest (inline)
0 0.0% 75.0% 1 5.0% __Pyx__PyObject_CallOneArg
0 0.0% 75.0% 1 5.0% __libc_calloc
0 0.0% 75.0% 1 5.0% __pyx_f_4sage_5rings_10polynomial_8polydict_6ETuple_weighted_degree
0 0.0% 75.0% 1 5.0% __pyx_pf_4sage_5rings_10polynomial_8polydict_6ETuple_18__richcmp__ (inline)
1 5.0% 80.0% 1 5.0% __pyx_pf_4sage_5rings_7integer_7Integer_122__int__ (inline)
0 0.0% 80.0% 1 5.0% __pyx_pw_4sage_5rings_10polynomial_8polydict_6ETuple_19__richcmp__
0 0.0% 80.0% 1 5.0% __pyx_pw_4sage_5rings_7integer_7Integer_123__int__
1 5.0% 85.0% 1 5.0% _fmpz_vec_zero
1 5.0% 90.0% 1 5.0% _init
1 5.0% 95.0% 1 5.0% _int_malloc
0 0.0% 95.0% 1 5.0% binary_op (inline)
1 5.0% 100.0% 1 5.0% binary_op1 (inline)
0 0.0% 100.0% 1 5.0% build_sortwrapper (inline)
0 0.0% 100.0% 1 5.0% flint_calloc
0 0.0% 100.0% 1 5.0% fmpz_poly_init2
0 0.0% 100.0% 1 5.0% listindex
0 0.0% 100.0% 1 5.0% listsort
What is convert_3way_to_object?
Do you see why the computation became A LOT slower? Some of the functions that I moved from hilbert to ETuple methods used to be "cdef inline", but are now "cpdef". Is it possible that the removal of "inline" was bad?
Also note that there is a compiler warning. If I understand correctly, it comes from the line
L.sort(key=ETuple.unweighted_degree)
How else should I force sorting by unweighted degree?
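One option might be a small module-level wrapper used as the sort key (hypothetical name, assuming the cpdef unweighted_degree method from this branch), so that the unbound method itself is not passed to sort:

from sage.rings.polynomial.polydict cimport ETuple

def _unweighted_degree_key(m):
    # plain def wrapper; the cast avoids a Python-level type check
    return (<ETuple> m).unweighted_degree()

# usage:
# L.sort(key=_unweighted_degree_key)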
PS: Of course I should add documentation. But first I want to know why the code became so much slower.
I was wondering if perhaps the slowness came from NOT defining the parent of the first Hilbert series globally:
sage: cython("""
....: from sage.all import ZZ
....: P = ZZ['t']
....: t = P.gen()
....: def test1(i):
....: return t**i
....: """)
sage: cython("""
....: def test2(i):
....: from sage.all import ZZ
....: P = ZZ['t']
....: t = P.gen()
....: return t**i
....: """)
sage: %timeit test1(50)
The slowest run took 29.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.83 µs per loop
sage: %timeit test2(50)
The slowest run took 4.76 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 42.2 µs per loop
So, that's a lot. But the slow-down can be reduced:
sage: cython("""
....: def test3(i):
....: from sage.all import ZZ,PolynomialRing
....: P = PolynomialRing(ZZ,'t')
....: t = P.gen()
....: return t**i
....: """)
sage: %timeit test3(50)
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 24.6 µs per loop
Perhaps I should change my code accordingly.
Without a change:
sage: from sage.rings.polynomial.hilbert import first_hilbert_series, hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %timeit first_hilbert_series(I,grading)
1 loop, best of 3: 177 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
1 loop, best of 3: 176 ms per loop
sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 176 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
1 loop, best of 3: 179 ms per loop
With PolynomialRing(ZZ,'t')
instead of ZZ['t']
:
sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 171 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
10 loops, best of 3: 173 ms per loop
sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 172 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
1 loop, best of 3: 172 ms per loop
So, it really doesn't help much. And globally defining the polynomial ring doesn't help either.
Hence, I believe the slowness comes from the cpdef methods or the not-inlining.
I know that, if the compiler knows we are dealing with an ETuple, a cpdef method should have the same speed as a cdef function. But in fact it does not. With cdef methods:
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %time hilbert_poincare_series(I,grading)
CPU times: user 168 ms, sys: 43 µs, total: 168 ms
Wall time: 168 ms
(8*t^26 - 24*t^25 + 16*t^24 + 16*t^23 - 24*t^22 + 8*t^21 + t^7 + t^6 - t^5 + t^4 + t^3 + t^2 - t + 1)/(-t^13 + 3*t^12 - 2*t^11 - 2*t^10 + 3*t^9 - t^8 + t^5 - 3*t^4 + 2*t^3 + 2*t^2 - 3*t + 1)
sage: %timeit first_hilbert_series(I)
10 loops, best of 3: 111 ms per loop
sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 158 ms per loop
With cpdef methods
sage: from sage.rings.polynomial.hilbert import first_hilbert_series, hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %time hilbert_poincare_series(I,grading)
CPU times: user 178 ms, sys: 7 µs, total: 178 ms
Wall time: 177 ms
(8*t^26 - 24*t^25 + 16*t^24 + 16*t^23 - 24*t^22 + 8*t^21 + t^7 + t^6 - t^5 + t^4 + t^3 + t^2 - t + 1)/(-t^13 + 3*t^12 - 2*t^11 - 2*t^10 + 3*t^9 - t^8 + t^5 - 3*t^4 + 2*t^3 + 2*t^2 - 3*t + 1)
sage: %timeit first_hilbert_series(I)
10 loops, best of 3: 128 ms per loop
sage: %timeit first_hilbert_series(I,grading)
1 loop, best of 3: 171 ms per loop
So, we have a slow-down of about 11-15%. Therefore I'd prefer to use cdef methods where possible (i.e., the only exception is unweighted_degree, which I use for sorting).
Next plan: Try to test w is None less often, by a moderate code duplication.
In my work on group cohomology, I was bitten by the fact that Singular limits the coefficients of Hilbert series to 32 bit integers. As a consequence, examples such as the following from #20145 fail:
It seems that upstream does not want to change Singular to use 64 bit integers by default and arbitrary size integers when needed (which is what I would propose). Therefore I am providing an implementation of Hilbert series computation that does not suffer from such a limitation.
The above is the correct result according to #20145.
My implementation is a bit slower than libsingular, which is why I do not propose to use my implementation as a replacement of libsingular's hilb function. However, in big applications, it is a good way out.
CC: @w-bruns
Component: commutative algebra
Keywords: Hilbert series
Author: Simon King
Branch/Commit:
5637e07
Reviewer: Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/26243