sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Provide Hilbert series implementation beyond Singular's limitations #26243

Closed simon-king-jena closed 6 years ago

simon-king-jena commented 6 years ago

In my work on group cohomology, I was bitten by the fact that Singular limits the coefficients of Hilbert series to 32-bit integers. As a consequence, examples such as the following from #20145 fail:

        sage: n=4;m=11;P = PolynomialRing(QQ,n*m,"x"); x = P.gens(); M = Matrix(n,x)
        sage: I = P.ideal(M.minors(2))
        sage: J = P*[m.lm() for m in I.groebner_basis()]
        sage: J.hilbert_numerator()
        Traceback (most recent call last):
        ...
        RuntimeError: error in Singular function call 'hilb':
        int overflow in hilb 1

It seems that upstream does not want to change Singular to use 64-bit integers by default and arbitrary-precision integers when needed (which is what I would propose). Therefore I am providing an implementation of Hilbert series computation that does not suffer from this limitation.

        sage: from sage.rings.polynomial.hilbert import hilbert_poincare_series
        sage: hilbert_poincare_series(J).numerator()
        120*t^3 + 135*t^2 + 30*t + 1
        sage: hilbert_poincare_series(J).denominator().factor()
        (t - 1)^14

The above is the correct result according to #20145.

My implementation is a bit slower than libsingular's, which is why I do not propose it as a replacement for libsingular's hilb function. However, in big applications, it is a good way out.

CC: @w-bruns

Component: commutative algebra

Keywords: Hilbert series

Author: Simon King

Branch/Commit: 5637e07

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/26243

simon-king-jena commented 6 years ago

Branch: u/SimonKing/Hilbert_functions_unlimited

simon-king-jena commented 6 years ago

Commit: 827f579

simon-king-jena commented 6 years ago

New commits:

1a6c6dc: Provide an implementation of Hilbert series without Singular's limitations
827f579: Provide documentation for Hilbert series computation

simon-king-jena commented 6 years ago

Description changed:

d6056c42-5b65-4eea-a816-4019a4d0b1e1 commented 6 years ago
comment:5

The residue class ring of the polynomial ring in 11 x 4 variables modulo the 2x2 minors is a normal monoid ring. You can use Normaliz, but there is also an explicit formula. See MR1213858

simon-king-jena commented 6 years ago

A real-world example for the computation of Hilbert series

simon-king-jena commented 6 years ago
comment:6

Attachment: hilbert_example.sobj.gz

Replying to @w-bruns:

The residue class ring of the polynomial ring in 11 x 4 variables modulo the 2x2 minors is a normal monoid ring. You can use Normaliz, but there is also an explicit formula. See MR1213858

On #25091, you said that normaliz can only compute the Hilbert series of the integral closure of a monomial ideal. That may not be what is required here.

Anyway: clearly I am not interested in that particular example. It is just that: an example, which I borrowed from #20145. It is suitable as a doctest, as it can be set up quickly; and it is instructive as a documentation example, as it exposes limitations of Singular that make it unusable in "real world" examples for which there are no explicit formulae in the literature.

Or would you know a formula for the Hilbert series of the mod-2 cohomology ring of group number 299 of order 256 (numbering according to the small groups library)? That cohomology ring is the smallest potential counterexample to a conjecture of Dave Benson. A small part of the attempt to verify that conjecture is the computation of the Hilbert series of several quotients of that ring. Unfortunately, that "small part" is too big for Singular or Frobby.

The leading ideal of the relation ideal of the above-mentioned cohomology ring is in the attachment hilbert_example.sobj. The computation of its Hilbert series actually shouldn't take very long:

sage: from sage.rings.polynomial.hilbert import hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %time hilbert_poincare_series(I,grading)
CPU times: user 270 ms, sys: 2.85 ms, total: 273 ms
Wall time: 324 ms
(-8*t^26 + 24*t^25 - 16*t^24 - 16*t^23 + 24*t^22 - 8*t^21 - t^7 - t^6 + t^5 - t^4 - t^3 - t^2 + t - 1)/(t^13 - 3*t^12 + 2*t^11 + 2*t^10 - 3*t^9 + t^8 - t^5 + 3*t^4 - 2*t^3 - 2*t^2 + 3*t - 1)

However, Singular immediately reports that the example is too big, and Frobby needs to be killed after several minutes. There is no interface to CoCoA, and I don't know how to utilise the interface to normaliz. I also don't know how to install Macaulay2, so I can't test that either.

tscrim commented 6 years ago
comment:7

All of these comments are about sum_from_list:

Something that should give you better speed would be having it take another argument for the starting point of the sum. That way you would not have to create all of the intermediate lists (e.g., L[:l2] and L[l2:]). It will be slightly trickier to code, but not by much, and I would imagine it gives a decent speedup.
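
For instance, roughly what I mean (just a sketch with a hypothetical signature, not taken from your branch; it assumes the eadd method of ETuple):

from sage.rings.polynomial.polydict cimport ETuple

cdef ETuple sum_from_list(list L, size_t s, size_t l):
    # sum the ETuples L[s], ..., L[s+l-1], recursing on index ranges
    # instead of creating intermediate slices
    if l == 1:
        return <ETuple> L[s]
    if l == 2:
        return (<ETuple> L[s]).eadd(<ETuple> L[s + 1])
    return sum_from_list(L, s, l // 2).eadd(sum_from_list(L, s + l // 2, l - l // 2))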

I think this might also give you better C code (and hence, be faster):

    if l == 1:
        return <ETuple> L[0]
    if l == 2:
        return (<ETuple> L[0]).eadd(<ETuple> L[1])

Some other thoughts:

Does L need to be a list (i.e., not an array of ETuples)? This would mean less type-casting. It also seems like you are not using any list manipulations other than one sort (which you could most likely just do with C qsort).

It is faster to do if not Id: than if len(Id)==0:, even in Cython.

If you replace Id[-1] with Id[len(Id)-1], you can then remove the bounds check and wrap around from HilbertBaseCase. You probably should remove those checks across the file to get some extra speed.

You are probably going to be better off in terms of speed doing the low-level polynomial manipulation directly via flint; that way you have fewer intermediate (Sage) objects and less indirection.

Should those ETuple functions be a part of the ETuple class?

The for i from 0 <= i < k by m is discouraged (if not semi-deprecated) in favor of the more standard Python for i in range(0, k, m), which gets turned into the same C code AFAIK.

You can avoid a few additions here:

-    for i from 0 <= i < 2*m._nonzero by 2:
-        degree += m._data[i+1]
+    for i in range(1, 2*m._nonzero, 2):
+        degree += m._data[i]

You should have two functions quotient_degree and quotient_degree_weighted, where the former does not take w as an input. There is almost no code shared between the two and it makes it easier to understand what is going on where they are called. Same for degree.
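
Roughly what I have in mind for the unweighted variant (just a sketch, not from your branch; it assumes ETuple.emax, which returns the exponentwise maximum, i.e. the exponent vector of the lcm):

from sage.rings.polynomial.polydict cimport ETuple

cdef inline size_t quotient_degree(ETuple m2, ETuple m):
    # total degree of lcm(m2, m) / m, with every variable of weight 1
    cdef ETuple lcm = m2.emax(m)
    cdef size_t i
    cdef size_t deg = 0
    for i in range(1, 2*lcm._nonzero, 2):
        deg += lcm._data[i]
    for i in range(1, 2*m._nonzero, 2):
        deg -= m._data[i]
    return deg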

quotient(by_var) knows its return type is a list since interred returns a list.

I don't see why AN should be a dict (and hence the first arguments D to a number of your functions). It seems more like you are mimicking a struct with a dict. Using a struct will be much faster because it doesn't have to do string hashing and equality checking; it would be a direct C lookup. For those things you are manipulating in subfunction calls, you can pass it by pointer like you would normally in C (I forget if Cython passes everything by copy or reference by default for C objects).

simon-king-jena commented 6 years ago
comment:8

Dear Travis,

first of all: Thank you for your thoughtful comments!

Replying to @tscrim:

Something that should give you better speed would be having it take another argument for the starting point of the sum. That way you would not have to create all of the intermediate lists (e.g., L[:l2] and L[l2:]). It will be slightly trickier to code, but not by much, and I would imagine it gives a decent speedup.

Right, I should try that.

I think this might also give you better C code (and hence, be faster):

    if l == 1:
        return <ETuple> L[0]
    if l == 2:
        return (<ETuple> L[0]).eadd(<ETuple> L[1])

Right, actually the first "if" is nonsense in my code: I assign to m1, but do not return it. Anyway, here is the resulting C code:

    /* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":268
 *     if l==1:
 *         m1 = L[0]
 *         return L[0]             # <<<<<<<<<<<<<<
 *     if l==2:
 *         m1,m2=L
 */
    __Pyx_XDECREF(((PyObject *)__pyx_r));
    if (unlikely(__pyx_v_L == Py_None)) {
      PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
      __PYX_ERR(0, 268, __pyx_L1_error)
    }
    __pyx_t_2 = __Pyx_GetItemInt_List(__pyx_v_L, 0, long, 1, __Pyx_PyInt_From_long, 1, 0, 1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 268, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    if (!(likely(((__pyx_t_2) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_2, __pyx_ptype_4sage_5rings_10polynomial_8polydict_ETuple))))) __PYX_ERR(0, 268, __pyx_L1_error)
    __pyx_r = ((struct __pyx_obj_4sage_5rings_10polynomial_8polydict_ETuple *)__pyx_t_2);
    __pyx_t_2 = 0;
    goto __pyx_L0;

versus

    /* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":268
 *     if l==1:
 *         m1 = L[0]
 *         return <ETuple>L[0]             # <<<<<<<<<<<<<<
 *     if l==2:
 *         m1,m2=L
 */
    __Pyx_XDECREF(((PyObject *)__pyx_r));
    if (unlikely(__pyx_v_L == Py_None)) {
      PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
      __PYX_ERR(0, 268, __pyx_L1_error)
    }
    __pyx_t_2 = __Pyx_GetItemInt_List(__pyx_v_L, 0, long, 1, __Pyx_PyInt_From_long, 1, 0, 1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 268, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_INCREF(((PyObject *)((struct __pyx_obj_4sage_5rings_10polynomial_8polydict_ETuple *)__pyx_t_2)));
    __pyx_r = ((struct __pyx_obj_4sage_5rings_10polynomial_8polydict_ETuple *)__pyx_t_2);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    goto __pyx_L0;

Is the second one better?

Does L need to be a list (i.e., not an array of ETuple's)? This would mean less type-casting.

Can you give me a pointer on how to use arrays of Cython types?

It also seems like you are not using any list manipulations other than one sort (which you could most likely just do with C qsort).

I also use append and pop. How to do these with arrays?

It is faster to do if not Id: than if len(Id)==0:, even in Cython.

Cool! I thought that, again, they were equivalent. But the code becomes a lot shorter:

  /* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":292
 *     cdef int e
 *     # First, the easiest cases:
 *     if len(Id)==0:             # <<<<<<<<<<<<<<
 *         return PR(1)
 *     cdef ETuple m = Id[-1]
 */
  if (unlikely(__pyx_v_Id == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "object of type 'NoneType' has no len()");
    __PYX_ERR(0, 292, __pyx_L1_error)
  }
  __pyx_t_2 = PyList_GET_SIZE(__pyx_v_Id); if (unlikely(__pyx_t_2 == ((Py_ssize_t)-1))) __PYX_ERR(0, 292, __pyx_L1_error)
  __pyx_t_3 = ((__pyx_t_2 == 0) != 0);
  if (__pyx_t_3) {

versus

  /* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":292
 *     cdef int e
 *     # First, the easiest cases:
 *     if not Id:             # <<<<<<<<<<<<<<
 *         return PR(1)
 *     cdef ETuple m = Id[-1]
 */
  __pyx_t_2 = (__pyx_v_Id != Py_None)&&(PyList_GET_SIZE(__pyx_v_Id) != 0);
  __pyx_t_3 = ((!__pyx_t_2) != 0);
  if (__pyx_t_3) {

If you replace Id[-1] with Id[len(Id)-1], you can then remove the bounds check and wrap around from HilbertBaseCase. You probably should remove those checks across the file to get some extra speed.

I don't understand that. What bounds check are you talking about? Are you saying that Id[-1] does a bounds check, whereas Id[len(Id)-1] doesn't?

You are probably going to be better off in terms of speed doing the low-level polynomial manipulation directly via flint; that way you have fewer intermediate (Sage) objects and less indirection.

So, I should allocate the underlying flint C types and manipulate these, and only in the very end create a Sage polynomial? I guess that's related with another point below.

Should those ETuple functions be a part of the ETuple class?

Probably. Since ETuple is designed for monomials anyway, it makes sense to have monomial divisibility checks there and not here.

The for i from 0 <= i < k by m is discouraged (if not semi-deprecated) in favor of the more standard Python for i in range(0, k, m), which gets turned into the same C code AFAIK.

Almost. I get

  /* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":110
 *     cdef int exp1
 *     cdef ETuple result
 *     for i from 0 <= i < 2*m1._nonzero by 2:             # <<<<<<<<<<<<<<
 *         if m1._data[i] == index:
 *             result = <ETuple>m1._new()
 */
  __pyx_t_1 = (2 * __pyx_v_m1->_nonzero);
  for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_1; __pyx_v_i+=2) {
    ...
    }

versus

  /* "_home_king_Sage_git_sage_src_sage_rings_polynomial_hilbert_pyx_0.pyx":110
 *     cdef int exp1
 *     cdef ETuple result
 *     for i in range(0,2*m1._nonzero,2):             # <<<<<<<<<<<<<<
 *         if m1._data[i] == index:
 *             result = <ETuple>m1._new()
 */
  __pyx_t_1 = (2 * __pyx_v_m1->_nonzero);
  __pyx_t_2 = __pyx_t_1;
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=2) {
    __pyx_v_i = __pyx_t_3;
    ...
    }

So, in the non-deprecated version there is an additional variable assignment. Actually this is why I changed my original code (which did use in range(...)) into from ... by.

You can avoid a few additions here:

-    for i from 0 <= i < 2*m._nonzero by 2:
-        degree += m._data[i+1]
+    for i in range(1, 2*m._nonzero, 2):
+        degree += m._data[i]

Right, good point!

You should have two functions quotient_degree and quotient_degree_weighted, where the former does not take w as an input. There is almost no code shared between the two and it makes it easier to understand what is going on where they are called. Same for degree.

Really? I thought it was clearer the other way around: In all cases, we have a computation that involves degree weights, and it is the job of the function to find out if the weights are trivial or not.

One thing, though: At some point, one needs to check whether w is None or not. By having a massive code duplication, I could achieve that this check is only done once, namely in first_hilbert_series. Currently, it is checked frequently, and the frequency wouldn't be reduced with your suggestion of a small code duplication.

quotient(by_var) knows its return type is a list since interred returns a list.

Right.

I don't see why AN should be a dict (and hence the first arguments D to a number of your functions). It seems more like you are mimicking a struct with a dict. Using a struct will be much faster because it doesn't have to do string hashing and equality checking; it would be a direct C lookup. For those things you are manipulating in subfunction calls, you can pass it by pointer like you would normally in C (I forget if Cython passes everything by copy or reference by default for C objects).

This is related with the point above: In a struct, I could also more easily use the underlying C data type of flint polynomials.

But just to be clear: I would need to do the memory management myself, wouldn't I? Basically, it is a binary search tree; is there code in Sage (I am sure there is!) where such a data structure is used and from which I can get inspiration?

simon-king-jena commented 6 years ago
comment:9

Just to be sure: If I turn the ETuple functions into methods, they should probably be cpdef, not cdef, right? As long as all instances of ETuple in my code are cdefined, a cpdef method wouldn't be less efficient than a cdef method, right?

d6056c42-5b65-4eea-a816-4019a4d0b1e1 commented 6 years ago
comment:10

1) One of the bad aspects of Singular is that it is the opposite of being careful in dealing with classes of objects. What is called the Hilbert series of an ideal I there, is the Hilbert series of R/I where R is the ambient polynomial ring. But I is a graded module and has its own Hilbert series. If one wants the Hilbert series of R/I, it should NOT be called the Hilbert series of I.

2) I don't think one can relate the Hilbert series of a (monomial) ideal and the Hilbert series of its integral closure in a useful way.

simon-king-jena commented 6 years ago
comment:11

Replying to @w-bruns:

1) One of the bad aspects of Singular is that it is the opposite of being careful in dealing with classes of objects. What is called the Hilbert series of an ideal I there, is the Hilbert series of R/I where R is the ambient polynomial ring. But I is a graded module and has its own Hilbert series. If one wants the Hilbert series of R/I, it should NOT be called the Hilbert series of I.

Right. In most cases, I actually use the wording "Hilbert series of I" and "Poincaré series of R/I", which I guess isn't standard, but at least it makes the difference clear. But I think in my comments above I haven't been sufficiently careful with these wordings.

EDIT: Note that Bigatti uses a notation which coincides with the Hilbert-Poincaré series of R/I. But that's just a notation, not a name. It seems to me that in practical applications, one most commonly has an ideal I and wants to compute the Hilbert-Poincaré series of R/I, not of "I as a module". Hence, from a practical point of view, it actually makes sense to use a function name such as "hilb" or "hilbert_numerator" and so on for a function that tells something about R/I but takes I as an input.

2) I don't think one can relate the Hilbert series of a (monomial) ideal and the Hilbert series of its integral closure in a useful way.

That's unfortunate. I really need the Hilbert/Poincaré series of R/I.

tscrim commented 6 years ago
comment:12

Replying to @simon-king-jena:

first of all: Thank you for your thoughtful comments!

No problem. Sorry I am not always able to find the time to review your code.

Replying to @tscrim:

I think this might also give you better C code (and hence, be faster):

    if l == 1:
        return <ETuple> L[0]
    if l == 2:
        return (<ETuple> L[0]).eadd(<ETuple> L[1])

Right, actually the first "if" is nonsense in my code: I assign to m1, but do not return it. Anyway, here is the resulting C code: [snip] Is the second one better?

Yes, you can see that it does not do the type check (which could result in a segfault if the result is not the correct type).

Does L need to be a list (i.e., not an array of ETuples)? This would mean less type-casting.

Can you give me a pointer on how to use arrays of Cython types?

Right, you can't have an array of cdef classes. Forgot about that. There is a way around that by having an array of pointers to the objects:

https://stackoverflow.com/questions/33851333/cython-how-do-you-create-an-array-of-cdef-class

It is just really annoying to have to do all of the type-casting.

It also seems like you are not using any list manipulations other than one sort (which you could most likely just do with C qsort).

I also use append and pop. How to do these with arrays?

I missed that when going through the code. So then it is just better to use lists. Add lots of <ETuple>'s to avoid as much extra type checking as possible.

If you replace Id[-1] with Id[len(Id)-1], you can then remove the bounds check and wrap around from HilbertBaseCase. You probably should remove those checks across the file to get some extra speed.

I don't understand that. What bounds check are you talking about? Are you saying that Id[-1] does a bounds check, whereas Id[len(Id)-1] doesn't?

Id[-1] is a wrap-around: normal C would just do pointer offset. You can tell Cython to assume that indices passed in behave like C arrays and do not check if the result is negative (wrap-around check) or outside the list (bounds check).

You are probably going to be better off in terms of speed doing the low-level polynomial manipulation directly via flint; that way you have fewer intermediate (Sage) objects and less indirection.

So, I should allocate the underlying flint C types and manipulate these, and only in the very end create a Sage polynomial? I guess that's related with another point below.

Yes, if you want to squeeze out as much speed as possible. It is more work and makes the code a bit more obscure, but it should give you some more speed.

The for i from 0 <= i < k by m is discouraged (if not semi-deprecated) in favor of the more standard Python for i in range(0, k, m), which gets turned into the same C code AFAIK.

Almost. I get [snip] So, in the non-deprecated version there is an additional variable assignment. Actually this is why I changed my original code (which did use in range(...)) into from ... by.

Hmm...curious. Although the compiler will almost certainly optimize those extra variables out anyways.

You should have two functions quotient_degree and quotient_degree_weighted, where the former does not take w as an input. There is almost no code shared between the two and it makes it easier to understand what is going on where they are called. Same for degree.

Really? I thought it was clearer the other way around: In all cases, we have a computation that involves degree weights, and it is the job of the function to find out if the weights are trivial or not.

It seems like they want to be separate functions to me. Plus you could then require w to be a list for the signature.

One thing, though: At some point, one needs to check whether w is None or not. By having a massive code duplication, I could achieve that this check is only done once, namely in first_hilbert_series. Currently, it is checked frequently, and the frequency wouldn't be reduced with your suggestion of a small code duplication.

I see; I didn't think it was going to be that bad overall. However, it's not a big issue for me the way things are now.

I don't see why AN should be a dict (and hence the first arguments D to a number of your functions). It seems more like you are mimicking a struct with a dict. Using a struct will be much faster because it doesn't have to do string hashing and equality checking; it would be a direct C lookup. For those things you are manipulating in subfunction calls, you can pass it by pointer like you would normally in C (I forget if Cython passes everything by copy or reference by default for C objects).

This is related with the point above: In a struct, I could also more easily use the underlying C data type of flint polynomials.

Yes, that is correct, but the struct could hold onto a Sage polynomial just as well. The reason for doing this is both cleaner and faster code by avoiding the dict.

But just to be clear: I would need to do the memory management myself, wouldn't I? Basically, it is a binary search tree; is there code in Sage (I am sure there is!) where such data structure is used and from where I can get inspiration?

Yes, you would have to do some memory management. I'm not sure exactly how much from my relatively quick look, but it should not be too bad.

tscrim commented 6 years ago
comment:13

Replying to @simon-king-jena:

Just to be sure: If I turn the ETuple functions into methods, they should probably be cpdef, not cdef, right? As long as all instances of ETuple in my code are cdefined, a cpdef method wouldn't be less efficient than a cdef method, right?

No, a cpdef called from a Cython file is just a C function call IIRC. So it should not be any less efficient from within Cython code (although I think it can introduce a bit of extra overhead when doing a Python function call compared to a plain def).

sage: %%cython
....: def foo():
....:     pass
....: cpdef cpfoo():
....:     pass
....: cdef cfoo():
....:     pass
....: def test():
....:     foo()
....: def ctest():
....:     cfoo()
....: def cptest():
....:     cpfoo()
....: cpdef cptestcp():
....:     cpfoo()
....:
sage: %timeit test()
10000000 loops, best of 3: 34.9 ns per loop
sage: %timeit ctest()
10000000 loops, best of 3: 22.9 ns per loop
sage: %timeit cptest()
10000000 loops, best of 3: 22.9 ns per loop
sage: %timeit cptestcp()
10000000 loops, best of 3: 23.3 ns per loop

(These timings are relatively consistent across multiple runs for me.)

simon-king-jena commented 6 years ago
comment:14

Just some questions to make sure I understand your suggestions:

Replying to @tscrim:

I missed that when going through the code. So then it is just better to use lists. Add lots of <ETuple>'s to avoid as much extra type checking as possible.

So, <ETuple> silences the type check, which is faster but can theoretically result in a segfault, right?

Id[-1] is a wrap-around: normal C would just do pointer offset. You can tell Cython to assume that indices passed in behave like C arrays and do not check if the result is negative (wrap-around check) or outside the list (bounds check).

So, when I do L[len(L)-1], Cython will automatically know that a bound check is not needed?

How to avoid a bound check in sum_from_list(list L, size_t s, size_t l), where I access L[0], L[1], L[s], L[s+l//2] etc.?

Hmm...curious. Although the compiler will almost certainly optimize those extra variables out anyways.

In that case I should change it.

It seems like they want to separate functions to me. Plus you could then require w to be a list for the signature.

I do require tuple. Would list be better?

Anyway, do you think that I should duplicate the loop in first_hilbert_series, once for the case that w is None and once for the case that it isn't?

Yes, you would have to do some memory management. I'm not sure exactly how much from my relatively quick look, but it should not be too bad.

Maybe go an easy middle way, and have some

cdef class Node:
    cdef Node Back, Left, Right
    cdef object LMult   # or some flint boilerplate instead of object
    cdef object LeftFHS # or some flint boilerplate instead of object

The overhead would be that Python objects will be allocated (unless I use a pool, similar to what Sage does with Integers...), but I guess the allocation is not more than the allocation of a dict, and moreover I wouldn't need to care about memory management. And accessing the cdef attributes of the node will be faster than looking up dictionary items, wouldn't it?

simon-king-jena commented 6 years ago
comment:15

PS, concerning "pool":

I recall that when I was young, there used to be a utility that automatically equipped a given cdef class with a pool/kill list, similar to what is hard coded for Integer. But I cannot find it now. Has it gone?

It may actually be an idea to equip ETuple with a pool/kill list. In that way, MPolynomial_polydict would probably suck slightly less, and the code here might also be a little faster.

simon-king-jena commented 6 years ago
comment:16

Replying to @simon-king-jena:

PS, concerning "pool":

I recall that when I was young, there used to be a utility that automatically equipped a given cdef class with a pool/kill list, similar to what is hard coded for Integer. But I cannot find it now. Has it gone?

Found it: sage.misc.allocator

simon-king-jena commented 6 years ago
comment:17

Replying to @simon-king-jena:

How to avoid a bound check in sum_from_list(list L, size_t s, size_t l), where I access L[0], L[1], L[s], L[s+l//2] etc.?

Since I am sure the bounds are correct, I guess I can use <ETuple>PyList_GET_ITEM(...).
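
For instance (a sketch; PyList_GET_ITEM returns a borrowed reference and performs neither a bounds nor a type check, so this is only safe when the index and the element type are guaranteed):

from cpython.list cimport PyList_GET_ITEM
from sage.rings.polynomial.polydict cimport ETuple

cdef inline ETuple nth_etuple(list L, size_t i):
    # caller must guarantee 0 <= i < len(L) and that L contains only ETuples
    return <ETuple> PyList_GET_ITEM(L, i)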

tscrim commented 6 years ago
comment:18

Replying to @simon-king-jena:

Just some questions to make sure I understand your suggestions:

Replying to @tscrim:

I missed that when going through the code. So then it is just better to use lists. Add lots of <ETuple>'s to avoid as much extra type checking as possible.

So, <ETuple> silences the type check, which is faster but can theoretically result in a segfault, right?

Yes, that is correct. If you want to do an explicit check in there that can raise an error, you can do <ETuple?>. (I believe this is supposed to be different from letting Cython raise an error when not given the correct type, but I'm not sure about that, Jeroen?)
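
To illustrate the difference (just a sketch, not from the branch; assuming ETuple is cimported from sage.rings.polynomial.polydict and L is a list):

cdef ETuple m
m = <ETuple> L[0]     # unchecked cast: fast, but a wrong element type leads to undefined behaviour
m = <ETuple?> L[0]    # checked cast: raises a TypeError if L[0] is not an ETuple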

Id[-1] is a wrap-around: normal C would just do pointer offset. You can tell Cython to assume that indices passed in behave like C arrays and do not check if the result is negative (wrap-around check) or outside the list (bounds check).

So, when I do L[len(L)-1], Cython will automatically know that a bound check is not needed?

You have to also tell Cython not to do those checks. You can do this at the level of the file by adding # cython: boundscheck=False, wraparound=False as the first line(s) of the file (see, e.g., matrix/args.pyx) or a decorator @cython.boundscheck(False) and @cython.wraparound(False) on the appropriate functions (see, e.g., combinat/combinat_cython.pyx).
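
For example, with the decorator form (a sketch; again assuming ETuple is cimported from sage.rings.polynomial.polydict):

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline ETuple last_element(list Id):
    # with both checks disabled, the index access compiles to a plain C array lookup
    return <ETuple> Id[len(Id) - 1]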

How to avoid a bound check in sum_from_list(list L, size_t s, size_t l), where I access L[0], L[1], L[s], L[s+l//2] etc.?

See above.

Hmm...curious. Although the compiler will almost certainly optimize those extra variables out anyways.

In that case I should change it.

I bet that will make Jeroen happy. :)

It seems like they want to be separate functions to me. Plus you could then require w to be a list for the signature.

I do require tuple. Would list be better?

I couldn't remember and was guessing. tuple is perfectly fine.

Anyway, do you think that I should duplicate the loop in first_hilbert_series, once for the case that w is None and once for the case that it isn't?

I don't see why that one has to be split. It is only degree and quotient_degree that should be split IMO. For example, in HilbertBaseCase you would have

-        for m2 in Id:
-            if m is not m2:
-                Factor *= (1-t**quotient_degree(m2,m,w))
+        if w is None:
+            for m2 in Id:
+                if m is not m2:
+                    Factor *= (1-t**quotient_degree(m2,m))
+        else:
+            for m2 in Id:
+                if m is not m2:
+                    Factor *= (1-t**quotient_degree_graded(m2,m,w))

In some cases, this will result in fewer if checks (unless the compiler is optimizing those out). In others, this will be no worse than before.

Yes, you would have to do some memory management. I'm not sure exactly how much from my relatively quick look, but it should not be too bad.

Maybe go an easy middle way, and have some

cdef class Node:
    cdef Node Back, Left, Right
    cdef object LMult   # or some flint boilerplate instead of object
    cdef object LeftFHS # or some flint boilerplate instead of object

The overhead would be that Python objects will be allocated (unless I use a pool, similar to what Sage does with Integers...), but I guess the allocation is not more than the allocation of a dict, and moreover I wouldn't need to care about memory management. And accessing the cdef attributes of the node will be faster than looking up dictionary items, wouldn't it?

Yes, it should be much faster (at that scale) and require less memory than a dict (hash tables are generally at most 80% full IIRC, but you have to store both the key (which would be a string) and the value). I think a pool is overkill for this as you are building out a tree rather than frequently creating and removing nodes, right? The memory management really only comes into play when you start dealing with the flint objects directly.

Yes, you can use PyList_GET_ITEM, which I think is even a bit faster.

simon-king-jena commented 6 years ago
comment:19

Replying to @tscrim:

I think a pool is overkill for this as you are building out a tree rather than frequently creating and removing nodes, right? The memory management really only comes into play when you start dealing with the flint objects directly.

I wasn't talking about a pool for Nodes (which just build a tree), but about a pool for ETuple, which I guess will be created and killed (during interreduction) more frequently.

tscrim commented 6 years ago
comment:20

Replying to @simon-king-jena:

Replying to @tscrim:

I think a pool is overkill for this as you are building out a tree rather than frequently creating and removing nodes, right? The memory management really only comes into play when you start dealing with the flint objects directly.

I wasn't talking about a pool for Nodes (which just build a tree), but about a pool for ETuple, which I guess will be created and killed (during interreduction) more frequently.

Ah, I see. Yeah, that might be good to do. Although the use of the pool could end up having a fair bit of overhead when there are not a lot of ETuples that would be GC'ed. I haven't actually looked, though.

simon-king-jena commented 6 years ago
comment:21

It seems to me that for i in range(s,e,b) really is a bit slower than for i from s<=i<e by b. My main example becomes about 2% slower. See also this little benchmark:

sage: cython("""
....: def test1(size_t m):
....:     cdef size_t i,j
....:     j = 0
....:     for i in range(0,m,2):
....:         j += i
....:     return j
....: def test2(size_t m):
....:     cdef size_t i,j
....:     j = 0
....:     for i from 0<=i<m by 2:
....:         j += 1
....:     return j
....: """)
sage: %timeit test1(10000)
The slowest run took 4.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.18 µs per loop
sage: %timeit test1(10000)
100000 loops, best of 3: 3.1 µs per loop
sage: %timeit test2(10000)
The slowest run took 4.26 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop
sage: %timeit test2(10000)
1000000 loops, best of 3: 1.86 µs per loop

So, if the loop itself is timed, one sees that the for ... from ... by is double the speed of for ... in range(...).

I guess I should use the deprecated syntax and post the above benchmark on cython-users, shouldn't I?

simon-king-jena commented 6 years ago
comment:22

Ouch. I had a typo in my benchmark (j+=i versus j+=1). Without the typo, the timings of the test functions are basically equal. So, probably you are right that the additional assignments are optimized out by the compiler.

I don't know if the 2% difference in the real world example is significant.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 827f579 to 7698bb6

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

7698bb6: Some code optimizations in Hilbert series computation

simon-king-jena commented 6 years ago
comment:24

The last commit did some optimizations. What I want to try next:

  • Use flint boilerplate
  • Count the number of allocations and deallocations of ETuple, and perhaps play with a pool.

simon-king-jena commented 6 years ago
comment:25

I wonder about

FLINT_DLL void fmpz_poly_mul(fmpz_poly_t res, 
                          const fmpz_poly_t poly1, const fmpz_poly_t poly2);

Is it legal that res coincides with poly1 or with poly2?

simon-king-jena commented 6 years ago
comment:26

Replying to @simon-king-jena:

I wonder about

FLINT_DLL void fmpz_poly_mul(fmpz_poly_t res, 
                          const fmpz_poly_t poly1, const fmpz_poly_t poly2);

Is it legal that res coincides with poly1 or with poly2?

Probably yes. In the sources, I found checks like

if (res == poly1 || res == poly2)...

which indicate that res==poly1 is allowed.
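
So aliasing the output should be fine. A minimal sketch of the pattern (the exact cimport locations are an assumption and may differ between Sage versions):

from sage.libs.flint.types cimport fmpz_poly_t
from sage.libs.flint.fmpz_poly cimport (fmpz_poly_init, fmpz_poly_clear,
                                        fmpz_poly_set_coeff_si, fmpz_poly_mul,
                                        fmpz_poly_get_coeff_si)

def square_one_plus_t():
    cdef fmpz_poly_t p
    cdef long middle
    fmpz_poly_init(p)
    fmpz_poly_set_coeff_si(p, 0, 1)   # p = 1
    fmpz_poly_set_coeff_si(p, 1, 1)   # p = 1 + t
    fmpz_poly_mul(p, p, p)            # p = (1 + t)^2, with res aliased to both inputs
    middle = fmpz_poly_get_coeff_si(p, 1)
    fmpz_poly_clear(p)
    return middle                     # expected: 2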

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 7698bb6 to 8ae070b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8ae070b: Use flint polynomial boilerplate for Hilbert series computation

simon-king-jena commented 6 years ago
comment:28

In the latest commit, I am using boilerplate for flint polynomials. It brings a drastic improvement, apparently a factor of two compared with the previous commit, and there seems to be no memory leak:

sage: from sage.rings.polynomial.hilbert import first_hilbert_series, hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: get_memory_usage()
7366.8203125
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 144 ms per loop
sage: get_memory_usage()
7438.82421875
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 145 ms per loop
sage: get_memory_usage()
7438.82421875
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 144 ms per loop
sage: get_memory_usage()
7438.82421875
sage: %timeit res = hilbert_poincare_series(I,grading)
10 loops, best of 3: 144 ms per loop
sage: get_memory_usage()
7438.82421875

It might be interesting to compare with Singular in examples that Singular can manage:

sage: R = PolynomialRing(QQ,'x',25)
sage: M = matrix(R,5,R.gens())
sage: I = R*[m.lm() for m in (R*(M^2).list()).groebner_basis()]
sage: I.hilbert_numerator()
-t^30 + 24*t^29 - 275*t^28 + 2000*t^27 - 10350*t^26 + 40480*t^25 - 123971*t^24 + 303600*t^23 - 600774*t^22 + 960600*t^21 - 1222811*t^20 + 1182944*t^19 - 740749*t^18 + 18472*t^17 + 687275*t^16 - 1105688*t^15 + 1161279*t^14 - 963728*t^13 + 671803*t^12 - 393000*t^11 + 179220*t^10 - 51176*t^9 + 503*t^8 + 6024*t^7 - 1400*t^6 - 576*t^5 + 275*t^4 + 24*t^3 - 25*t^2 + 1
sage: first_hilbert_series(I)
-t^30 + 24*t^29 - 275*t^28 + 2000*t^27 - 10350*t^26 + 40480*t^25 - 123971*t^24 + 303600*t^23 - 600774*t^22 + 960600*t^21 - 1222811*t^20 + 1182944*t^19 - 740749*t^18 + 18472*t^17 + 687275*t^16 - 1105688*t^15 + 1161279*t^14 - 963728*t^13 + 671803*t^12 - 393000*t^11 + 179220*t^10 - 51176*t^9 + 503*t^8 + 6024*t^7 - 1400*t^6 - 576*t^5 + 275*t^4 + 24*t^3 - 25*t^2 + 1

So, the result is still correct. However, comparing with Singular-via-pexpect, I get

sage: %timeit first_hilbert_series(I)
10 loops, best of 3: 120 ms per loop

versus

sage: Is = singular(I)
sage: singular.eval('attrib({},"isSB",1)'.format(Is.name()))
''
sage: %timeit singular.eval(hv.name()+'=hilb({},1)'.format(Is.name()))
10 loops, best of 3: 163 ms per loop

Conclusion: The new implementation not only avoids the 32-bit limitation of Singular but also seems to be faster than Singular. So, by now the implementation is competitive.

Next, I will see if a pool for ETuple makes sense, and in any case I will move some cdef functions to cpdef methods of ETuple.

simon-king-jena commented 6 years ago
comment:29

I just tested: In the main example (the one in the attachment to this ticket), there are 93382 allocations and the same number of deallocations of ETuples. That sounds like a lot, but it doesn't contribute much time:

sage: cython("""
....: from sage.rings.polynomial.polydict cimport ETuple
....: def test():
....:     cdef ETuple m
....:     cdef int i
....:     for i in range(93000):
....:         m = ETuple.__new__(ETuple)
....: """
....: )
sage: %timeit test()
100 loops, best of 3: 3.03 ms per loop

So, the prospect of saving less than 3ms in an example that takes 145 ms isn't enough to justify the implementation of a pool. Do you agree?

tscrim commented 6 years ago
comment:30

You are probably not doing enough creations and deletions with the same amount of memory. Probably the bigger time is spent in allocating the data array in the ETuple, which cannot reasonably[1] be put in a pool since those lengths almost certainly change with each (re)allocation. So you are not really gaining much from that. Perhaps it is worthwhile to try a bigger example, say something that takes 1-2 seconds, and see if you still get a 2% savings or if there is just little change.

[1] Of course, it is possible, but I am pretty sure you have to write your own memory manager.

tscrim commented 6 years ago
comment:31

One other general comment: IMO it is still useful to add a little bit of documentation via a docstring explaining what functions do even when they are cdef (you do not need to doctest them however).

tscrim commented 6 years ago
comment:32

Just a general question: do you think it would be easy and worthwhile to pull the main Hilbert series computation code out into a standalone Cython library (that would depend only on flint)?

One last comment: Please remove this global definition:

# Global definition
PR = PolynomialRing(ZZ,'t')
t = PR('t')

I see no reason to nail it in memory.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 8ae070b to d997abe

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d997abe: Fix a corner case in Hilbert series computation

simon-king-jena commented 6 years ago
comment:34

Replying to @tscrim:

One other general comment: IMO it is still useful to add a little bit of documentation via a docstring explaining what functions do even when they are cdef (you do not need to doctest them however).

Yes, documentation should be added. And another test, for the corner case that is fixed in the previous commit.

simon-king-jena commented 6 years ago
comment:35

Replying to @tscrim:

Just a general question: do you think it would be easy and worthwhile to pull the main Hilbert series computation code out into a standalone Cython library (that would depend only on flint)?

Not totally sure. What I need is ETuple, which is defined in Sage. So, unless ETuple is pulled out as well, it makes no sense. Apart from ETuple, though, I think it would make sense: if I see it correctly, the only place where I use more than flint is in hilbert_poincare_series, in which I currently just do a division of two flint polynomials in Sage.

One last comment: Please remove this global definition:

# Global definition
PR = PolynomialRing(ZZ,'t')
t = PR('t')

I see no reason to nail it in memory.

Done in the previous commit.

simon-king-jena commented 6 years ago
comment:36

Ouch. The previous commit was supposed to fix the corner case of the zero ideal, but I made a mistake, so now the corner case of the one ideal became wrong. Will be fixed next, together with more documentation.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from d997abe to 5af3439

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5af3439: Fix the erroneous fix of some corner cases

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 5af3439 to 138f85b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

138f85b: Turn some functions of polynomial.hilbert into methods of ETuple

simon-king-jena commented 6 years ago
comment:39

The latest commit turns some cdef functions into cpdef methods.

The effect is not good: the computation time went up from about 145 ms to about 190 ms:

sage: %time hilbert_poincare_series(I,grading)
CPU times: user 192 ms, sys: 0 ns, total: 192 ms
Wall time: 191 ms
(-8*t^26 + 24*t^25 - 16*t^24 - 16*t^23 + 24*t^22 - 8*t^21 - t^7 - t^6 + t^5 - t^4 - t^3 - t^2 + t - 1)/(t^13 - 3*t^12 + 2*t^11 + 2*t^10 - 3*t^9 + t^8 - t^5 + 3*t^4 - 2*t^3 - 2*t^2 + 3*t - 1)
sage: %crun hilbert_poincare_series(I,grading)
PROFILE: interrupts/evictions/bytes = 20/0/8288
Using local file /home/king/Sage/git/sage/local/bin/python2.
Using local file /home/king/.sage/temp/klap/5581/tmp_fZKweM.perf.
Total: 20 samples
       0   0.0%   0.0%       19  95.0% PyEval_EvalCode
       0   0.0%   0.0%       19  95.0% PyEval_EvalCodeEx
       0   0.0%   0.0%       19  95.0% PyEval_EvalFrameEx
       0   0.0%   0.0%       19  95.0% PyObject_Call
       0   0.0%   0.0%       19  95.0% PyRun_FileExFlags
       0   0.0%   0.0%       19  95.0% PyRun_SimpleFileExFlags
       0   0.0%   0.0%       19  95.0% PyRun_StringFlags
       0   0.0%   0.0%       19  95.0% Py_Main
       0   0.0%   0.0%       19  95.0% __Pyx_PyObject_Call
       0   0.0%   0.0%       19  95.0% __libc_start_main
       0   0.0%   0.0%       19  95.0% __pyx_pf_4sage_5rings_10polynomial_7hilbert_2hilbert_poincare_series (inline)
       0   0.0%   0.0%       19  95.0% __pyx_pf_4sage_5rings_10polynomial_7hilbert_first_hilbert_series (inline)
       0   0.0%   0.0%       19  95.0% __pyx_pw_4sage_5rings_10polynomial_7hilbert_1first_hilbert_series
       0   0.0%   0.0%       19  95.0% __pyx_pw_4sage_5rings_10polynomial_7hilbert_3hilbert_poincare_series
       0   0.0%   0.0%       19  95.0% _start
       0   0.0%   0.0%       19  95.0% call_function (inline)
       0   0.0%   0.0%       19  95.0% exec_statement (inline)
       0   0.0%   0.0%       19  95.0% ext_do_call (inline)
       0   0.0%   0.0%       19  95.0% fast_function (inline)
       0   0.0%   0.0%       19  95.0% function_call
       0   0.0%   0.0%       19  95.0% run_mod (inline)
       0   0.0%   0.0%       11  55.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_make_children (inline)
       0   0.0%   0.0%        9  45.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_interred
       0   0.0%   0.0%        8  40.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_indivisible_in_list (inline)
       8  40.0%  40.0%        8  40.0% __pyx_f_4sage_5rings_10polynomial_8polydict_6ETuple_divides
       0   0.0%  40.0%        5  25.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_quotient_by_var (inline)
       0   0.0%  40.0%        4  20.0% __pyx_f_4sage_5rings_10polynomial_7hilbert_HilbertBaseCase.isra.41
       0   0.0%  40.0%        3  15.0% PyObject_RichCompare
       1   5.0%  45.0%        3  15.0% __pyx_pw_4sage_5rings_10polynomial_8polydict_6ETuple_11__getitem__
       0   0.0%  45.0%        3  15.0% __pyx_sq_item_4sage_5rings_10polynomial_8polydict_ETuple
       0   0.0%  45.0%        3  15.0% fmpz_poly_mul
       0   0.0%  45.0%        2  10.0% __Pyx_PyObject_Call (inline)
       0   0.0%  45.0%        2  10.0% __Pyx_PyObject_CallOneArg
       0   0.0%  45.0%        2  10.0% __pyx_pf_4sage_5rings_10polynomial_8polydict_6ETuple_10__getitem__ (inline)
       1   5.0%  50.0%        2  10.0% _fmpz_poly_mul_tiny1
       2  10.0%  60.0%        2  10.0% convert_3way_to_object (inline)
       0   0.0%  60.0%        2  10.0% do_richcmp (inline)
       0   0.0%  60.0%        2  10.0% try_3way_to_rich_compare (inline)
       0   0.0%  60.0%        1   5.0% 0x00007fe83c01f837
       1   5.0%  65.0%        1   5.0% PyInt_FromLong
       0   0.0%  65.0%        1   5.0% PyNumber_Remainder
       0   0.0%  65.0%        1   5.0% PyObject_RichCompareBool
       1   5.0%  70.0%        1   5.0% _PyObject_New
       0   0.0%  70.0%        1   5.0% __Pyx_PyFunction_FastCallDict.constprop.73
       0   0.0%  70.0%        1   5.0% __Pyx_PyFunction_FastCallNoKw (inline)
       0   0.0%  70.0%        1   5.0% __Pyx_PyInt_As_size_t.part.41
       0   0.0%  70.0%        1   5.0% __Pyx_PyInt_From_int (inline)
       0   0.0%  70.0%        1   5.0% __Pyx_PyNumber_IntOrLong (inline)
       1   5.0%  75.0%        1   5.0% __Pyx_TypeTest (inline)
       0   0.0%  75.0%        1   5.0% __Pyx__PyObject_CallOneArg
       0   0.0%  75.0%        1   5.0% __libc_calloc
       0   0.0%  75.0%        1   5.0% __pyx_f_4sage_5rings_10polynomial_8polydict_6ETuple_weighted_degree
       0   0.0%  75.0%        1   5.0% __pyx_pf_4sage_5rings_10polynomial_8polydict_6ETuple_18__richcmp__ (inline)
       1   5.0%  80.0%        1   5.0% __pyx_pf_4sage_5rings_7integer_7Integer_122__int__ (inline)
       0   0.0%  80.0%        1   5.0% __pyx_pw_4sage_5rings_10polynomial_8polydict_6ETuple_19__richcmp__
       0   0.0%  80.0%        1   5.0% __pyx_pw_4sage_5rings_7integer_7Integer_123__int__
       1   5.0%  85.0%        1   5.0% _fmpz_vec_zero
       1   5.0%  90.0%        1   5.0% _init
       1   5.0%  95.0%        1   5.0% _int_malloc
       0   0.0%  95.0%        1   5.0% binary_op (inline)
       1   5.0% 100.0%        1   5.0% binary_op1 (inline)
       0   0.0% 100.0%        1   5.0% build_sortwrapper (inline)
       0   0.0% 100.0%        1   5.0% flint_calloc
       0   0.0% 100.0%        1   5.0% fmpz_poly_init2
       0   0.0% 100.0%        1   5.0% listindex
       0   0.0% 100.0%        1   5.0% listsort

What is convert_3way_to_object?

Do you see why the computation became A LOT slower? Some of the functions that I moved from hilbert to ETuple methods used to be "cdef inline", but are now "cpdef". Is it possible that the removal of "inline" was bad?

simon-king-jena commented 6 years ago
comment:40

Also note that there is a compiler warning. If I understand correctly, it comes from the line

L.sort(key=ETuple.unweighted_degree)

How else should I force sorting by unweighted degree?
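
One way out might be to wrap the method call in a small module-level function and use that as the key (just a sketch; the helper name is made up, and ETuple is cimported from sage.rings.polynomial.polydict):

def _by_unweighted_degree(ETuple m):
    # a plain def wrapper, so the sort key is an ordinary Python callable
    return m.unweighted_degree()

L.sort(key=_by_unweighted_degree)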

simon-king-jena commented 6 years ago
comment:41

PS: Of course I should add documentation. But first I want to know why the code became so much slower.

simon-king-jena commented 6 years ago
comment:42

I was wondering if perhaps the slowness came from NOT defining the parent of the first Hilbert series globally:

sage: cython("""
....: from sage.all import ZZ
....: P = ZZ['t']
....: t = P.gen()
....: def test1(i):
....:     return t**i
....: """)
sage: cython("""
....: def test2(i):
....:     from sage.all import ZZ
....:     P = ZZ['t']
....:     t = P.gen()
....:     return t**i
....: """)
sage: %timeit test1(50)
The slowest run took 29.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.83 µs per loop
sage: %timeit test2(50)
The slowest run took 4.76 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 42.2 µs per loop

So, that's a lot. But the slow-down can be reduced:

sage: cython("""
....: def test3(i):
....:     from sage.all import ZZ,PolynomialRing
....:     P = PolynomialRing(ZZ,'t')
....:     t = P.gen()
....:     return t**i
....: """)
sage: %timeit test3(50)
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 24.6 µs per loop

Perhaps I should change my code accordingly.

simon-king-jena commented 6 years ago
comment:43

Without a change:

sage: from sage.rings.polynomial.hilbert import first_hilbert_series, hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %timeit first_hilbert_series(I,grading)
1 loop, best of 3: 177 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
1 loop, best of 3: 176 ms per loop
sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 176 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
1 loop, best of 3: 179 ms per loop

With PolynomialRing(ZZ,'t') instead of ZZ['t']:

sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 171 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
10 loops, best of 3: 173 ms per loop
sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 172 ms per loop
sage: %timeit hilbert_poincare_series(I,grading)
1 loop, best of 3: 172 ms per loop

So, it really doesn't help much. And globally defining the polynomial ring doesn't help either.

Hence, I believe the slowness comes from the cpdef methods or from the lack of inlining.

simon-king-jena commented 6 years ago
comment:44

I know that if the compiler knows we are dealing with an ETuple, a cpdef method should have the same speed as a cdef function. But in fact it does not. With cdef methods:

sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %time hilbert_poincare_series(I,grading)
CPU times: user 168 ms, sys: 43 µs, total: 168 ms
Wall time: 168 ms
(8*t^26 - 24*t^25 + 16*t^24 + 16*t^23 - 24*t^22 + 8*t^21 + t^7 + t^6 - t^5 + t^4 + t^3 + t^2 - t + 1)/(-t^13 + 3*t^12 - 2*t^11 - 2*t^10 + 3*t^9 - t^8 + t^5 - 3*t^4 + 2*t^3 + 2*t^2 - 3*t + 1)
sage: %timeit first_hilbert_series(I)
10 loops, best of 3: 111 ms per loop
sage: %timeit first_hilbert_series(I,grading)
10 loops, best of 3: 158 ms per loop

With cpdef methods

sage: from sage.rings.polynomial.hilbert import first_hilbert_series, hilbert_poincare_series
sage: I,grading = load("/home/king/Projekte/coho/hilbert_example.sobj")
sage: %time hilbert_poincare_series(I,grading)
CPU times: user 178 ms, sys: 7 µs, total: 178 ms
Wall time: 177 ms
(8*t^26 - 24*t^25 + 16*t^24 + 16*t^23 - 24*t^22 + 8*t^21 + t^7 + t^6 - t^5 + t^4 + t^3 + t^2 - t + 1)/(-t^13 + 3*t^12 - 2*t^11 - 2*t^10 + 3*t^9 - t^8 + t^5 - 3*t^4 + 2*t^3 + 2*t^2 - 3*t + 1)
sage: %timeit first_hilbert_series(I)
10 loops, best of 3: 128 ms per loop
sage: %timeit first_hilbert_series(I,grading)
1 loop, best of 3: 171 ms per loop

So, we have a slow-down of about 11-15%. Therefore I'd prefer to use cdef methods, where possible (i.e., the only exception is unweighted_degree, which I use for sorting).

Next plan: Try to test w is None less often, by a moderate code duplication.