sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 481 forks source link

Performance regression in span() for vectors over GF(2) #23746

Open 6b8a00d1-0beb-4653-a6ee-6a3405149463 opened 7 years ago

6b8a00d1-0beb-4653-a6ee-6a3405149463 commented 7 years ago

Calculating the span of a large number of vectors in a high-dimension vector space over GF(2) has gotten a lot slower in recent versions of sage. This can be demonstrated by the following code:

import time

set_random_seed(0)
n = 1000
v = [ random_vector(GF(2), n) for i in xrange(n) ]

c1 = time.clock()
span(v, GF(2))
print time.clock()-c1

In Sage 7.4, this took around ~0.7s (on my laptop), whereas in Sage 8.0, it takes ~12s. This performance regression was introduced by #21746. The reason is that the new logic in Vector_mod2_dense.__init__() ends up calculating the modulus of an IntegerMod_int object rather than a native C int, which turns out to be a lot slower. Profiling the above code reveals that in Sage 8.0 it indeed spends most of its time in IntegerMod_int.__mod__.

CC: @simon-king-jena @jdemeyer

Component: performance

Keywords: span, Vector_mod2_dense, IntegerMod_int

Branch/Commit: u/dominic.else/performance_regression_in_span___for_vectors_over_gf2 @ 403ece1

Issue created by migration from https://trac.sagemath.org/ticket/23746

6b8a00d1-0beb-4653-a6ee-6a3405149463 commented 7 years ago

Branch: u/dominic.else/performance_regression_in_span___for_vectors_over_gf2

6b8a00d1-0beb-4653-a6ee-6a3405149463 commented 7 years ago
comment:2

The fix I uploaded sidesteps the problem by converting an IntegerMod_int object to a Python int first before taking the modulus, restoring the original performance. A better solution might be to make IntegerMod_int.__mod__() faster.


New commits:

403ece1Fix performance regression introduced by #21746
6b8a00d1-0beb-4653-a6ee-6a3405149463 commented 7 years ago

Commit: 403ece1

6b8a00d1-0beb-4653-a6ee-6a3405149463 commented 7 years ago

Changed keywords from span, Vector_mod2_dense to span, Vector_mod2_dense, IntegerMod_int

simon-king-jena commented 7 years ago
comment:5
    def __mod__(self, modulus):
        if not isinstance(self, IntegerMod_abstract):
            # something % Mod(x,y) makes no sense
            return NotImplemented
        from .integer_mod_ring import IntegerModRing
        R = IntegerModRing(modulus)
        if (<Element>self)._parent._IntegerModRing_generic__order % R.order():
            raise ArithmeticError(f"reduction modulo {modulus!r} not defined")
        return R(self)

The import should not be done inside of the method. Is R.order() not the same as modulus? But in any case, I do think that using the modulus of a native C int should be used.

simon-king-jena commented 7 years ago
comment:6

Note that the semantic is different:

sage: x = GF(3)(2)
sage: type(x)
<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>
sage: x
2
sage: x%2
---------------------------------------------------------------------------
ArithmeticError                           Traceback (most recent call last)
<ipython-input-4-d92b2f50ba44> in <module>()
----> 1 x%Integer(2)

/home/king/Sage/git/sage/src/sage/rings/finite_rings/integer_mod.pyx in sage.rings.finite_rings.integer_mod.IntegerMod_abstract.__mod__ (build/cythonized/sage/rings/finite_rings/integer_mod.c:6386)()
    392         R = IntegerModRing(modulus)
    393         if (<Element>self)._parent._IntegerModRing_generic__order % R.order():
--> 394             raise ArithmeticError(f"reduction modulo {modulus!r} not defined")
    395         return R(self)
    396 

ArithmeticError: reduction modulo 2 not defined
sage: int(x)%2
0

Note that the error does not depend on the specific element x, but only depends on the parent of x. Is it guaranteed in the vector code that all elements belong to the same parent? If they do, then the check on the parent should of course be done only once per vector, not once per element.

tscrim commented 7 years ago
comment:7

They should all belong to the same parent.