sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.3k stars 447 forks source link

Cythoned homsets #14214

Open simon-king-jena opened 11 years ago

simon-king-jena commented 11 years ago

Homsets arise whenever a coercion map or a conversion map is involved. Hence, homsets are ubiquitous in Sage and should thus be as a fast as possible.

Therefore I suggest change sage.categories.homset into a cython module. A clean solution seems out of reach at the moment (see comments), and so I just provide a couple of speed-ups here.

Apply

Depends on #14279 Depends on #18758

CC: @nthiery

Component: performance

Keywords: Hom, cached method

Author: Simon King

Issue created by migration from https://trac.sagemath.org/ticket/14214

simon-king-jena commented 11 years ago

Attachment: trac_14214-cython_homset.patch.gz

simon-king-jena commented 11 years ago
comment:45

Apply trac_14214-cython_homset.patch trac_14214-backup_cache.patch

simon-king-jena commented 11 years ago

Changed work issues from Rebase; how to store domain and codomain? to How to store domain and codomain?

simon-king-jena commented 11 years ago
comment:47

Both patches are updated, I guess we can now focus on the remaining issue with properties.

I asked whether it'd make sense to cythonize lazy_attribute. I think it does not, because we want to override __name__ of an instance which won't work in a cdef class.

In addition, it would be bad for speed if the property needed to first look up a name and then try to find this name as a key of __cached_methods. Would it be acceptable to have two specialised non-data descriptors with fixed names _domain and _codomain, rather than implementing a general non-data descriptor that can take any name?

nbruin commented 11 years ago
comment:48

Replying to @simon-king-jena:

Both patches are updated, I guess we can now focus on the remaining issue with properties.

I asked whether it'd make sense to cythonize lazy_attribute. I think it does not, because we want to override __name__ of an instance which won't work in a cdef class.

Do you mean for documentation and code introspection purposes? I guess that might be an issue. Shouldn't you just have a fast cythoned lazy_attribute for when performance matters (provided that making such a thing suitable for general consumption doesn't slow it down too much -- otherwise people should just be rolling their own when they need one) and a python version (would a normal (non-cdeffed) class in cython do?) that has all the documentation, introspecion bells and whistles. You may still get a benefit if you inherit from the fast implementation?

In addition, it would be bad for speed if the property needed to first look up a name and then try to find this name as a key of __cached_methods.

I don't think it's really measurable. I added a descriptor that avoids the lookup:

cdef class SpecializedCachedAttributeType(object):
    def __init__(self, name):
        if name !="_codomain":
            raise ValueError("I don't do that sort of thing")
    def __get__(self,instance,cls):
        try:
            return instance.__cached_methods["_codomain"]
        except KeyError:
            raise AttributeError("%s instance has no attribute '%s'"%(cls,"_codomain"))

and changed the initialization to

    _domain=CachedAttributeType("_domain")
    _codomain=SpecializedCachedAttributeType("_codomain")

I'm getting:

sage: %timeit B._codomain
10000000 loops, best of 3: 82.9 ns per loop
sage: %timeit B._domain
10000000 loops, best of 3: 84 ns per loop

which is close enough together that it's hard to call the difference significant (with the old cython I'm getting 114 ns for each). If you store that name on a cdef attribute, the lookup is pretty fast.

Would it be acceptable to have two specialised non-data descriptors with fixed names _domain and _codomain, rather than implementing a general non-data descriptor that can take any name?

I don't think it's worth it. You'll gain much more if you allow HomType instances to have a __dict__ (although a drawback there is that all cpdef methods will slow down; they'd need to be split in a cdef and a def again, or cython should grow a way of calling cpdef methods "without instance dict attribute override check", for which the infrastructure is already available and is used internally in cython for the python wrapper of the cpdef method)

nbruin commented 11 years ago
comment:49

Replying to @nbruin:

I asked whether it'd make sense to cythonize lazy_attribute. I think it does not, because we want to override __name__ of an instance which won't work in a cdef class.

Actually, I think this is of very limited value anyway. You have to work very hard to get your hands on a descriptor, let alone its __name__, because its very nature makes that most attempts will result in executing the descriptor and getting its result. You really have to go digging into dictionaries to get the descriptor object itself. By the time you have succeeded at that, you can probably get at attributes on the object too, so having the "name" won't be of huge value.

The main disadvantage of descriptors versus no-arg methods is that it's hard put documentation on them that's accessible. The big advantage is of course that they're faster.

EDIT: after looking at the current implementation of lazy_attribute I see what's happening:

    def __get__(self,instance,cls):
        if instance is None: #this is when the attribute gets requested on the class itself
            return self
        ...

so that indeed gives a relatively easy way to get the descriptor object itself. Do people actually use that? If so, it would indeed be reasonable to make the documentation available on it. If it's not possible to put it on the cythonized lazy_attribute (have you tried just putting slots on there with the appropriate names and values?), you could just return a custom made wrapper that does have the documentation at that point, i.e.,

            return WrapperWithDocumentation(self)

where WrapperWithDocumentation can extract the required documentation from self.f and put it on itself.

So cythonizing lazy_attribute is definitely possible and it should lose quite a bit of overhead, because the function can be stored in a cython slot (so hardly any further lookup overhead on that, just the call). The question is of course if this is ever a bottleneck ... I suspect that giving parents a __dict__ for shadowing attributes (and hence largely making __cached_methods redundant) will be a much greater win.

nbruin commented 11 years ago
comment:50

A very rough implementation:

from sage.misc.lazy_attribute import lazy_attribute

cdef class LazyAttribute:
    cdef public f
    cdef public str __name__
    def __init__(self,f):
        self.f=f
        self.__name__=f.__name__
    def __get__(self,instance,cls):
        if instance is None:
            return lazy_attribute(self.f)
        v=self.f(instance)
        setattr(instance,self.__name__,v)
        return v

Note I'm just reusing the old lazy_attribute here to deal with documentation issues.

sage: class T(object):
...       @LazyAttribute
...       def t(self):
...           """docstring of f"""
...           return 1000
...       @lazy_attribute
...       def s(self):
...           """docstring of f"""
...           return 1000
sage: timeit("T().t",repeat=10,number=10000)
10000 loops, best of 10: 374 ns per loop
sage: timeit("T().s",repeat=10,number=10000)
10000 loops, best of 10: 2.93 µs per loop

If I just copy over the __get__ method from the original lazy_attribute the difference is a little less pronounced, because that code does a lot more. After changing self.f.__name__ to self.__name__ I'm getting 1.2 µs, so that still more than twice as fast.

simon-king-jena commented 11 years ago
comment:51

Replying to @nbruin:

A very rough implementation:

from sage.misc.lazy_attribute import lazy_attribute

cdef class LazyAttribute:
    cdef public f
    cdef public str __name__
    def __init__(self,f):
        self.f=f
        self.__name__=f.__name__
    def __get__(self,instance,cls):
        if instance is None:
            return lazy_attribute(self.f)
        v=self.f(instance)
        setattr(instance,self.__name__,v)
        return v

If I understand correctly, you optimize the case that the instance allows attribute assignment. But I think that won't be relevant in our use case: If there is attribute assignment (let's call it "case 1"), then we will use it during __init__, and hence an existing non-data descriptor will be overridden.

So, the choice of an implementation of the non-data descriptor will only matter if there is no attribute assignment available (let's call it "case 2"). And if I am not mistaken, your code will be very slow in case 2, and certainly not faster than a @lazy_attribute.

I think the questions to be addressed here are these:

If I understand correctly, using Cython's "property" would be faster than a @lazy_attribute in case 2, but it would make it impossible to use regular attributes in case 1, whence a massive slow-down in case 1 (which is not acceptable, as case 1 is the normal case).

But can we benefit from a Cython implementation of lazy_attribute so that it becomes faster in case 2 but still results in a non-data descriptor that can be overridden by an attribute in case 1?

nbruin commented 11 years ago
comment:52

I guess our discussion got side-tracked a little bit:

Then there's the second bit, cythonization of lazy_attribute:

That actually means you COULD put a cythonized lazy_attribute on _domain and _codomain. The overhead compared to a custom nondatadescriptor is minimal, and within a factor 2 of a dict lookup. Here's the code I mean:

cdef class LazyAttribute:
    cdef public f
    cdef public str __name__
    def __init__(self,f):
        self.f=f
        self.__name__=f.__name__
    def __get__(self,a,cls):
        cdef CM
        cdef result
        if a is None:
            return lazy_attribute(self.f)
        try:
            # __cached_methods is supposed to be a public Cython attribute.
            # Apparently, these are *not* subject to name mangling.
            CM = getattr(a, '__cached_methods')
            if CM is None:
                CM = {}
                setattr(a, '__cached_methods', CM)
        except AttributeError,msg:
            CM = None
        if CM is not None:
            try:
                return CM[self.__name__]
            except KeyError:
                pass
        result = self.f(a)
        if result is NotImplemented:
            # Workaround: we make sure that cls is the class
            # where the lazy attribute self is actually defined.
            # This avoids running into an infinite loop
            # See About descriptor specifications
            for supercls in cls.__mro__:
                if self.f.__name__ in supercls.__dict__ and self is supercls.__dict__[self.__name__]:
                    cls = supercls
            return getattr(super(cls, a),self.__name__)
        try:
            setattr(a, self.__name__, result)
        except AttributeError:
            if CM is not None:
                CM[self.f.__name__] = result
                return result
            raise
        return result

There's a bit of clean-up to do on this concerning returning a documented object when requested on the bare class. It works now, but only by reusing the old implementation. Is returning the descriptor object itself on the bare class customary/desirable behaviour?

nbruin commented 11 years ago
comment:53

see #14615 for cythonized lazy_attribute. Its costs seem quite reasonable.

nbruin commented 11 years ago
comment:54

Ping!. The present patches here pass doctests on the bot!

I think conceptually what you want in this case is really just a nondata descriptor that acts as a backstopper to provide (essentially) a read-only attribute stored in __cached_methods if no instance dictionary is available (or more accurately: if it isn't used) The code at [comment:40] provides that.

cdef class CachedAttributeType(object):
    cdef public str name
    def __init__(self, name):
        self.name = name
    def __get__(self,instance,cls):
        cdef dict CM
        try:
            CM = getattr(instance,"__cached_methods")
            return CM[self.name]
        except TypeError, KeyError:
            raise AttributeError("%s instance has no attribute '%s'"%(cls,self.name))

Since the __get__ here is very close the one of the faster code paths in a properly cythonized lazy_attribute (see #14615), defining the backed attribute either way:

  _domain=CachedAttributeType("_domain")

  @lazy_attribute
  def _codomain(self);
      raise AttributeError("You should have set me already!")

is going to give the same performance. You may be able to shave off a few nanoseconds by replacing some of the auto-generated cython code with explicit Python C-API calls. I suspect we'll have to live with many parents not having a __dict__ for the foreseeable future.

(note that any slowdowns due to deeper MRO will go away once the cython upgrade is in).

simon-king-jena commented 11 years ago
comment:55

Replying to @nbruin:

Ping!. The present patches here pass doctests on the bot!

I think conceptually what you want in this case is really just a nondata descriptor that acts as a backstopper to provide (essentially) a read-only attribute stored in __cached_methods if no instance dictionary is available (or more accurately: if it isn't used) The code at [comment:40] provides that.

Agreed.

The first question is: Where shall we put the code?

cdef class CachedAttributeType(object):
    cdef public str name
    def __init__(self, name):
        self.name = name
    def __get__(self,instance,cls):
        cdef dict CM
        try:
            CM = getattr(instance,"__cached_methods")
            return CM[self.name]
        except TypeError, KeyError:
            raise AttributeError("%s instance has no attribute '%s'"%(cls,self.name))

I guess you either mean getattr(instance, "__cached_methods", None) or except (AttributeError,KeyError): (in any case, it should be with brackets, otherwise it will not catch the KeyError).

And I guess you could get it still a bit faster, if you do def __get__(self, Parent instance, cls), because our homsets are parents, and because the cdef public attribute __cached_methods could be accessed more quickly if the instance is known to be a Parent.

Since the __get__ here is very close the one of the faster code paths in a properly cythonized lazy_attribute (see #14615), defining the backed attribute either way:

  _domain=CachedAttributeType("_domain")

  @lazy_attribute
  def _codomain(self);
      raise AttributeError("You should have set me already!")

is going to give the same performance.

And that's the second question: Are you really sure that lazy_attribute will be as fast as CachedAttributeType in the case of no __dict__?

Note that I tested to cythonize lazy_attribute and to cimport Parent. It failed (circular import or so). So, we come back to the first question: Where shall we put the code? I think it makes sense to cdefine instance being Parent, but this would be impossible in sage/misc/lazy_attribute.pyx.

And the third question is: Shall we really make this ticket depend on a proper cythonized version of lazy_attribute? Note that currently the lazy attribute or cached attribute would never come into play, because all our homsets are Python subclasses of the (now cythonized) Homset.

nbruin commented 11 years ago
comment:56

Replying to @simon-king-jena:

And I guess you could get it still a bit faster, if you do def __get__(self, Parent instance, cls), because our homsets are parents, and because the cdef public attribute __cached_methods could be accessed more quickly if the instance is known to be a Parent.

Excellent idea! if __cached_method isn't in a slot then we wouldn't want to use it anyway. It does mean that we lose flexibility: If for some reason, we have a class that can't inherit directly from Parent but does want to cooperate in this protocol, it won't be able to.

And that's the second question: Are you really sure that lazy_attribute will be as fast as CachedAttributeType in the case of no __dict__?

My timings suggested this. However, I suspect that providing cdef access to __cached_method will be a (small) measurable gain, so with that specialization you should be faster.

Note that I tested to cythonize lazy_attribute and to cimport Parent. It failed (circular import or so).

That can't be lazy_attribute, or at least not my version. It doesn't have any imports!

So, we come back to the first question: Where shall we put the code? I think it makes sense to cdefine instance being Parent, but this would be impossible in sage/misc/lazy_attribute.pyx.

If it's a parent-specific support class it can just go with the parent source, right?

And the third question is: Shall we really make this ticket depend on a proper cythonized version of lazy_attribute?

Not if you go the Parent-specific CachedAttribute route. That would be a bad name, by the way. Perhaps ParentAttributeBacker is better.

all our homsets are Python subclasses of the (now cythonized) Homset.

Talk about academic code then ... I'm sure we'll get cythonized homset instances at some point.

simon-king-jena commented 11 years ago
comment:57

Replying to @nbruin:

Note that I tested to cythonize lazy_attribute and to cimport Parent. It failed (circular import or so).

That can't be lazy_attribute, or at least not my version. It doesn't have any imports!

sage.structure.parent imports lazy_attribute, and (to my surprise, I must say) cimporting Parent in sage.misc.lazy_attribute did crash.

So, we come back to the first question: Where shall we put the code? I think it makes sense to cdefine instance being Parent, but this would be impossible in sage/misc/lazy_attribute.pyx.

If it's a parent-specific support class it can just go with the parent source, right?

Hey, that's a good idea!

Not if you go the Parent-specific CachedAttribute route. That would be a bad name, by the way. Perhaps ParentAttributeBacker is better.

ParentNonDataDescriptor?

simon-king-jena commented 11 years ago
comment:58

I think I will be able to provide a third patch with the non data descriptor tonight.

simon-king-jena commented 11 years ago
comment:59

Replying to @simon-king-jena:

I think I will be able to provide a third patch with the non data descriptor tonight.

That said: Could we perhaps make this depend on #11935?

nbruin commented 11 years ago
comment:60

Replying to @simon-king-jena:

That said: Could we perhaps make this depend on #11935?

You're doing the work, so you decide. Looking at the tickets, it seems to me this one is less invasive and hence easier to get in: It's already passing doctests; the only thing left is some backstop code that currently won't be exercised anyway.

simon-king-jena commented 11 years ago
comment:61

Replying to @nbruin:

Replying to @simon-king-jena:

That said: Could we perhaps make this depend on #11935?

You're doing the work, so you decide. Looking at the tickets, it seems to me this one is less invasive and hence easier to get in: It's already passing doctests; the only thing left is some backstop code that currently won't be exercised anyway.

Yes. And in addition, when I tried to apply a dependency of #11935, I got several errors. So, let's first fix this, and then care about #12876 and finally #11935.

nthiery commented 11 years ago
comment:62

For the record: #12876 and #11935 are applying smoothly on 5.10 beta4 and are passing all tests. And have been in needs review for one year and rebased a couple times. And much depend on them.

So I don't really care about the order, but it would be nice to have them in shortly. And I am bit lazy to rebase them once again.

simon-king-jena commented 11 years ago
comment:63

Replying to @nthiery:

For the record: #12876 and #11935 are applying smoothly on 5.10 beta4 and are passing all tests.

OK, if that's the case then these go in first, because

  1. they add features, while this is just about potential speed gain in future
  2. they are more likely to break if we wait any longer.
nthiery commented 11 years ago
comment:64

Replying to @simon-king-jena:

Replying to @nthiery:

For the record: #12876 and #11935 are applying smoothly on 5.10 beta4 and are passing all tests.

OK, if that's the case then these go in first, because

  1. they add features, while this is just about potential speed gain in future
  2. they are more likely to break if we wait any longer.

Thanks Simon, I appreciate that; I know the work it costs you!

nbruin commented 11 years ago
comment:65

Replying to @nbruin:

Replying to @simon-king-jena:

And I guess you could get it still a bit faster, if you do def __get__(self, Parent instance, cls), because our homsets are parents, and because the cdef public attribute __cached_methods could be accessed more quickly if the instance is known to be a Parent.

Excellent idea! if __cached_method isn't in a slot then we wouldn't want to use it anyway. It does mean that we lose flexibility: If for some reason, we have a class that can't inherit directly from Parent but does want to cooperate in this protocol, it won't be able to.

Note that _element_class on Parent is a @lazy_attribute. That code could benefit from a Parent instance too, so perhaps you DO want to use an (adapted form of) @lazy_attribute for both. In the case of _domain and _codomain, the wrapped function would be irrelevant.

simon-king-jena commented 9 years ago
comment:70

TODO: Provide a branch...

simon-king-jena commented 9 years ago
comment:71

Replying to @simon-king-jena:

sage.modular.abvar.homspace.EndomorphimSubring inherits from sage.categories.homset.Homset and from sage.rings.ring.Ring. Making Homset cdef with cdef attributes results in incompatible base classes.

  • One could think of creating a new cdef class sage.categories.homset.EndomorphismRing inheriting from sage.rings.ring.Ring and copying the methods implemented in sage.categories.homset.Homset, and use it in sage.modular.abvar.homspace.
    • However, that's code duplication, and I doubt that we want that.
  • One could think of dropping sage.rings.ring.Ring and let EndomorphismSubring entirely rely on the ring functionality that is provided by the category framework.

Note that this should now be a lot easier, by #18756.

  • I tried, but this means opening a can of worms. One would then also need to implement the new coercion model in sage.modular. But replacing EndomorphismSubring.__call__ by an _element_constructor_ does not work, because Homset has a __call__ method, too. Hence, we would then also need to use the new coercion model for sage.categories.homset. Yes, that's desirable. But it is not available in short time.

See #14279, but both this ticket and #14279 need work.

Short term improvements suggested in my patch

  • Accessing a Python attribute _domain defined for a Homset can be surprisingly slow. It is fast, if one has an instance of Homset itself. But it is slow if one considers an instance of a sub-class, such as EndomorphismSubring. It turns out that calling a cached method is faster than accessing the attribute! Hence, I suggest using cached method decorators on the domain() and codomain() methods (these methods are internally used in EndomorphismSubring anyway).

If I recall correctly, that's not the case any longer. However, if we are able to avoid double inheritance from Homset and Ring, there is not obstruction for making _domain and _codomain cdef.

simon-king-jena commented 9 years ago

Changed dependencies from #14159, #12951, #13184 to #14279

simon-king-jena commented 9 years ago

Changed work issues from How to store domain and codomain? to none

simon-king-jena commented 9 years ago
comment:73

Since the patches have bit-rotted for so long, I guess it makes sense to start from scratch. Most part of the discussion here was about different ways to store the _domain and _codomain. Since it should now be possible to have fast ring-behaviour defined via category framework, I suppose one can simply make them cdef attributes, which should be the fastest solution.

simon-king-jena commented 9 years ago

Changed dependencies from #14279 to #14279 #18756

simon-king-jena commented 9 years ago

Changed dependencies from #14279 #18756 to #14279 #18758

simon-king-jena commented 9 years ago
comment:75

I confused two tickets. In fact, it is #18758 which makes it possible to define a ring structure via category framework without too much slow-down.