sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.39k stars 472 forks source link

Refactor Components into parent & element #30307

Open mkoeppe opened 4 years ago

mkoeppe commented 4 years ago

Currently every Components object has lots of metadata attributes in addition to the actual data dictionary in ._comp.

If one has many different Components objects with the same symmetry metadata, we can reduce storage space as follows.

We create new classes CompParent, CompParentWithSym, ..., which store the symmetry attributes and become UniqueRepresentation. We make Components objects elements of these parents.

The parents will also have index iterator methods.

Data associated with symmetries and dimensions, computed currently each time for each CompWithSym object in methods such as __init__, __add__, trace, ... can be precomputed and cached in the parent using @cached_method in the parent class).

This will make the code in #30229 (subspaces of tensor with symmetries) more elegant because it no longer needs a dummy Components object to represent the symmetry but rather a CompParent object.

See also:

Depends on #31904 Depends on #29619

CC: @mjungmath @egourgoulhon @tscrim @honglizhaobob

Component: linear algebra

Author: Michael Jung, Matthias Koeppe

Branch/Commit: public/30307 @ e5a81f2

Issue created by migration from https://trac.sagemath.org/ticket/30307

mkoeppe commented 3 years ago
comment:2

Help with implementing this would be very welcome!

mjungmath commented 3 years ago
comment:3

That's a nice idea, +1!

Data associated with symmetries, computed currently each time for each CompWithSym object in methods such as __init__, __add__, trace, ... can then be precomputed and cached in the parent (for example using @cached_method in the parent class).

I think it would be a good idea to cache results from trace, contract, ... as well.

The most annoying thing with this ticket will most likely be the docstring that has to be adapted...is there a good way to use search&replace?

mjungmath commented 3 years ago
comment:4

Currently, the index generators and manipulators take (mostly) lists and can therefore not be cached. To use the caching most efficiently, I suggest we switch entirely to tuples.

mjungmath commented 3 years ago

Branch: public/30307

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Commit: 943f9ea

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

fec4b06fix sage -b after 30622
dff846cgo through make/build/install
943f9eaTrac #30307: CompParent and CompParentWithSym
mjungmath commented 3 years ago
comment:7

Before I go on with this ticket, could you please take a look whether this meets your rough idea?

Is anyone willing to work on the docstrings...? Perhaps there's an efficient way to do this?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 943f9ea to a2514b6

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

a2514b6Trac #30307: no symmetries -> CompParent without symmetries
mkoeppe commented 3 years ago
comment:9

Replying to @mjungmath:

Before I go on with this ticket, could you please take a look whether this meets your rough idea?

Yes, this is going in the direction that I had in mind.

Some more of the normalization happening in CompParentWithSym.__init__ should probably be moved to __classcall_private__

mjungmath commented 3 years ago
comment:10

What would be a proper category for the parent?

mjungmath commented 3 years ago
comment:11

Replying to @mkoeppe:

Some more of the normalization happening in CompParentWithSym.__init__ should probably be moved to __classcall_private__

For example?

mkoeppe commented 3 years ago
comment:12

For example the order of the component indices does not matter

egourgoulhon commented 3 years ago
comment:13

Replying to @mjungmath:

What would be a proper category for the parent?

Take the following with a grain of salt (this is just a rough/naive thought): It seems to me that the current proposal is a kind of abuse of Sage's parent/element scheme, for CompParent does not correspond to a "genuine" mathematical object (hence maybe your question...). In particular, it does not know about the ring on which the components are based. I do not deny that a reorganization of Components would be a good thing, but maybe at the class level, not at the parent/element level, the issue here being mostly the storage of metadata.

mkoeppe commented 3 years ago
comment:14

Just use the category of sets. I don't think this is an abuse.

Think of an instance of CompParent as the set of all possible Components instances that have the specified symmetry.

By using the parent/element framework, you will get coercion for free - so elements with a coarser symmetry will be in a finer parent.

mjungmath commented 3 years ago
comment:15

Replying to @mkoeppe:

For example the order of the component indices does not matter

Right!

Replying to @mkoeppe:

Just use the category of sets. I don't think this is an abuse.

Think of an instance of CompParent as the set of all possible Components instances that have the specified symmetry.

By using the parent/element framework, you will get coercion for free - so elements with a coarser symmetry will be in a finer parent.

For me, it doesn't feel like an abuse either. Besides, I am sure you can make that notion of compontents mathematically rigorous. But I am not sure whether this becomes a well-defined set then... What about the category of objects?

mjungmath commented 3 years ago
comment:16

Alright, I make progress here. I changed only the backend such that most doctests should still pass, and the module can be used exactly as before. The elementary examples already passed.

I'll push my branch tomorrow.

I am looking forward to some benchmarks as soon as this branch is ready. :)

tscrim commented 3 years ago
comment:17

I can understand why this might seem like an abuse because it is a set of objects, but by extension, all of the combinatorial objects would be an abuse as well. So even though we will not be putting an extra mathematical structure on the components, this is still a valid use of the framework because there is a set of objects (the parent) and the individual objects (the elements).

Something to consider, however, is a potential speed penalty for the extra levels of initialization. This can be somewhat mitigated by using Cython, but performance regression testing is warranted here.

mjungmath commented 3 years ago
comment:18

Replying to @tscrim:

This can be somewhat mitigated by using Cython, but performance regression testing is warranted here.

How so?

tscrim commented 3 years ago
comment:19

Replying to @mjungmath:

Replying to @tscrim:

This can be somewhat mitigated by using Cython, but performance regression testing is warranted here.

How so?

Because Cython is faster than Python, including potential benefits from direct C-level function calls.

mjungmath commented 3 years ago
comment:20

Yes, that's clear. I meant, how to implement? Make CompParent and Components both simply extension types?

Would cdpefing the symmetry related functions cause a speedup btw?

Since this involves only integers, i.e. struct types, one could even try to Cythonize these completely.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from a2514b6 to fc22d5d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

fc22d5dTrac 30307: Cythonize, restructuring, factory method to resemble old behavior
egourgoulhon commented 3 years ago
comment:22

Replying to @tscrim:

I can understand why this might seem like an abuse because it is a set of objects, but by extension, all of the combinatorial objects would be an abuse as well. So even though we will not be putting an extra mathematical structure on the components, this is still a valid use of the framework because there is a set of objects (the parent) and the individual objects (the elements).

Thank you Matthias, Michael and Travis for your replies. I'm convinced now that parent/element is a good approach here (I was bothered by the lack of algebraic structure of the parent), especially for symmetry-based coercions.

Something to consider, however, is a potential speed penalty for the extra levels of initialization. This can be somewhat mitigated by using Cython, but performance regression testing is warranted here.

Certainly!


New commits:

fc22d5dTrac 30307: Cythonize, restructuring, factory method to resemble old behavior
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from fc22d5d to 27bc9b1

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

27bc9b1Trac #30307: pdx file + __init__.py removed
egourgoulhon commented 3 years ago
comment:24

A possibly relevant ticket, about enhancing symmetries of components beyond the currently implemented ones: #28813


New commits:

27bc9b1Trac #30307: pdx file + __init__.py removed
mjungmath commented 3 years ago
comment:25

I am sorry for the mess. However, this is my rough approach. I separated elements and parents, and established factory methods to recover the old behavior (backwards compatibility and less work on the docstrings).

I am open to suggestions how to separate the symmetry code apart. Some symmetry treatments are coupled to the element for optimization (e.g. antisymmetrize to determine zero more quickly).

This code is too big for me to do it alone. I'd appreciate some help here. I hope my first approach is of some use for you.

egourgoulhon commented 3 years ago
comment:26

A concern about cythonizing the whole thing: the Python debugger cannot be used in Cython parts of codes, which is a pity, IMHO. So if Cython does not bring any significant performance improvement, we are loosing more than we gain.

egourgoulhon commented 3 years ago
comment:27

Also shouldn't one perform a single task in this ticket, namely refactor Components into parent/element, and leave cythonization to another ticket?

mjungmath commented 3 years ago
comment:28

Replying to @egourgoulhon:

Also shouldn't one perform a single task in this ticket, namely refactor Components into parent/element, and leave cythonization to another ticket?

That makes sense. In the next step one could try to implement #28813, and perhaps even use Cython to do so.

Besides, I just learned that one could increase the init speed with Python 3 using __slots__. Maybe that's what we should do for now.

Still, I don't know how we should separate the symmetry code properly. My idea so far was to let the parent do the symmetrizing work and return the parent with the corresponding symmetries, and construct an element from it, then assign the components. But this doesn't work so well when the element itself such as in antisymmetrize is taken into account.

Alternatively, it would be nice do define maps (morphisms) between parents doing that work and cache those morphisms. But that is something I don't know how to do in a nice and proper way. For example, if one antisymmetrizes in the variables where the components are already symmetric, it's simply the "zero morphism".

mjungmath commented 3 years ago
comment:29

Nevertheless, I think that the rough refactoring into comp_element and comp_parent (and comp to recover the old behavior) is already a good way to go.

mkoeppe commented 3 years ago
comment:30

Replying to @egourgoulhon:

A possibly relevant ticket, about enhancing symmetries of components beyond the currently implemented ones: #28813

I agree that it would be a good idea to keep possible generalizations in mind while doing this refactoring. (Also #30276)

mjungmath commented 3 years ago
comment:31

Definitely! So, do we have a roadmap?

mkoeppe commented 3 years ago
comment:32

This is looking nice already, and I agree that this ticket should do a Python implementation only. I would hope that there are already performance gains by caching.

mjungmath commented 3 years ago
comment:33

@mkoeppe: What do you say about the morphism idea in comment:27?

@tscrim: What about using __slots__ in this ticket instead of Cythonizing?

mkoeppe commented 3 years ago
comment:34

Regrading the morphisms: These are exactly the reduce, retract, lift maps that we discussed in #30229 - just on the level of components rather than modules.

mjungmath commented 3 years ago
comment:35

Replying to @mkoeppe:

Regrading the morphisms: These are exactly the reduce, retract, lift maps that we discussed in #30229 - just on the level of components rather than modules.

Mh...but I suggest more than 3 morphisms. I don't know what you mean, sorry.

Can I take it as a "yes, what a wonderful idea!"? :P

mkoeppe commented 3 years ago
comment:36

Yes, there are many maps:

tscrim commented 3 years ago
comment:37

I don't expect __slots__ to provide much benefit to speed as from what I am reading, it is more useful for memory usage. The main thing is to turn the file into a .pyx file with a .pxd header for declarations. The element class should be a cpdef with the main parameters declared and given explicit typing. These are relatively easy things to do provided you don't have multiple inheritance in the element classes (the parent can remain normal Python classes in a pyx file).

IMO, you also don't loose too much with converting to Cython with debugging because you still get a lot with error tracebacks. The biggest thing I have found lost is the ease of profiling to find bottlenecks. However, testing with practical examples can get around that a bit, and there is a tool to profile Cython code IIRC. So I generally see this as a smaller trade-off.

mjungmath commented 3 years ago
comment:38

Well, __slots__ makes initializations faster because attribute access is significantly faster (up to 30%). Memory is just an additional benefit.

Even if we turn Components into a cpdef, I think we still gain a good performance boost with __slots__ which can sum up quickly with recurrent use of _comp.

mjungmath commented 3 years ago
comment:39

To use morphisms, we need a new category (say Category of collections of abstract components) and a homset, right?

mjungmath commented 3 years ago
comment:40

Btw, I am not convinced that cpdefing the class makes initializations faster, I'd bet on the contrary. When I understand it correctly, cpdef creates two classes in the background, an extension type and a usual Python class. Since we don't expect a benefit during the lifetime of an instance, it should even slowdown things. But maybe I just got things wrong here.

tscrim commented 3 years ago
comment:42

What are you talking about cpdefing a class? You only do that to functions AFAIK. I am still not 100% convinced by the __slots__ approach, more so because it makes it (slightly) harder to switch from a Python class to an extension class IIRC.

I don't see why we need a new category. We don't need to be so heavy-handed with the approach. Having the homset be in the category of (enumerated?) sets is sufficient IMO.

tscrim commented 3 years ago
comment:43

The main benefit with Cython and extension classes is being able to strongly type things to have as many C level function calls and code as possible.

mjungmath commented 3 years ago
comment:44

Replying to @tscrim:

I don't see why we need a new category. We don't need to be so heavy-handed with the approach. Having the homset be in the category of (enumerated?) sets is sufficient IMO.

Mathematically, I am still not convinced that our parents constitute sets. Their elements run over all possible rings (and "frames"), which is not a set either.

mjungmath commented 3 years ago
comment:45

Replying to @tscrim:

The main benefit with Cython and extension classes is being able to strongly type things to have as many C level function calls and code as possible.

Indeed. That needs a thorough modification of the current code. The indices checks use mostly lists and Python sets. The components are stored as dictionaries which cannot be strongly typed. I agree, that should eventually be done, however I propose that's something for another ticket.

mjungmath commented 3 years ago
comment:46

For now, we can use __slots__ for a slight speedup and remove it again when we Cythonize. That's what I meant.

tscrim commented 3 years ago
comment:47

I don't really like this partial measure. I wouldn't do it and just keep it pure Python until you actually decide to make it Cython.

Why is the ring associated with the element and not the parent? In general, why is all of these attributes copied from the parent? Is the extra level of indirection that slow? You can just call self._parent in the Cython code.

Because of how your current implementation is done, you are not going to benefit from coercions. Nor do you seem to be using anything in a category. Thus, I would actually weaken things and not use Sage's Parent/Element classes. Instead, I would just mimic them. Perhaps I am misunderstanding how you plan to apply these.

There is still a lot of you things you can do to tell Cython to make things be strongly typed even if they are extracted from lists/sets/dicts, such as type-casting.

mjungmath commented 3 years ago
comment:48

Please don't look at the details yet. The code is far away from being finished. Coercions come, attributes will be removed (it's just to make parts of the code already debuggable).

Why has the parent no ring? Because the symmetries and indices simply do not depend on it. On the level of indices and symmetries we have: different ring -> same data -> same instance -> more effective storage.