sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.43k stars 478 forks source link

Improved use of category framework for IntegerModRing #15229

Closed simon-king-jena closed 9 years ago

simon-king-jena commented 11 years ago

On sage-devel, some discussion on IntegerModRing and its relation to categories took place. I would summarize the discussion and its consequences as follows:

By #11900, refinement of category happens when it is tested whether the IntegerModRing is a field by R in Fields(). However, there is some inconsistency:

sage: S = IntegerModRing(5)
sage: S.category()
Join of Category of commutative rings and Category of subquotients of monoids and Category of quotients of semigroups and Category of finite enumerated sets
sage: S in Fields()
True
sage: S.category()
Join of Category of fields and Category of subquotients of monoids and Category of quotients of semigroups and Category of finite enumerated sets
sage: R = IntegerModRing(5, Fields())
sage: R.category()
Join of Category of fields and Category of subquotients of monoids and Category of quotients of semigroups

I think we would want that R and S are in the same category after the category refinement of S.

So, the first aim of this ticket is to make the categories consistent.

Secondly, comparison of IntegerModRing includes comparison of types. Originally, this has been introduced since one wanted GF(p) and IntegerModRing(p) evaluate differently. However, changing the category changes the type, and thus changes the ==-equivalence class. I think this must not happen. Hence, the second aim of this ticket is to make == independent of the category, while still letting GF(p) and IntegerModRing(p, category=Fields()) evaluate unequal.

Thirdly, in the discussion on sage-devel, it was suggested to refine the category of R when R.is_field() returns true. Currently, refinement only happens when R in Fields() returns true. So, the third aim of this ticket to do the refinement as suggested.

Cc to John Cremona, since it was he who brought the "category refinement in is_field()" topic up...

CC: @JohnCremona @nthiery @jpflori @novoselt

Component: categories

Keywords: sd53 category integermodring

Author: Simon King

Branch/Commit: 048aec7

Reviewer: Jean-Pierre Flori

Issue created by migration from https://trac.sagemath.org/ticket/15229

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

[changeset:2a08a44]Make pickling of ZZ/nZZ aware of being in Fields()
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Changed commit from 6ada7e0 to 2a08a44

simon-king-jena commented 11 years ago

Author: Simon King

simon-king-jena commented 11 years ago
comment:43

With the current commit, one can do

sage: R = IntegerModRing(7)
sage: save(R, 'tmpR')
sage: R.category().is_subcategory(Fields())
False
sage: R in Fields()
True
sage: R.category().is_subcategory(Fields())
True
sage: save(R, 'tmpRcat')

and then in a new Sage session

sage: R = load('tmpR.sobj')
sage: R.category().is_subcategory(Fields())
False
sage: F = load('tmpRcat.sobj')
sage: F is R
True
sage: R.category().is_subcategory(Fields())
True

or, in another new Sage session

sage: F = load('tmpRcat.sobj')
sage: F.category().is_subcategory(Fields())
True
sage: R = load('tmpR.sobj')
sage: R is F
True
sage: R.category().is_subcategory(Fields())
True

So, we still have a globally unique integer mod ring, even after unpickling a ring and a field version of the same integer mod ring. And the information found in the pickle will be used to update the unique integer mod ring.

The cached data are not preserved in the pickle, but I hope we can do without it. I make it "needs review" now.

nbruin commented 11 years ago
comment:44

Replying to @simon-king-jena:

Secondly, what happens if we have created a ring R in a sage session, and R.is_field already has cached values. If we then load a ring with the same modulus from a file, then (by uniqueness) R is returned---but R.__dict__ is updated with what we have found on the disc: The cached values would still be available to R.is_field, but the values would not be part of R.__dict__ any longer.

What is the problem here? __setstate__ by default should update self.__dict__ with the pickled dictionary, not replace it. The python documentation makes quite clear that this is what is done in the default situation (i.e., when there's no __setstate__ at all). We should definitely follow that almost everywhere, unless we have very good reasons not to, which should be documented.

So if the pickled ring had no cached value for is_field, the value doesn't get clobbered. If the existing and the pickled version both have cached values stored, but they are conflicting then it's a mess anyway.

nbruin commented 11 years ago
comment:45

I think ParentWithGens already mostly does the right thing. It's just a bit inflexible to either expect that all these required attributes must be filled from the dict or have no dict at all. I'm also doubtful that we should leave these entries in the dict. These are slot attributes, right, so it's useless:

sage: P=QQ['x']
sage: P.__dict__['base']=0
sage: P._base
Rational Field

so it would be cleaner to remove them anyway, e.g.

    def __setstate__(self, d):
        if d.has_key('_element_constructor'):
            return parent.Parent.__setstate__(self, d)
+       t=d.pop('_base',None); if t is not None: self._base = t
+       t=d.pop('_gens',None); if t is not None: self._gens = t
+       t=d.pop('_gens_dict',None); if t is not None: self._gens_dict = t
+       t=d.pop('_list',None); if t is not None: self._list = t
+       t=d.pop('_latex_names',None); if t is not None: self._latex_names = t
+       #if this attribute doesn't exist it has no business in the pickle either.
+       t=d.pop('_generator_orders',None); if t is not None: self._generator_orders = t
        try:
            self.__dict__.update(d)
-           self._generator_orders = d['_generator_orders']
        except (AttributeError,KeyError):
+           for key,value in d.iteritems():
+               setattr(self,key,value)
-       self._base = d['_base']        
-       self._gens = d['_gens']
-       self._gens_dict = d['_gens_dict']
-       self._list = d['_list']
-       self._names = d['_names']
-       self._latex_names = d['_latex_names']

Incidentally, if an InterModRing were to be part of a cycle in which it's important to know that it's a field, setstate may be too late to avoid triggering the primality test. So perhaps you should just pickle the is_field as part of the construction parameters if it's true. A {'is_field':True} kwargs won't create cycles.

There's of course the issue that unpickling the dictionary is almost certainly more expensive than doing the primality test for smallish moduli. In fact, if you look at what

simon-king-jena commented 11 years ago
comment:46

Replying to @nbruin:

Replying to @simon-king-jena: What is the problem here? __setstate__ by default should update self.__dict__ with the pickled dictionary, not replace it.

sage: R = IntegerModRing(7)
sage: R.is_field(proof=False)
True
sage: R.__dict__
{'_IntegerModRing_generic__factored_order': None,
 '_IntegerModRing_generic__factored_unit_order': None,
 '_IntegerModRing_generic__order': 7,
 '_IntegerModRing_generic__unit_group_exponent': None,
 '_QuotientRing_nc__I': Principal ideal (7) of Integer Ring,
 '_QuotientRing_nc__R': Integer Ring,
 '__reduce_ex__': <bound method ?.generic_factory_reduce of Ring of integers modulo 7>,
 '_cache__is_field': {((False,), ()): True},
 '_pyx_order': <sage.rings.finite_rings.integer_mod.NativeIntStruct at 0xbba9c8c>,
 'is_field': Cached version of <function is_field at 0x926dd4c>}
sage: R.is_field.cache is R._cache__is_field
True

Now we update the __dict__ by something that would come up when unpickling an old instance of the same ring that had called R.is_field() instead of R.is_field(proof=False). The effect would be:

sage: R.__dict__.update([('_cache__is_field', {((None,), ()):True})])

and, as I have promised, the cache of is_field is detached from the dictionary that is in __dict__:

sage: R.is_field.cache is R._cache__is_field
False

Hence, if we now do

sage: R.is_field(proof=True)
True

(which might be a particular expensive computation, as we do proof=True), and if we then pickle R, then the pickle would not store the result of the computations we had with proof=False and proof=True, but only the one without providing the proof argument. Reason:

sage: R._cache__is_field
{((None,), ()): True}

Do you see the problem now?

So if the pickled ring had no cached value for is_field, the value doesn't get clobbered.

Correct.

If the existing and the pickled version both have cached values stored, but they are conflicting then it's a mess anyway.

No, they are not conflicting in my example above. They simply are caches for different values of an optional argument.

simon-king-jena commented 11 years ago
comment:47

Replying to @nbruin:

These are slot attributes, right, so ... it would be cleaner to remove them anyway [in __setstate__]

No, that's wrong. The purpose of this __setstate__ is to fill the slots from a given dictionary. And I am sure that this is used somewhere in Sage, and thus removing it would break things.

Incidentally, if an InterModRing were to be part of a cycle in which it's important to know that it's a field, setstate may be too late to avoid triggering the primality test. So perhaps you should just pickle the is_field as part of the construction parameters if it's true.

This is what the current branch does. If the category gets updated either by providing the new optional argument is_field=True in the IntegerModRing factory, or by calling R.is_field() (which will also be called if you do R in Fields()), then the construction parameters used for pickling are updated.

In other words, my current branch does not use __setstate__.

simon-king-jena commented 11 years ago
comment:48

Replying to @simon-king-jena:

In other words, my current branch does not use __setstate__.

Then you may ask: "Why did he start mentioning __setstate__?"

Well, easy. If the new optional argument is_field=True is provided to the IntegerModRing factory (e.g., by unpickling), then both R.is_field() and R.is_field(proof=False) will exploit this information, i.e., they will avoid the primality test. But R.is_field(proof=True) will do the primality test even if IntegerModRing(..., is_field=True) had been done.

Hence: Caching R.is_field() and R.is_field(proof=False) is not critical, because the "semantics" of the result is stored in the pickle via is_field=True. However, caching R.is_field(proof=True) is critical, because is_field=True is not good enough for avoiding the primality test.

This is why I wondered how one can pickle the cache of R.is_field, and I have shown that this is not easy.

nbruin commented 11 years ago
comment:49

Replying to @simon-king-jena:

Replying to @nbruin:

These are slot attributes, right, so ... it would be cleaner to remove them anyway [in __setstate__]

No, that's wrong. The purpose of this __setstate__ is to fill the slots from a given dictionary. And I am sure that this is used somewhere in Sage, and thus removing it would break things.

I agree with the latter bit, but I don't see what's wrong then. We're probably misunderstanding. The code I proposed does set attributes Do you mean we should copy d before we pop from it?.

In principle, __setstate__ should just do with the dictionary:

for key,value in d:
    setattr(self,key,value)

If we do that, we don't need any extra machinery. However, the observation is that for attributes that live in self.__dict__, an update is faster. However, slot attributes that for pickling reasons get put into d should not afterwards still be added to __dict__. Access to them is shadowed by slots anyway. So, by the time you do self.__dict__.update(d) you should make sure that attributes that are stored in slots have been scrubbed from d. I don't think it's usually a disaster to also have them in self.__dict__, but it's pointless and messy and it's not what the object looked like before pickling.

Thanks for explaining the caching problem. What we'd need is a merge of of the cache dictionary and that's hard to arrange. This problem will arise with all cachedmethods on unique objects. So regardless of cyclicity, pickling (unique) objects with cached methods is a bit broken anyway: restoring a pickle can delete information on an existing object. This only occurs for methods that take arguments, but unfortunately it includes optional arguments such as proof=.

While writing @cachedmethod for structural data seems like a convenient trick, I think we're increasingly finding that its implementation causes trouble that would likely not arise from manually implemented caching strategies (just test and assign an attribute). Perhaps we should stop advocating "you should just use @cachedmethod".

CachedMethodNoArgs is fine, but there a lazy_attribute is usually a tiny bit more efficient.


Come to think of it, the way cachedmethod stores its cache makes the problem more readily apparent, but the problem is more general.

The problem we're running into is basically the usual "merge conflict resolution". Our unique objects are supposed to be immutable, but they still have some stuff on them that can change/be refined during the lifetime. One would assume the "refinements" are ordered (perhaps form a lattice?), so given two different states of refinement there should be a common state that is a refinement of either, and that would be the appropriate state to be in after merging. Can we think of a low-cost framework to encode such info? If the "refining" state consists of a dict that grows more more entries, then the appropriate merge is indeed to merge the dictionaries (the fact that it's refinement should mean that if keys occur in both, then the values should agree)

In cases where the variable state is not ordered by "refinement" I doubt we can do anything, but in those cases, I suspect that our object really cannot be considered "immutable" and hence the thing shouldn't be unique.

So one way would be to have a list of attributes that are to be "merged" rather than be overwritten in setstate. Then the instantiation of @cachedmethod could include registering the relevant attribute for the setstate.

This could be similarly implemented to the "initialize upon construction" attributes in the RichPickle draft on #15207.

I think such a protocol should be implemented on CachedRepresentation since that introduces the trouble. That would be the place to try and resolve the ensuing pickling problems.

It also means that classes that inherit from CachedRepresentation and implement a __setstate__ should either follow the protocol themselves or comply by calling the setstate of CachedRepresentation in an appropriate manner.

nthiery commented 11 years ago
comment:50

Hi!

Lots of discussion here :-)

The current consensus makes full sense to me up to one detail. My personal preference would be to keep using the "category" argument to specify that the parent is a field.

I am using this idiom all the time to e.g. specify that my monoids are finite/commutative/J-Trivial/... And also, sometimes, to add actual structure to a parent by implementing it in the category that I pass to it. That might be a bit borderline, but it's super practical :-)

In a perfect world, where we would have a clean support for full subcategories, it would be nice to have, for any parent class P, the property that P(..., category=As()) and P(..., category=Bs()) are identical if and only if As() and Bs() are two full subcategories of the same category. Of course, axioms can help in this direction, but we probably don't have a complete enough infrastructure to implement this generically.

But maybe, as a step toward this, we can at this point implement this behavior by hand in the special cases that deserve it like IntegerModRing?

In any cases, keep up the good work!

Cheers, Nicolas

simon-king-jena commented 11 years ago
comment:51

Replying to @nthiery:

The current consensus makes full sense to me up to one detail. My personal preference would be to keep using the "category" argument to specify that the parent is a field.

So, would you think we should make it so that IntegerModRing understands both is_field=True and category=Fields(), namely with IntegerModRing(p, is_field=True) is IntegerModRing(p, category=Fields())?

BTW, I don't understand the errors that the patchbot is reporting. They seem unrelated...

nthiery commented 11 years ago
comment:52

I for myself would tend to just support the category=Fields() syntax.

Imagine a case where the parent could have five different axioms depening on the input; then you'd have to support five arguments when only one does the trick uniformly. That's not a completely insane thought: e.g. for non-commutative groebner basis your could have as axioms: finite dimensional, finite, commutative, nozerodivisors, ... ).

nbruin commented 11 years ago
comment:53

Replying to @nthiery:

I for myself would tend to just support the category=Fields() syntax.

Imagine a case where the parent could have five different axioms depening on the input; then you'd have to support five arguments when only one does the trick uniformly. That's not a completely insane thought: e.g. for non-commutative groebner basis your could have as axioms: finite dimensional, finite, commutative, nozerodivisors, ... ).

What solution do you have in mind to model the equivalence classes of category arguments that need to be considered for "global unique instance" lookup?

Also, how do you determine which inputs for "category" are valid? Isn't that just going to be a list of "allowed" values? In that case, the equivalence classes are easy: you just group the "allowed" list. Note that we've found above that these category equivalence classes need to be closed under refinements that could possibly occur.

simon-king-jena commented 10 years ago
comment:54

Oops, I just see that I lost track of this one. Pulling the branch now...

simon-king-jena commented 10 years ago
comment:55

Replying to @nthiery:

I for myself would tend to just support the category=Fields() syntax.

Imagine a case where the parent could have five different axioms depening on the input; then you'd have to support five arguments when only one does the trick uniformly. That's not a completely insane thought: e.g. for non-commutative groebner basis your could have as axioms: finite dimensional, finite, commutative, nozerodivisors, ... ).

I think the above discussion gives good arguments why the factory for integer mod rings should not allow general freedom in the choice of a category: The factory needs to ensure uniqueness of instances. Some choices of a category play well with uniqueness: You want full sub-categories, and you want that containment in the sub-category does not rely on additional data (a ring intrinsically either is or is not a field). But other categories should result in a non-equal copy of the ring; for example, a Hopf algebra structure is not an intrinsic property and should thus not use a globally unique copy of a ring.

Hence, the constructor of IntegerModRing_generic (or what's the name of the class?) should (and already does) of course accept a category argument. But here, we talk about the factory that manages quotients of the integer ring.

I say: The factory should make it possible that the user asserts that the resulting ring is in fact a field, to avoid expensive primality tests. However, if you want to prescribe any fancy category for the resulting ring, you should either refine the category after the ring has been returned by the factory, or you should create a new factory (for quotient rings providing a fancy Hopf algebra structure) that is orthogonal to the IntegerModRing factory.

In any case: Are there open questions from the discussion? AFAIK, the current branch addresses all questions. What is missing for a review?

simon-king-jena commented 10 years ago
comment:56

All tests pass, but I guess I should remove some trailing whitespace.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 2a08a44 to fae3f07

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

fae3f07Trac 15229: Complete a phrase in the doc, remove trailing whitespace
7b61761Merge branch 'master' into ticket/15229
simon-king-jena commented 10 years ago
comment:58

Oops, I didn't know that I had merged (an old version of) master into the branch. Meanwhile I have learnt that one shouldn't do such merges. In any case: There was some trailing whitespace, and there was a phrase that ended somewhere in the middle and needed completion.

And this ticket needs review!

rwst commented 10 years ago

Work Issues: rebase

tscrim commented 9 years ago

Changed commit from fae3f07 to b012bf3

tscrim commented 9 years ago

Changed branch from u/SimonKing/ticket/15229 to public/categories/improve_integer_mod_ring-15229

tscrim commented 9 years ago
comment:63

This solves an issue noted in #15475 (and #11979). I've rebased it and (non-long) doctests in the file pass, what needs review to finalize this?


New commits:

77f733dMerge branch 'u/SimonKing/ticket/15229' of trac.sagemath.org:sage into public/categories/improve_integer_mod_ring-15229
b012bf3Some tweaks to get doctests passing from the merge.
tscrim commented 9 years ago

Changed work issues from rebase to none

simon-king-jena commented 9 years ago
comment:65

Replying to @tscrim:

This solves an issue noted in #15475 (and #11979). I've rebased it and (non-long) doctests in the file pass, what needs review to finalize this?

Sorry, no idea. It has been too long since I last had a look at the patch.

jpflori commented 9 years ago
comment:66

I had a look and am mostly happy with this ticket, though I might still prefer to let coexisted different instances of Z/nZ. Anyway I know how to hack the cache so I can live with the current solution.

Two minor typos:

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from b012bf3 to 048aec7

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8f728f1Merge remote-tracking branch 'trac/develop' into ticket/15229
048aec7Correct two minor typos for IntegerModRing category use improvements.
jpflori commented 9 years ago

Reviewer: Jean-Pierre Flori

vbraun commented 9 years ago

Changed branch from public/categories/improve_integer_mod_ring-15229 to 048aec7