Closed simon-king-jena closed 9 years ago
Branch pushed to git repo; I updated commit sha1. New commits:
[changeset:2a08a44] | Make pickling of ZZ/nZZ aware of being in Fields() |
Author: Simon King
With the current commit, one can do
sage: R = IntegerModRing(7)
sage: save(R, 'tmpR')
sage: R.category().is_subcategory(Fields())
False
sage: R in Fields()
True
sage: R.category().is_subcategory(Fields())
True
sage: save(R, 'tmpRcat')
and then in a new Sage session
sage: R = load('tmpR.sobj')
sage: R.category().is_subcategory(Fields())
False
sage: F = load('tmpRcat.sobj')
sage: F is R
True
sage: R.category().is_subcategory(Fields())
True
or, in another new Sage session
sage: F = load('tmpRcat.sobj')
sage: F.category().is_subcategory(Fields())
True
sage: R = load('tmpR.sobj')
sage: R is F
True
sage: R.category().is_subcategory(Fields())
True
So, we still have a globally unique integer mod ring, even after unpickling a ring and a field version of the same integer mod ring. And the information found in the pickle will be used to update the unique integer mod ring.
The cached data are not preserved in the pickle, but I hope we can do without it. I make it "needs review" now.
Replying to @simon-king-jena:
Secondly, what happens if we have created a ring R in a sage session, and
R.is_field
already has cached values. If we then load a ring with the same modulus from a file, then (by uniqueness) R is returned---butR.__dict__
is updated with what we have found on the disc: The cached values would still be available toR.is_field
, but the values would not be part ofR.__dict__
any longer.
What is the problem here? __setstate__
by default should update self.__dict__
with the pickled dictionary, not replace it. The python documentation makes quite clear that this is what is done in the default situation (i.e., when there's no __setstate__
at all). We should definitely follow that almost everywhere, unless we have very good reasons not to, which should be documented.
So if the pickled ring had no cached value for is_field
, the value doesn't get clobbered. If the existing and the pickled version both have cached values stored, but they are conflicting then it's a mess anyway.
I think ParentWithGens
already mostly does the right thing. It's just a bit inflexible to either expect that all these required attributes must be filled from the dict or have no dict at all. I'm also doubtful that we should leave these entries in the dict. These are slot attributes, right, so it's useless:
sage: P=QQ['x']
sage: P.__dict__['base']=0
sage: P._base
Rational Field
so it would be cleaner to remove them anyway, e.g.
def __setstate__(self, d):
if d.has_key('_element_constructor'):
return parent.Parent.__setstate__(self, d)
+ t=d.pop('_base',None); if t is not None: self._base = t
+ t=d.pop('_gens',None); if t is not None: self._gens = t
+ t=d.pop('_gens_dict',None); if t is not None: self._gens_dict = t
+ t=d.pop('_list',None); if t is not None: self._list = t
+ t=d.pop('_latex_names',None); if t is not None: self._latex_names = t
+ #if this attribute doesn't exist it has no business in the pickle either.
+ t=d.pop('_generator_orders',None); if t is not None: self._generator_orders = t
try:
self.__dict__.update(d)
- self._generator_orders = d['_generator_orders']
except (AttributeError,KeyError):
+ for key,value in d.iteritems():
+ setattr(self,key,value)
- self._base = d['_base']
- self._gens = d['_gens']
- self._gens_dict = d['_gens_dict']
- self._list = d['_list']
- self._names = d['_names']
- self._latex_names = d['_latex_names']
Incidentally, if an InterModRing were to be part of a cycle in which it's important to know that it's a field, setstate may be too late to avoid triggering the primality test. So perhaps you should just pickle the is_field
as part of the construction parameters if it's true. A {'is_field':True}
kwargs won't create cycles.
There's of course the issue that unpickling the dictionary is almost certainly more expensive than doing the primality test for smallish moduli. In fact, if you look at what
Replying to @nbruin:
Replying to @simon-king-jena: What is the problem here?
__setstate__
by default should updateself.__dict__
with the pickled dictionary, not replace it.
sage: R = IntegerModRing(7)
sage: R.is_field(proof=False)
True
sage: R.__dict__
{'_IntegerModRing_generic__factored_order': None,
'_IntegerModRing_generic__factored_unit_order': None,
'_IntegerModRing_generic__order': 7,
'_IntegerModRing_generic__unit_group_exponent': None,
'_QuotientRing_nc__I': Principal ideal (7) of Integer Ring,
'_QuotientRing_nc__R': Integer Ring,
'__reduce_ex__': <bound method ?.generic_factory_reduce of Ring of integers modulo 7>,
'_cache__is_field': {((False,), ()): True},
'_pyx_order': <sage.rings.finite_rings.integer_mod.NativeIntStruct at 0xbba9c8c>,
'is_field': Cached version of <function is_field at 0x926dd4c>}
sage: R.is_field.cache is R._cache__is_field
True
Now we update the __dict__
by something that would come up when unpickling an old instance of the same ring that had called R.is_field()
instead of R.is_field(proof=False)
. The effect would be:
sage: R.__dict__.update([('_cache__is_field', {((None,), ()):True})])
and, as I have promised, the cache of is_field
is detached from the dictionary that is in __dict__
:
sage: R.is_field.cache is R._cache__is_field
False
Hence, if we now do
sage: R.is_field(proof=True)
True
(which might be a particular expensive computation, as we do proof=True), and if we then pickle R, then the pickle would not store the result of the computations we had with proof=False and proof=True, but only the one without providing the proof argument. Reason:
sage: R._cache__is_field
{((None,), ()): True}
Do you see the problem now?
So if the pickled ring had no cached value for
is_field
, the value doesn't get clobbered.
Correct.
If the existing and the pickled version both have cached values stored, but they are conflicting then it's a mess anyway.
No, they are not conflicting in my example above. They simply are caches for different values of an optional argument.
Replying to @nbruin:
These are slot attributes, right, so ... it would be cleaner to remove them anyway [in
__setstate__
]
No, that's wrong. The purpose of this __setstate__
is to fill the slots from a given dictionary. And I am sure that this is used somewhere in Sage, and thus removing it would break things.
Incidentally, if an InterModRing were to be part of a cycle in which it's important to know that it's a field, setstate may be too late to avoid triggering the primality test. So perhaps you should just pickle the
is_field
as part of the construction parameters if it's true.
This is what the current branch does. If the category gets updated either by providing the new optional argument is_field=True
in the IntegerModRing
factory, or by calling R.is_field()
(which will also be called if you do R in Fields()
), then the construction parameters used for pickling are updated.
In other words, my current branch does not use __setstate__
.
Replying to @simon-king-jena:
In other words, my current branch does not use
__setstate__
.
Then you may ask: "Why did he start mentioning __setstate__
?"
Well, easy. If the new optional argument is_field=True
is provided to the IntegerModRing
factory (e.g., by unpickling), then both R.is_field()
and R.is_field(proof=False)
will exploit this information, i.e., they will avoid the primality test. But R.is_field(proof=True)
will do the primality test even if IntegerModRing(..., is_field=True)
had been done.
Hence: Caching R.is_field()
and R.is_field(proof=False)
is not critical, because the "semantics" of the result is stored in the pickle via is_field=True
. However, caching R.is_field(proof=True)
is critical, because is_field=True
is not good enough for avoiding the primality test.
This is why I wondered how one can pickle the cache of R.is_field
, and I have shown that this is not easy.
Replying to @simon-king-jena:
Replying to @nbruin:
These are slot attributes, right, so ... it would be cleaner to remove them anyway [in
__setstate__
]No, that's wrong. The purpose of this
__setstate__
is to fill the slots from a given dictionary. And I am sure that this is used somewhere in Sage, and thus removing it would break things.
I agree with the latter bit, but I don't see what's wrong then. We're probably misunderstanding. The code I proposed does set attributes Do you mean we should copy d
before we pop from it?.
In principle, __setstate__
should just do with the dictionary:
for key,value in d:
setattr(self,key,value)
If we do that, we don't need any extra machinery. However, the observation is that for attributes that live in self.__dict__
, an update is faster. However, slot attributes that for pickling reasons get put into d
should not afterwards still be added to __dict__
. Access to them is shadowed by slots anyway. So, by the time you do self.__dict__.update(d)
you should make sure that attributes that are stored in slots have been scrubbed from d
. I don't think it's usually a disaster to also have them in self.__dict__
, but it's pointless and messy and it's not what the object looked like before pickling.
Thanks for explaining the caching problem. What we'd need is a merge of of the cache dictionary and that's hard to arrange. This problem will arise with all cachedmethods on unique objects. So regardless of cyclicity, pickling (unique) objects with cached methods is a bit broken anyway: restoring a pickle can delete information on an existing object. This only occurs for methods that take arguments, but unfortunately it includes optional arguments such as proof=
.
While writing @cachedmethod
for structural data seems like a convenient trick, I think we're increasingly finding that its implementation causes trouble that would likely not arise from manually implemented caching strategies (just test and assign an attribute). Perhaps we should stop advocating "you should just use @
cachedmethod".
CachedMethodNoArgs
is fine, but there a lazy_attribute
is usually a tiny bit more efficient.
Come to think of it, the way cachedmethod stores its cache makes the problem more readily apparent, but the problem is more general.
The problem we're running into is basically the usual "merge conflict resolution". Our unique objects are supposed to be immutable, but they still have some stuff on them that can change/be refined during the lifetime. One would assume the "refinements" are ordered (perhaps form a lattice?), so given two different states of refinement there should be a common state that is a refinement of either, and that would be the appropriate state to be in after merging. Can we think of a low-cost framework to encode such info? If the "refining" state consists of a dict that grows more more entries, then the appropriate merge is indeed to merge the dictionaries (the fact that it's refinement should mean that if keys occur in both, then the values should agree)
In cases where the variable state is not ordered by "refinement" I doubt we can do anything, but in those cases, I suspect that our object really cannot be considered "immutable" and hence the thing shouldn't be unique.
So one way would be to have a list of attributes that are to be "merged" rather than be overwritten in setstate. Then the instantiation of @
cachedmethod could include registering the relevant attribute for the setstate.
This could be similarly implemented to the "initialize upon construction" attributes in the RichPickle draft on #15207.
I think such a protocol should be implemented on CachedRepresentation
since that introduces the trouble. That would be the place to try and resolve the ensuing pickling problems.
It also means that classes that inherit from CachedRepresentation
and implement a __setstate__
should either follow the protocol themselves or comply by calling the setstate of CachedRepresentation in an appropriate manner.
Hi!
Lots of discussion here :-)
The current consensus makes full sense to me up to one detail. My personal preference would be to keep using the "category" argument to specify that the parent is a field.
I am using this idiom all the time to e.g. specify that my monoids are finite/commutative/J-Trivial/... And also, sometimes, to add actual structure to a parent by implementing it in the category that I pass to it. That might be a bit borderline, but it's super practical :-)
In a perfect world, where we would have a clean support for full subcategories, it would be nice to have, for any parent class P, the property that P(..., category=As()) and P(..., category=Bs()) are identical if and only if As() and Bs() are two full subcategories of the same category. Of course, axioms can help in this direction, but we probably don't have a complete enough infrastructure to implement this generically.
But maybe, as a step toward this, we can at this point implement this behavior by hand in the special cases that deserve it like IntegerModRing?
In any cases, keep up the good work!
Cheers, Nicolas
Replying to @nthiery:
The current consensus makes full sense to me up to one detail. My personal preference would be to keep using the "category" argument to specify that the parent is a field.
So, would you think we should make it so that IntegerModRing
understands both is_field=True
and category=Fields()
, namely with IntegerModRing(p, is_field=True) is IntegerModRing(p, category=Fields())
?
BTW, I don't understand the errors that the patchbot is reporting. They seem unrelated...
I for myself would tend to just support the category=Fields()
syntax.
Imagine a case where the parent could have five different axioms depening on the input; then you'd have to support five arguments when only one does the trick uniformly. That's not a completely insane thought: e.g. for non-commutative groebner basis your could have as axioms: finite dimensional, finite, commutative, nozerodivisors, ... ).
Replying to @nthiery:
I for myself would tend to just support the
category=Fields()
syntax.Imagine a case where the parent could have five different axioms depening on the input; then you'd have to support five arguments when only one does the trick uniformly. That's not a completely insane thought: e.g. for non-commutative groebner basis your could have as axioms: finite dimensional, finite, commutative, nozerodivisors, ... ).
What solution do you have in mind to model the equivalence classes of category arguments that need to be considered for "global unique instance" lookup?
Also, how do you determine which inputs for "category" are valid? Isn't that just going to be a list of "allowed" values? In that case, the equivalence classes are easy: you just group the "allowed" list. Note that we've found above that these category equivalence classes need to be closed under refinements that could possibly occur.
Oops, I just see that I lost track of this one. Pulling the branch now...
Replying to @nthiery:
I for myself would tend to just support the
category=Fields()
syntax.Imagine a case where the parent could have five different axioms depening on the input; then you'd have to support five arguments when only one does the trick uniformly. That's not a completely insane thought: e.g. for non-commutative groebner basis your could have as axioms: finite dimensional, finite, commutative, nozerodivisors, ... ).
I think the above discussion gives good arguments why the factory for integer mod rings should not allow general freedom in the choice of a category: The factory needs to ensure uniqueness of instances. Some choices of a category play well with uniqueness: You want full sub-categories, and you want that containment in the sub-category does not rely on additional data (a ring intrinsically either is or is not a field). But other categories should result in a non-equal copy of the ring; for example, a Hopf algebra structure is not an intrinsic property and should thus not use a globally unique copy of a ring.
Hence, the constructor of IntegerModRing_generic
(or what's the name of the class?) should (and already does) of course accept a category
argument. But here, we talk about the factory that manages quotients of the integer ring.
I say: The factory should make it possible that the user asserts that the resulting ring is in fact a field, to avoid expensive primality tests. However, if you want to prescribe any fancy category for the resulting ring, you should either refine the category after the ring has been returned by the factory, or you should create a new factory (for quotient rings providing a fancy Hopf algebra structure) that is orthogonal to the IntegerModRing
factory.
In any case: Are there open questions from the discussion? AFAIK, the current branch addresses all questions. What is missing for a review?
All tests pass, but I guess I should remove some trailing whitespace.
Oops, I didn't know that I had merged (an old version of) master into the branch. Meanwhile I have learnt that one shouldn't do such merges. In any case: There was some trailing whitespace, and there was a phrase that ended somewhere in the middle and needed completion.
And this ticket needs review!
Work Issues: rebase
Changed branch from u/SimonKing/ticket/15229 to public/categories/improve_integer_mod_ring-15229
This solves an issue noted in #15475 (and #11979). I've rebased it and (non-long) doctests in the file pass, what needs review to finalize this?
New commits:
77f733d | Merge branch 'u/SimonKing/ticket/15229' of trac.sagemath.org:sage into public/categories/improve_integer_mod_ring-15229 |
b012bf3 | Some tweaks to get doctests passing from the merge. |
Changed work issues from rebase to none
Replying to @tscrim:
This solves an issue noted in #15475 (and #11979). I've rebased it and (non-long) doctests in the file pass, what needs review to finalize this?
Sorry, no idea. It has been too long since I last had a look at the patch.
I had a look and am mostly happy with this ticket, though I might still prefer to let coexisted different instances of Z/nZ. Anyway I know how to hack the cache so I can live with the current solution.
Two minor typos:
an integer 1
21
is hardcoded in an error message.Reviewer: Jean-Pierre Flori
Changed branch from public/categories/improve_integer_mod_ring-15229 to 048aec7
On sage-devel, some discussion on
IntegerModRing
and its relation to categories took place. I would summarize the discussion and its consequences as follows:IntegerModRing
is prime, so that the ring is a field, without Sage performing a primality test. However (not part of the discussion)GF(n)
does a primality test (actually it factors n---is this really needed?). So, the user can not simply useGF(n)
if n is a huge number that the user knows to be a prime number.IntegerModRing(n, category=Fields())
. This feature was asked for in #8562 and implemented in #9138..is_field()
should do a primality test, unless it is already known that the ring is a field. It has been suggested in the discussion on sage-devel to let.is_field()
change the category of the ring. At the time of the discussion, this did not seem possible, but in #11900 this feature has been added.By #11900, refinement of category happens when it is tested whether the
IntegerModRing
is a field byR in Fields()
. However, there is some inconsistency:I think we would want that
R
andS
are in the same category after the category refinement ofS
.So, the first aim of this ticket is to make the categories consistent.
Secondly, comparison of
IntegerModRing
includes comparison of types. Originally, this has been introduced since one wantedGF(p)
andIntegerModRing(p)
evaluate differently. However, changing the category changes the type, and thus changes the ==-equivalence class. I think this must not happen. Hence, the second aim of this ticket is to make == independent of the category, while still lettingGF(p)
andIntegerModRing(p, category=Fields())
evaluate unequal.Thirdly, in the discussion on sage-devel, it was suggested to refine the category of R when
R.is_field()
returns true. Currently, refinement only happens whenR in Fields()
returns true. So, the third aim of this ticket to do the refinement as suggested.Cc to John Cremona, since it was he who brought the "category refinement in
is_field()
" topic up...CC: @JohnCremona @nthiery @jpflori @novoselt
Component: categories
Keywords: sd53 category integermodring
Author: Simon King
Branch/Commit:
048aec7
Reviewer: Jean-Pierre Flori
Issue created by migration from https://trac.sagemath.org/ticket/15229