sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.4k stars 473 forks source link

Sequence_generic needs to guard __iadd__, __imul__, __delitem__ if "immutable". #19813

Open orlitzky opened 8 years ago

orlitzky commented 8 years ago

I don't even:

sage: b = VectorSpace(QQ,3).basis()
sage: b += ["Hello, world!"]
sage: VectorSpace(QQ,3).basis()
[
(1, 0, 0),
(0, 1, 0),
(0, 0, 1),
'Hello, world!'
]

See also #19251.

CC: @mjungmath @tscrim

Component: linear algebra

Author: Matthias Koeppe, ...

Branch/Commit: u/mkoeppe/sequence_generic_needs_to_guard_iaddimuldelitem__delslice_ifimmutable @ 18bb870

Issue created by migration from https://trac.sagemath.org/ticket/19813

ac1b9e95-220c-4d26-9b7a-f6a99154963a commented 8 years ago
comment:1

It makes sense to me that a vector basis should be mutable. For example, what if you wanted to normalize the basis, order it or replace a basis vector? The resulting contents of the instance could still be a valid basis. There could also be some abstract vector spaces whose basis vectors make sense as strings.

nbruin commented 8 years ago
comment:2

No, this is indeed quite a bad bug, since VectorSpaces are UniqueRepresentation:

sage: V=VectorSpace(QQ,3)
sage: W=VectorSpace(QQ,3)
sage: b=V.basis()
sage: b+="a"
sage: len(W.basis())
4

Note that V and W are (to the user) obtained independently, so a change on one should not affect the other.

Also note that some effort is undertaken to make the thing immutable:

sage: b.append("a")
ValueError: object is immutable; please change a copy instead.
sage: type(b)
<class 'sage.structure.sequence.Sequence_generic'>
sage: b.is_mutable()
False
sage: type(b).__iadd__
<slot wrapper '__iadd__' of 'list' objects>

So it seems we just failed to shadow the "iadd" method.

nbruin commented 8 years ago
comment:3

Introspection seems to indicate: __iadd__, __imul__,__delitem__,__delslice__ should be intercepted. Gleaned from:

sage: T=sage.structure.sequence.Sequence_generic
sage: [m for m in dir(T) if str(getattr(T,m)).find("list") >=0]
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago
comment:4

Looks like all Sage objects which define a '+' and can be immutable are failing. I tried matrices and graphs, and both have it.

Nathann

mkoeppe commented 3 years ago

Branch: u/mkoeppe/sequence_generic_needs_to_guard_iaddimuldelitem__delslice_ifimmutable

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Commit: 4fce73d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

4fce73dSequence_generic.__imul__: New
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d3bf0acSequence_generic.__delitem__: New
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 4fce73d to d3bf0ac

mkoeppe commented 3 years ago

Author: Matthias Koeppe

mkoeppe commented 3 years ago
comment:9

Replying to @nbruin:

Introspection seems to indicate: __iadd__, __imul__,__delitem__,__delslice__ should be intercepted.

Done (except for __delslice__, which does not exist any more in Python 3).

nbruin commented 3 years ago
comment:11

Note that Python's semantics for A += B are very close to A = A + B. The default is really that it is the latter (and NOT that A += B is an error). Some mutable types change the meaning to mutating behaviour instead of binding to a fresh object. Consider this:

sage: t1 = (1,2,3)
sage: t2 = t1
sage: t2 += (4,5,6) #this binds t2 to a new tuple
sage: t1
(1, 2, 3)
sage: t2
(1, 2, 3, 4, 5, 6)
sage: l1 = [1,2,3]
sage: l2 = l1
sage: l2 += [4,5,6] #this mutates the list that l2 is bound to.
sage: l1
[1, 2, 3, 4, 5, 6]
sage: l2
[1, 2, 3, 4, 5, 6]
orlitzky commented 3 years ago
comment:12

Replying to @nbruin:

Note that Python's semantics for A += B are very close to A = A + B. The default is really that it is the latter (and NOT that A += B is an error). Some mutable types change the meaning to mutating behaviour instead of binding to a fresh object.

Specifically, in this case, the python docs (https://docs.python.org/3/library/operator.html) say that x += y is equivalent to x = operator.iadd(x, y), and Sequence_generic is a subclass of list, which implements the in-place iadd via mutation.

I guess we should fix the default value of immutable too, one way or the other:

class Sequence_generic(sage.structure.sage_object.SageObject, list):
    """
    ...
        - ``immutable`` - (default: True) whether or not this sequence is           
          immutable
    ...
    """
    def __init__(self, x, universe=None, check=True, immutable=False,
                 cr=False, cr_str=None, use_sage_types=False):
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from d3bf0ac to 9ccc277

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

9ccc277Sequence_generic.__iadd__, __imul__: If immutable, just delegate to __add__/__mul__
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

607c316Sequence: Update doc of init arguments to match implementation
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 9ccc277 to 607c316

mkoeppe commented 3 years ago
comment:16

Thanks for the comments! Here's a better version

orlitzky commented 3 years ago
comment:17

The following sentence (with some fuzz) appears three times in sequence.py: "A mutable sequence of elements with a common guaranteed category, which can be set immutable." The universe is not what can be set immutable, though; it's the sequence. I would suggest something like "An (optionally mutable) sequence of elements with a common parent universe."

orlitzky commented 3 years ago
comment:18

The delegation behavior of iadd and imul is reasonable, but I think the docs should mention that they'll give you back a copy when immutable=True.

But before we address that... there's actually another bug that the description highlights: should these methods always return another sequence? And when they do, what should its universe be? The immutable behavior now returns a list, which I personally would not expect:

sage: s = Sequence([1,2], immutable=True)                                                                                                                                    
sage: s += [3]                                                                                                                                                               
sage: type(s)                                                                                                                                                                
<class 'list'>

The mutable behavior, on the other hand, can invalidate the universe:

sage: s = Sequence([ZZ(1),ZZ(2)])                                                                                                                                            
sage: s.universe()                                                                                                                                                           
Integer Ring
sage: s += [x]                                                                                                                                                               
sage: s                                                                                                                                                                      
[1, 2, x]
sage: s.universe()                                                                                                                                                           
Integer Ring

We should probably think about that before worrying about the wording of the docs.

mkoeppe commented 3 years ago
comment:20

Right, and also the following does not even work:

sage: M = Sequence([1, 2, 3], immutable=False) 
sage: M * 2
TypeError: unbound method SageObject.category() needs an argument
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 607c316 to 95334bb

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

95334bbSequence_generic.__add__/__iadd__: Always create a new immutable / mutable sequence
mkoeppe commented 3 years ago
comment:22

Here's another iteration for __add__, __iadd__...

orlitzky commented 3 years ago
comment:23

Replying to @mkoeppe:

Here's another iteration for __add__, __iadd__...

Looks good. Is list.add(self,other) any faster than list(self) + list(other)?

Something similar for multiplication would work. Finally, we should document that adding sequences will choose a new "universe" that encompasses both arguments, while multiplication will reuse the existing universe.

nbruin commented 3 years ago
comment:24

Replying to @orlitzky:

Something similar for multiplication would work. Finally, we should document that adding sequences will choose a new "universe" that encompasses both arguments,

We should probably specify how that common parent is constructed. I suspect it should be whatever coercion_maps(universe(seq1), universe(seq2)) comes up with. For one thing, it should not be driven by the parents of the elements (think of joining sequences of length 0).

It looks like the present version just rederives the universe from the concatenation of lists of elements. That's not what we need. Instead, we need something like:

L = Sequence([1/2,2,3])                                                     
M = Sequence([x,1])                                                         
u,v = coercion_model.coercion_maps(L.universe(),M.universe())               
import itertools                                                          
LplusM = Sequence(itertools.chain(map(u,L),map(v,M)),universe=u.codomain(),check=False)                          

(A reasonable defensive programming step might be to leave check True). The sole reason for existence of Sequences is that their universe manipulations can be derived from explicit information, rather than derive it from elements. The feature that Sequence allows the universe to be not specified explicitly is only for user convenience. Any systematic use should specify the universe explicitly. That will be much faster too, because the coercion framework doesn't have to scan through individual elements.

mkoeppe commented 3 years ago
comment:25

Replying to @orlitzky:

Is list.add(self,other) any faster than list(self) + list(other)?

There is no such method

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 95334bb to bb2407e

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

bb2407eSequence_generic: Remove python2-specific special methods
mkoeppe commented 3 years ago
comment:27

The Sequence constructor uses sage.structure.element.canonical_coercion on adjacent elements from left to right.

It could probably be simplified using coercion_model.common_parent, but the constructor's parameter use_sage_types=False and fallbacks for non-coercible inputs are a complication

nbruin commented 3 years ago
comment:28

Replying to @mkoeppe:

The Sequence constructor uses sage.structure.element.canonical_coercion on adjacent elements from left to right.

Yes, it does that if no universe is specified. However, if a universe is specified, then it just coerces the elements into the universe specified. There is no "common parent discovery" necessary in that case anymore.

I maintain that operations on sequences should not look at individual elements (since a sequence of length 0 does not have any), but derive the universe from the universes of the input sequences.

In particular, I think it's fine to have:

Sequence(list1) + Sequence(list2) != Sequence(list1+list2)

(although I'm pretty sure it will actually hold anyway in most/all reasonable cases)

It could probably be simplified using coercion_model.common_parent, but the constructor's parameter use_sage_types=False and fallbacks for non-coercible inputs are a complication

You should be able to tell those cases from the universes of the inputs. Perhaps if one of the input universes does not play nice with coercion_maps, other fallbacks are necessary (and perhaps just generate an error that no common parent can be found).

Given that previously these operations just generated a list (clearly not desirable), I don't think you need to be worried about backwards compatibility. So better start out with restrictive, well-defined behaviour. It'll become more relaxed and messy by incremental patching afterwards anyway.

mkoeppe commented 3 years ago
comment:29

Replying to @nbruin:

I maintain that operations on sequences should not look at individual elements (since a sequence of length 0 does not have any), but derive the universe from the universes of the input sequences.

Yes, I agree with you and have been working on the implementation.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from bb2407e to 2aac258

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2aac258Sequence_generic.__add__, __iadd__: Compute new universe from old universes, not elements
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ec0615bSequence_generic.__add__, __iadd__: Compute new universe from old universes, not elements (fixup)
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 2aac258 to ec0615b

mkoeppe commented 3 years ago
comment:32

Here's a new version that implements this. I haven't tested ill-behaved elements yet.

Next is to generalize the Sequence constructor so it can accept generators

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

84aae71src/sage/structure/sequence.py: Update copyright according to "git blame -w --date=format:%Y src/sage/structure/sequence.py | sort -k2"
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from ec0615b to 84aae71

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 84aae71 to 0c3f118

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

0c3f118Sequence_generic.__mul__, `__rmul__`, __imul__: Create sequences in the same universe, not lists
nbruin commented 3 years ago
comment:35

Given that you're using the coercion maps in a coercion environment, I don't think you need to copy them. Maps in the coercion system are special since they try to reference their domain and codomain weakly whenever possible, so that there's still a chance these are garbage collected (the maps are globally cached, so if this would not be done, any structure that participates in the coercion system would become immortal. Even with this hack, it's still often the case structures are unexpectedly prevented from being deleted due to the coercion system -- the implementation of maps often hold references to domain and/or codomain as well). In your case, you already know you're holding a reference to the domains.

I guess there may be a very small chance that the codomain is deleted just before you can get a hold of it (but the "copy" does not exclude that, because the codomain could vanish between the return of coercion_maps and the invocation of "copy"), so you could try and look if instead of coercion_maps, there is a routine that also returns the common parent (the codomain) explicitly with a strong reference. It looks like this low-probability event has been ignored and it hasn't led to big trouble (and in any case, your "copy" doesn't address it any better than just capturing the codomain explicitly). If you look in discover_coercion you see that Z=pushout(R,S) constructs the common parent, but that after that, internal coercion maps from R and S into Z are computed and that, upon return, the explicit reference to Z is dropped. So I suspect that coerce_R and coerce_S could be defunct once the return has completed, but I don't think it's ever been observed to happen.

In any case, watch out that None as a coercion map means identity map, so coercion_maps(ZZ,ZZ)==(None,None).

Yes, the coercion model is a nasty beast to work with and has been a source of endless memory leaks.

The warning about copying the maps internal to the coercion system is only for the case when someone wants to extract a morphism for a longer time and expects that holding on to the morphism will keep domain and codomain alive.

mkoeppe commented 3 years ago
comment:36

Replying to @nbruin:

watch out that None as a coercion map means identity map, so coercion_maps(ZZ,ZZ)==(None,None).

Yes, the code is handling that already

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ef9a375Sequence_generic._add: We hold strong references to domain/codomain, no need to copy the coercion map
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 0c3f118 to ef9a375

mkoeppe commented 3 years ago
comment:38

Replying to @nbruin:

Given that you're using the coercion maps in a coercion environment, I don't think you need to copy them. [...]

Thanks for the explanation, I agree.

mkoeppe commented 3 years ago
comment:39

I think in the failing testcase of 3 * M, Integer.__mul__ calls coercion_model.bin_op, which seems to get confused by M being a Parent, not an Element

mkoeppe commented 3 years ago
comment:40

(3r * M works.)

mkoeppe commented 3 years ago

Changed author from Matthias Koeppe to Matthias Koeppe, ...

mkoeppe commented 3 years ago
comment:41

I think I need some help with this ticket