sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.35k stars 460 forks source link

Parent.list() should return an immutable tuple #20743

Open johanrosenkilde opened 8 years ago

johanrosenkilde commented 8 years ago

The current behaviour of Parent.list() is the road to insanity:

sage: F = GF(5)
sage: F.list()
[0, 1, 2, 3, 4]
sage: F.list().remove(0)
sage: F.list()
[1, 2, 3, 4] 

The reason is that the output of list is cached, and a list is mutable, and so the user can permanently modify it. This easily happens without wanting to.

The closest Python concept of an "immutable list" is a tuple.

CC: @vbraun @jdemeyer @tscrim @nthiery

Component: categories

Keywords: parent, immutable

Issue created by migration from https://trac.sagemath.org/ticket/20743

vbraun commented 8 years ago
comment:1

+1 to returning a tuple

kedlaya commented 8 years ago
comment:2

-1 to returning a tuple. foo.list() should return the same as list(foo), which is always a list, and foo.tuple() should return the same as tuple(foo), which is always a tuple. And for that matter, foo.str() returns the same as str(foo) which is always a string, et cetera.

nbruin commented 8 years ago
comment:3

Replying to @kedlaya:

-1 to returning a tuple. foo.list() should return the same as list(foo), which is always a list, and foo.tuple() should return the same as tuple(foo), which is always a tuple. And for that matter, foo.str() returns the same as str(foo) which is always a string, et cetera.

This sounds quite reasonable, but the implementation details are quite different: list(...) and tuple(...) simply investigate the argument for being iterable and if it is, exhaust the iterator and put the results in a tuple/list. On the other hand, foo.list() and/or foo.tuple() ask the object to return a list and/or tuple. It can do this much more efficiently when it's cached. The "safe" option is to return a copy, but that just means incurring the cost of making a copy immediately.

The story for str is different because that investigates if the object implements the __str__() protocol, which is a much more straightforward translation (and strings are immutable), so I'll leave that out of consideration here.

I think there is definitely a place for giving efficient access to the constituents of an object in the form of an indexable structure (tuple or list). It seems that it's a bug trap to do this in the form of a mutable data structure (probably because the similarity to list(foo), where a fresh list gets constructed automatically), so perhaps a tuple is a better fit.

The method for this has traditionally been called foo.list() in sage. Changing this to foo.tuple() is perhaps a little heavy-handed. I think no-one would be looking for this method unless they already now it's what you should look for. Are there other names that are more evocative?

In python3, dictionaries have readonly "views" for their keys,values, and items (although they don't admit indexing). Perhaps the thing we need is something that provides an "indexable view" of the contents of an object. Tuples would work. Lists would too, but are mutable.

I'm not sure how important it is that we specify the exact type that needs to be returned. In pure python this shouldn't be important at all, but in cython you get much more efficient code if something is known to be a tuple or a list.

vbraun commented 8 years ago
comment:5

Types (class names) are nouns:

Methods are verbs

Its perhaps unfortunate that there is no greater selection of appropriate verbs in English, but that linguistic details shouldn't be cause to have some ridiculous bug in Sage.

nthiery commented 8 years ago
comment:6

Definitely +1 on returning something immutable, and a tuple is the natural choice; I have been desiring it for a while. The naming issue is somewhat annoying, but that's life, I guess. The only thing that worries me is that this is going to break backward compatibility, and I am not sure there is anything we can do ...

jdemeyer commented 8 years ago
comment:7

Alternative idea: create a new type immutable_list inheriting from list. This has the advantage of being backwards compatible: isinstance(foo, list) would still be true and the printing would be like a list. This works and behaves like a list:

cdef class immutable_list(list):
    def __setitem__(self, i, value):
        raise TypeError

Of course, you also need to disallow .append(), +=, __delitem__, ... but the basic idea works.

nthiery commented 8 years ago
comment:8

Interesting suggestion. On top of that, we could possibly enable addition with tuples, and raise a warning in case of addition with a list (it's probably not possible to catch []+P.list() though), to pave the path for later changing the output to a plain tuple so that we don't keep around yet another list-like object.

jdemeyer commented 8 years ago
comment:9

Replying to @nthiery:

Interesting suggestion.

By the way, I'm not totally convinced myself that it's the best solution. It is the best solution to the practical problem of backwards-compatibility. But it does introduce some clutter: yet another list-like type.

kedlaya commented 8 years ago
comment:10

Replying to @jdemeyer:

Replying to @nthiery:

Interesting suggestion.

By the way, I'm not totally convinced myself that it's the best solution. It is the best solution to the practical problem of backwards-compatibility. But it does introduce some clutter: yet another list-like type.

But it might be unavoidable: tuples must consist of immutable objects, so it may not always be possible to return one. Another alternative would be for .list() to return a copy of the cached list, but I'm not sure this is better than jdemeyer's suggestion (it is worse from a performance point of view, since the copy is unnecessary if you don't actually attempt to modify the list).

kedlaya commented 8 years ago
comment:11

Replying to @vbraun:

Types (class names) are nouns:

  • list
  • tuple

Methods are verbs

  • list

Its perhaps unfortunate that there is no greater selection of appropriate verbs in English, but that linguistic details shouldn't be cause to have some ridiculous bug in Sage.

A quick look at the Python docs (https://docs.python.org/3/library/stdtypes.html) shows that the following are all methods of basic types:

math.floor()
math.ceil()
int.bit_length()
float.hex()
list.index()
str.title()
str.upper()
dict.keys()
dict.values()
dict.items()

This is definitely a minority of the methods one finds on that page, but it makes the point that insisting that method names be verbs is a good design principle, not an absolute rule; it definitely gets overridden when there is no simple alternative.

For that matter, sometimes one can't even tell: is foo.count() a noun or a verb? It would be pretty un-Pythonic (and perhaps unfair to non-English speakers) to have list mean completely different things when interpreted as a noun vs. a verb, in addition to confusing the defined terms list and tuple.

nbruin commented 8 years ago
comment:12

Replying to @kedlaya:

But it might be unavoidable: tuples must consist of immutable objects, so it may not always be possible to return one.

That is not true:

sage: A=([1],[2])
sage: A[0].append(3)
sage: A
([1, 3], [2])

That said, I don't think foo.tuple() would be discoverable at all, so think that's a very bad name for implementing this functionality.

I think it is essential that foo.list() continues to provide a way to indexably access constituents of an object (although it would be interesting to have data on how much code depends on that). If returning a list for that is too dangerous, I'd be fine with a tuple instead.

If it has to be a list then that would be fine with me. If it's too difficult to let people use copy when they need to, we may just have to add an option. Something like:

sage: foo.list()
<returns a copied list>
sage: foo.list(tuple=true)
<returns the cached/stored tuple>
sage: foo.list(copy=false)
<returns the cached/stored list>

Concerning verbs/nouns: English is a pretty flexible language. You can verb any noun in it, so without grammatical context you can't really tell what it is anyway. If we want to remove the ambiguity, we should probably rename the method to foo.listify().

In fact, I always thought that desctructive method would be verbs (and return None) and methods without side effects would be nouns, as in L.sort() versus L.sorted() (except that python, having a decided preference for routines with side effects, doesn't have the latter)

Note that a common design paradigm in python is "we are all consenting adults here" (an argument I haven't seen brought up against using python for pre-university schooling yet). From that perspective, returning a mutable list (perhaps because it must be mutable internally) that shouldn't be mutated isn't an absolute no-go.

nbruin commented 8 years ago
comment:13

Replying to @kedlaya:

math.floor()
math.ceil()
float.hex()
list.index()
str.title()
str.upper()

This sublist could be all verbs? They are definitely funnier that way.

kedlaya commented 8 years ago
comment:14

Replying to @nbruin:

Replying to @kedlaya:

But it might be unavoidable: tuples must consist of immutable objects, so it may not always be possible to return one.

That is not true:

sage: A=([1],[2])
sage: A[0].append(3)
sage: A
([1, 3], [2])

Yes, you're right. The only time it matters whether a member of a tuple is mutable is when the tuple is hashed.

So that means that one could cache a tuple and have foo.list() return a list copy of that tuple, as long as there is no actual legitimate reason to modify the cached list (I certainly can't think of one). In a performance-critical situation, one shouldn't be asking for either a list or a tuple anyway, but rather an iterator.

nbruin commented 8 years ago
comment:15

Replying to @kedlaya:

So that means that one could cache a tuple and have foo.list() return a list copy of that tuple, as long as there is no actual legitimate reason to modify the cached list (I certainly can't think of one). In a performance-critical situation, one shouldn't be asking for either a list or a tuple anyway, but rather an iterator.

No, an iterator provides a different kind of interface: if you need indexing an iterator will just not do. Even for iteration an iterator might not be the lowest overhead: you're incurring python call overhead on each access. List/tuple items can be retrieved more efficiently than that (provided the entries have to be in memory anyway). This will be a tiny difference, though.

kedlaya commented 8 years ago
comment:16

Replying to @nbruin:

Replying to @kedlaya:

So that means that one could cache a tuple and have foo.list() return a list copy of that tuple, as long as there is no actual legitimate reason to modify the cached list (I certainly can't think of one). In a performance-critical situation, one shouldn't be asking for either a list or a tuple anyway, but rather an iterator.

No, an iterator provides a different kind of interface: if you need indexing an iterator will just not do. Even for iteration an iterator might not be the lowest overhead: you're incurring python call overhead on each access. List/tuple items can be retrieved more efficiently than that (provided the entries have to be in memory anyway). This will be a tiny difference, though.

If the cached thing ends up being a tuple, then there's no reason not to implement foo.tuple() returning a reference to the cached tuple. If there is a genuine concern that this is "not discoverable" (which I'm not sure there is for users that know the difference between a list and a tuple), we can modify the docstring for foo.list() to say SEEALSO: foo.tuple().

kedlaya commented 8 years ago
comment:17

On a different point, there is also a Python design principle that "tuples are for heterogeneous data, lists are for homogeneous data". In our context, Parent.list() will be the various members of some mathematical structure, which definitely constitute homogeneous data (they will all be of the same class, for one).

That said, all design principles are secondary to having things behave correctly...

vbraun commented 8 years ago
comment:18

I don't think that "tuples are for heterogeneous data, lists are for homogeneous data" is a Python-specific principle at all. Obviously you want heterogeneous data in tuples, whenever the index has some type/semantic meaning it doesn't make sense to add a new element at the front. But for homogeneous data both list and tuple make sense.

kwankyu commented 8 years ago
comment:19

Trying to fix the bug, I did something else in #20902, which also fixes somehow the bug in the current ticket. Please check.

kwankyu commented 8 years ago
comment:20

Perhaps I need to clarify how #20902 fixes the issue. With the patch,

sage: F=GF(5)
sage: F.list()
[0, 1, 2, 3, 4]
sage: F.list().remove(0)
sage: F.list()
[0, 1, 2, 3, 4]
sage: F.list.__module__
'sage.categories.finite_enumerated_sets' 
kwankyu commented 7 years ago
comment:21

As #20902 was merged to Sage 7.5.beta2, the bug reported in this ticket is now resolved. The issue of what should be the right type for the return value of the method .list() still remains but is controversial. So I suggest to close this ticket and open another when we have a consensus on the issue. If we keep this ticket, then its description needs to be changed at least.

tscrim commented 7 years ago
comment:22

We can just keep this around for right now to note the discussion and the issue.