isi-vista / immutablecollections

A library for immutable collections, in the spirit of Guava's Immutable Collections.
MIT License
3 stars 2 forks source link

Simplify creation of immutable collections #12

Closed ConstantineLignos closed 5 years ago

ConstantineLignos commented 5 years ago

We currently have ImmutableSet.of(iterable, ...), so when creating an ImmutableSet you have to write things like ImmutableSet.of([1, 2, 3]) where you create the iterable (which must not be a set if you want the ordering to be reliable).

I originally created the above API to move away from the Guava approach which I found confusing. In Guava-land, you effectively have a *args version with ImmutableSet.of(<varags>) and ImmutableSet.copyOf(<iterable>). The latter can be confusing to the user because it is not necessarily the case that a copy is made, and they might (as I did at first) shy away from ImmutableSet.copyOf(X) where X may already be an ImmutableSet.

I do find that in terms of usability, having to write the extra [], (), or {} when you are creating constants is a bit of a blemish on the API design and can also cause some readability issues with parens when you have ImmutableList.of((1, 2, 3)) due to double parens. (In ELICIT, in fact most all of our usage of the immutable collections is as constants, so we are always mildly annoyed by these extra characters.) So I'd like to revisit the decision I made early on with the following proposal designed for ImmutableSet and ImmutableList (possibly other collections if it makes sense, I'll just write Set everywhere below):

  1. ImmutableSet.of(*args): create an ImmutableSet from args. Note that since the output is always ordered, you are guaranteed to not get in trouble with the intuitive but bad idea ImmutableSet.of({1, 2, 3}) which will not preserve order.
  2. ImmutableSet.from(iterable) returns an ImmutableSet made from the iterable. It's also a good time to bite the bullet and make require_ordered_input default to True for this method. I think there's no better time to address that than when making another breaking API change.

I'm very open to changes to the of/from terminology, the point is really just the two ways of creation and that neither has "copy" in the name.

Some benefits to note:

@rgabbard What do you think?

gabbard commented 5 years ago

I am in favor of the general idea, but with different terminology.

I am uneasy about using of just because we have so much existing code using the current form of .of with the Iterable argument - this code will "silently" become wrong after the proposed change (list containing the contents of the Iterable --> list containing the Iterable itself as a single element). I guess it's a question of how much we trust mypy to catch things.

Possible alternate names for the *args version: of_elements, from_elements. Another option would be to mirror frozenset and have a standalone immutableset(*args) method.

from is okay, though the distinction between from and of is probably not very transparent to the user. We could use the the more explicit but more verbose from_iterable.

@berquist , any thoughts on this?

ConstantineLignos commented 5 years ago

Regarding the problem with changing the behavior of of: I'm sympathetic to the concern regarding existing code. If we decide that of is the right name going forward, I'd like to do a multi-stage change where we deprecate and reintroduce so it's really hard to break existing code. But we can cross that bridge when we come to it.

Another idea: immutableSet.constant(*args) and ImmutableSet.from(iterable)? I think "constant" is not quite the right name but it gets the idea across.

We can also do from/from_iterable of of/of_iterable (but the latter redefines of, which is understandably objectionable).

gabbard commented 5 years ago

@ConstantineLignos : As long as we deprecate-and-reintroduce, I am ok with of. constant is tempting but a little odd because of course the set doesn't have to be deeply immutable, which I usually think of constants as being. literal is kind of tempting, because it would be used like set literals, but of course it isn't literally a literal.

I'd prefer to avoid of_iterable because of has to me the connotation of "this is what this is composed of", so the user might expect to get { myIterable}.

I wonder if this is the sort of thing where we should come up with some options and then poll all the Python programmers in the office.

berquist commented 5 years ago

ImmutableSet.of({1, 2, 3}) which will not preserve order

You could refuse to take a plain set (but not its subclasses) as input.

It's also a good time to bite the bullet and make require_ordered_input default to True for this method.

+1, this is the above.

We could use the the more explicit but more verbose from_iterable

"Explicit is better than implicit." I think of is not a good name, but if you originally inherited from Guava terminology it can't be helped. of and from_iterable get my votes.

Another option would be to mirror frozenset and have a standalone immutableset(*args) method. (...) I wonder if this is the sort of thing where we should come up with some options and then poll all the Python programmers in the office.

This is the most Pythonic approach mentioned, rather than static methods. For others with a non-Java background, it's what we expect from a more minimal Python API (rather than something like Django or SQLAlchemy). Then you run into the danger of mixing API styles, and consistency is more important.

ConstantineLignos commented 5 years ago

Following on the more pythonic route for a moment, it would make sense to have ImmutableSet(iterable), which would be just like set, frozenset, etc. (or Counter as an example of a non-core datatype in the stdlib which initializes this way). This also means you can use the class name as a converter with attrs. I really like this approach and it nicely brings us out of a Java-like API.

If we were to stop there, we'd be no worse than any other standard Python datatype. So I do think the *args/**kwargs version would more or less be a cherry on top. It's a nice way for creating constants/default values, and saves you some typing. I can't think of a good name for it right now. I'm also about to go on vacation :).

So let me stop there: @berquist and @rgabbard, would you both support ImmutableX(iterable) as the primary way of creating immutables, effectively replacing of? (I'm not in any particular rush to deprecate of; it can stay a while.) I think this means we need to mess with __new__ in the classes to direct to the actual implementation constructors, but I've never done this kind of thing before.

ConstantineLignos commented 5 years ago

One last dump of ideas about the name of the *args/**kwargs version:elements/with_elements, contents/with_contents? But anyway, we can backburner this new feature since it's an enhancement.

gabbard commented 5 years ago

Would it be better to have ImmutableSet(iterable) or immutableset(iterable)? The casing for the latter seems more Pythonic, but then it doesn't match the class name. On the other hand, in normal usage, the user should rarely care about the class name. The lowercase version would remove the need to go the __new__ route - I don't know if that has overhead or not.

Hmm - another question about __new__:

If __new__() returns an instance of cls, then the new instance’s __init__() method will be invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to __new__().

Would this prevent us from simply returning the input unchanged when it is already an immutable set?

ImmutableDict initialization

Note that here the kw_args initialization is a bit more than a cherry on top because kw_args dicts are guaranteed to be ordered by the 3.6 spec, but the order of other dictionaries is ordered only as an implementation detail of CPython. So if we want any key-value-ish initialization, we have to provide kw_args (otherwise you have to make an iterable of key-value tuples, which is kind of ugly).

gabbard commented 5 years ago

Ah, ok, it looks like dict ordering becomes part of the standard in 3.7, so it's sort of a question of what versions we want to support.

gabbard commented 5 years ago

On the other hand, in normal usage, the user should rarely care about the class name.

Oh, nevermind, type annotations. :-)

berquist commented 5 years ago

Not sure if this is useful, but I am trying to actually use the library for the first time, and wanted to share my thought process after reading the tests and then skimming the code.

rows = [('a', 'b'), ('a', 'c'), ('c', 'd')]
print(ImmutableListMultiDict.of({row[0]: row[1] for row in rows}))

This, of course, doesn't work, because the intermediate dictionary loses all your information. Then how is an instance constructed? I saw something about a builder, but that probably means it can't be constructed with a one-liner, I only saw put and put_all. Oh, there is put_all_items:

b = ImmutableListMultiDict.builder()
b.put_all_items(rows)
x = b.build()

but this reads like Java, not Python. Everything could be chained together (like on https://github.com/isi-vista/immutablecollections/blob/master/immutablecollections/immutablemultidict.py#L164). Wait, this works, if only I had read the docstring the first time:

y = ImmutableListMultiDict.of(rows)

and this of course doesn't work:

z = ImmutableListMultiDict(rows)
Traceback (most recent call last):
  File "/Users/berquist/.conda/envs/aida_tools/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-4cc7d234f494>", line 1, in <module>
    z = ImmutableListMultiDict(rows)
  File "/Users/berquist/.conda/envs/aida_tools/lib/python3.6/typing.py", line 1231, in __new__
    return _generic_new(cls.__next_in_mro__, cls, *args, **kwds)
  File "/Users/berquist/.conda/envs/aida_tools/lib/python3.6/typing.py", line 1188, in _generic_new
    return base_cls.__new__(cls, *args, **kwds)
TypeError: object() takes no parameters

Maybe this is just a case of not reading the documentation well enough, but I rarely think about static methods when writing Python; for better or worse, constructing the object directly via __init__ is most intuitive to me.

gabbard commented 5 years ago

@ConstantineLignos : We could consider raising an exception if someone passes a dict to .of pointing them in the direction of the right method of construction. For when someone really needs to make an Immutable[Multi]Dict from a true dict, we could offer a static factory method like .create_from_dict_losing_determinism to make it really obvious you don't want to do this unless you have to.

@berquist : Thanks for the comments! I know "create-by-calling-constructor" is common in the Python world, but I (personally) think it is a terrible habit. The modern Java habit of keeping constructors private and using static factory methods almost always leads to clearer, more flexible, and more testable code. (I just this morning ran into a case where I'm going to need to rewrite code in a Python library because it needlessly does initialization from a file passed to the constructor when I instead need to initialize it directly with information from another source. Argh.).

That said, module-level factory functions are a decent, more Pythonic alternative to static factory methods (why they are considered superior in Python-land, with their decreased discoverability compared to static factory methods, is beyond me, however. I guess the less something looks like Java, the more Pythonic it is? </rant>). Would having something like:

y = immutablelistmultidict(rows)

be more Pythonic-ly intuitive?

ConstantineLignos commented 5 years ago

So after another slack chat with @rgabbard I'm leaning toward a design with the goal of matching the built-in collection types as much as possible. Thus the default way to build things is using a lowercase function name (replacing of after a deprecation window) like immutableset, and for type annotations, one has to refer to the abstract class, like ImmutableSet (compare to set() and typing.Set).

Here's a sketch:

from abc import ABCMeta
from typing import (
    Optional,
    Type,
    Iterable,
    TypeVar,
    Generic,
    AbstractSet,
    Sequence,
    Mapping,
    Tuple,
    Union,
)

from immutablecollections import ImmutableCollection

T = TypeVar("T")
KT = TypeVar("KT")
VT = TypeVar("VT")

class ImmutableSet(
    Generic[T], ImmutableCollection[T], AbstractSet[T], Sequence[T], metaclass=ABCMeta
):
    pass

class ImmutableDict(ImmutableCollection[KT], Mapping[KT, VT], metaclass=ABCMeta):
    pass

class ImmutableList(
    Generic[T], ImmutableCollection[T], Sequence[T], Iterable[T], metaclass=ABCMeta
):
    pass

# Note the changed default for require_ordered_input. Now is the best opportunity to change this!
def immutableset(
    iterable: Iterable[T],
    check_top_type_matches: Optional[Type] = None,
    *,
    require_ordered_input: bool = True,
) -> ImmutableSet[T]:
    pass

def immutabledict(
    iterable: Union[Mapping[KT, VT], Iterable[Tuple[KT, VT]]],
    check_key_type_matches: Optional[Type] = None,
    check_value_type_matches: Optional[Type] = None,
) -> ImmutableDict[KT, VT]:
    pass

def immutablelist(
    iterable: Iterable[T], check_top_type_matches: Optional[Type] = None
) -> ImmutableList[T]:
    pass

(The inheritance on each class reflects the current implementation, so let's not dig into it right now.)

A few things I'd like to discuss:

  1. I think we should continue to make the builder pattern available, as there are a lot of scenarios where it makes sense. I just want to provide an easily-discoverable simple path.
  2. This brings {top,key,value}_type_matches to everything (see #1). However, I don't think we need these factories to replicate every feature in the builder. For example, order_key in ImmutableSet makes much more sense in the builder pattern.
  3. For ImmutableSet, I'm not sure exactly what to do about require_ordered_input. I think now is our best chance to change the default to True if we want to go that way. However, the current implementation checks for Sequence or ImmutableSet as good, which rules out the pretty common case of an ordered iterable, say a generator over file lines, which is ordered but isn't a sequence. I wonder how we might re-think this check overall. We could change the check to only check for high-risk things like things that are sets but not ImmutableSet, but that hardly meets the notion of "require that the input be ordered." The current check is imperfect but reliable, even though I don't think it fully matches the name. Any thoughts @rgabbard ?
  4. None of this solves my original desire for *args/**kwargs versions. I'm just not sure what to do about those. I do feel strongly that the above is a good core API that we can build around as needed.
gabbard commented 5 years ago

re: 4 - note that the built-in frozen set doesn't support *args either, so we aren't any worse off:

frozenset('a', 'b', 'c')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: frozenset expected at most 1 arguments, got 3
gabbard commented 5 years ago

Overall, I like this. A few comments:

Another (orthogonal) option is to put Generator on the list of allowed types. While you can have a non-deterministic generator, it is no worse than a non-deterministic list comprehension, which we already can't catch.

gabbard commented 5 years ago

Summary of decisions from Slack conversation with @ConstantineLignos :

Source: <generator object at 0x106d1e360> Type: <class 'generator'> Is a generator? True Error creating set: Builder has require_ordered_input on, but provided collection is neither a sequence or another ImmutableSet. A common cause of this is initializing an ImmutableSet from a set literal; prefer to initialize from a list instead to help preserve determinism.

Source: dict_keys(['foo']) Type: <class 'dict_keys'> Is a generator? False Error creating set: Builder has require_ordered_input on, but provided collection is neither a sequence or another ImmutableSet. A common cause of this is initializing an ImmutableSet from a set literal; prefer to initialize from a list instead to help preserve determinism.

Source: range(0, 3) Type: <class 'range'> Is a generator? False

Source: <itertools.chain object at 0x1068920b8> Type: <class 'itertools.chain'> Is a generator? False Error creating set: Builder has require_ordered_input on, but provided collection is neither a sequence or another ImmutableSet. A common cause of this is initializing an ImmutableSet from a set literal; prefer to initialize from a list instead to help preserve determinism.


Note in particular that things like `itertools.chain` don't return generators, which blocks us from whitelisting `Generator` to cover all these cases.  So we decided to switch to a blacklisting approach where we block `AbstractSet`-derived classes which are not `ImmutableSet` and also `KeysView`, `ValuesView`, and `ItemsView` coming from built-in dicts on Python versions/implementations not guaranteeing deterministic iteration ordering.  This should hit most of the common cases.  We will allow disabling this check via a `disable_order_check` flag.

For `KeysView`, etc., we propose a hack: at module initialization we will make a dummy dict and cache the class of the returned views. We will then use these in the checks.  These will be deterministic for Python 3.7+ and on Python 3.6 with the CPython implementation.  This may be a little brittle, but since you can always override it we don't think it is too big of a problem.
ConstantineLignos commented 5 years ago

Here's the updated code to match what @rgabbard and I worked out:

def immutableset(
    iterable: Optional[Iterable[T]] = None,
    check_isinstance: Optional[Type] = None,
    *,
    disable_order_check: bool = False,
) -> ImmutableSet[T]:
    pass

def immutabledict(
    iterable: Optional[Union[Mapping[KT, VT], Iterable[Tuple[KT, VT]]]] = None,
    check_key_isinstance: Optional[Type] = None,
    check_value_isinstance: Optional[Type] = None,
) -> ImmutableDict[KT, VT]:
    pass

def immutablelist(
    iterable: Iterable[T] = None, check_isinstance: Optional[Type] = None
) -> ImmutableList[T]:
    pass

18 covers sorting out the implementation of the order check for ImmutableSet.