isi-vista / immutablecollections

A library for immutable collections, in the spirit of Guava's Immutable Collections.
MIT License
3 stars 2 forks source link

Improve ordering check for ImmutableSet creation #18

Open ConstantineLignos opened 5 years ago

ConstantineLignos commented 5 years ago

We currently do the following check for _require_ordered_input:

if (
            self._require_ordered_input
            and not (isinstance(items, Sequence) or isinstance(items, ImmutableSet))
            and not self._order_key
        ):
            raise ValueError(
                "Builder has require_ordered_input on, but provided collection "
                "is neither a sequence or another ImmutableSet.  A common cause "
                "of this is initializing an ImmutableSet from a set literal; "
                "prefer to initialize from a list instead to help preserve "
                "determinism."
            )

In #12, @rgabbard and I discussed some of the challenges in determining whether the input is ordered. We decided to change the flag to disable_order_check, and have the order check look for specific bad classes as sources. Specifically:

  1. A set that is not an ImmutableSet
  2. Dictionaries or dictionary views that come from unordered dictionaries.

Regarding 2, @rgabbard said before:

Note in particular that things like itertools.chain don't return generators, which blocks us from whitelisting Generator to cover all these cases. So we decided to switch to a blacklisting approach where we block AbstractSet-derived classes which are not ImmutableSet and also KeysView, ValuesView, and ItemsView coming from built-in dicts on Python versions/implementations not guaranteeing deterministic iteration ordering. This should hit most of the common cases. We will allow disabling this check via a disable_order_check flag.

For KeysView, etc., we propose a hack: at module initialization we will make a dummy dict and cache the class of the returned views. We will then use these in the checks. These will be deterministic for Python 3.7+ and on Python 3.6 with the CPython implementation. This may be a little brittle, but since you can always override it we don't think it is too big of a problem.

@rgabbard Anything you want to add?

gabbard commented 5 years ago

Nope, this looks good. Thanks for splitting this out to its own issue