isi-vista / immutablecollections

A library for immutable collections, in the spirit of Guava's Immutable Collections.
MIT License
3 stars 2 forks source link

module-level factory functions #24

Closed gabbard closed 5 years ago

gabbard commented 5 years ago

Partially addresses #12

gabbard commented 5 years ago

This is a start on #12 . A few notes:

gabbard commented 5 years ago

@nicomitchell : can you add tests to this for the order check case of AbstractSet?

ConstantineLignos commented 5 years ago

Another general comment on something that isn't in the diff so I can't leave a review: can we change iteration_order inside _FrozenSetBackedImmutableSet to just be Sequence[T] and not do any conversion?

Since it's effectively a private constructor, we can trust that the factory method is never sharing the iteration order with the caller, even if it is mutable.

Current implementation is:

    _iteration_order: immutablelist.ImmutableList[T] = attrib(
        converter=immutablelist.ImmutableList.of, cmp=False, hash=False  # type: ignore
    )

EDIT: The downside of this is that as_list can no longer just return the iteration order as-is. Since ImmutableSet[T] is now a Sequence[T], I have no problem with making as_list expensive.

gabbard commented 5 years ago

@ConstantineLignos : actually, is there any reason to have as_list() now that we just implement Sequence? I think it was added before that was the case

gabbard commented 5 years ago

@ConstantineLignos : I think an event nicer option is for ImmutableSet to just sub-class tuple and skip having a separate iteration_order field (instead passing the order to super). This may cause a slight slowdown on creation, but should save time on access and iteration.

gabbard commented 5 years ago

(but this should be checked using benchmarks)

ConstantineLignos commented 5 years ago

I'm fine with killing off as_list. If subclassing tuple makes sense, go for it. I'd like to explore the idea of a dict-backed implementation (see https://github.com/isi-vista/immutablecollections/issues/14#issuecomment-457265569) at the same time as when exploring subclassing tuple. The dict-backed one will should have the advantage of having the backing dict handle both containment and order in the dict itself, so the factory method would be different for maximum efficiency.

(Just in case anyone is wondering, the Python 3.6 dict implementation keeps the order of first insertion as you might expect, just like ImmutableSet, see below.)

>>> list({item: None for item in [4, 3, 2, 1, 4]})
[4, 3, 2, 1]
gabbard commented 5 years ago

@ConstantineLignos : I experimented a little with using a dict in the factory method:

unique_values = tuple({ val : None for val in iterable}.keys())
return _FrozenSetBackedImmutableSet(
                frozenset(unique_values), unique_values)
...

In increases the time to initialize from a 3-element collection by ~5% but doubles performance on a 10k-element collection, so I think it is a win.

One challenge with using the dict to actually back the set is that you still need the tuple for iteration order because the Dict's KeyView just extends Collection, not Sequence. So the tradeoff becomes the time to initialize the frozenset for containment checks vs just keeping the dict vs. the (probable) increased memory usage of dict

ConstantineLignos commented 5 years ago

@rgabbard Is there anything we'd need to do before merging if we stick with the frozenset backing right now?

gabbard commented 5 years ago

@ConstantineLignos : I have some local changes on this branch to sync up with this before merging. Poke me if I haven't taken care of it by tomorrow morning

ConstantineLignos commented 5 years ago

Sounds good. I'm on vacation at the start of next week so I'd like to get this and https://github.com/isi-vista/vistautils/issues/44 sorted this week before I go.

gabbard commented 5 years ago

I'm all in favor of ditching the type checking (which I think also means, on a future PR, ditching ImmutableList, which no longer offers any advantage over tuple). I'll drop the TODO comments related to it.

codecov[bot] commented 5 years ago

Codecov Report

Merging #24 into master will decrease coverage by 2.16%. The diff coverage is 85.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #24      +/-   ##
==========================================
- Coverage   91.82%   89.66%   -2.17%     
==========================================
  Files           7        7              
  Lines         575      619      +44     
==========================================
+ Hits          528      555      +27     
- Misses         47       64      +17
Impacted Files Coverage Δ
immutablecollections/_utils.py 100% <100%> (ø)
immutablecollections/_immutabledict.py 83.87% <80%> (ø)
immutablecollections/_immutablelist.py 90.16% <83.33%> (ø)
immutablecollections/_immutablemultidict.py 91.98% <85%> (ø)
immutablecollections/_immutableset.py 88.57% <88%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d861bd6...21e9a02. Read the comment docs.

gabbard commented 5 years ago

I'm ignoring codecov's comments about test coverage for now because it is largely due to me switching tests to use the new factory methods, leaving parts of the builder and .of methods untested. This code is slated for deprecation shortly, so I'm not worried about it.