Open Eric-Arellano opened 2 years ago
I've started to look into this one.
Am I wrong thinking that __eq__
for FrozenDict
and {Frozen,}OrderedSet
can simply be self._data == other._data
and self._items == other._items
respectively since Python supports dict comparisons natively?
FrozenDict
also implements __lt__
, and <
on dicts isn't supported - so if that needs to be optimized as well we need another solution for that. Counter(self.items()) < Counter(other.items())
seems consistent regardless of key order, so maybe that's an option.
I've started to look into this one.
🎉
Am I wrong thinking that eq for FrozenDict and {Frozen,}OrderedSet can simply be self._data == other._data and self._items == other._items respectively since Python supports dict comparisons natively?
Not at all wrong. That's I think what we should do.
FrozenDict also implements lt
Hm I don't remember why we did this. I think so that we can sort dataclasses that compose FrozenDicts. But maybe that's not necessary? I'd recommend:
__lt__
, even if not optimal__eq__
__lt__
semantics and see how CI does. Possibly get rid of __lt__
Thinking more about this...order should probably matter for FrozenOrderedSet
! It's in the name! Instead, we should be using frozenset
rather than FrozenOrderedSet
most of the time.
Ordering should not matter for FrozenDict
.
Instead, we should be using
frozenset
rather thanFrozenOrderedSet
most of the time.
Maybe? See https://github.com/pantsbuild/pants/issues/14195#issuecomment-1114087237 re: set
s.
If something needs to iterate over a set, it should use FrozenOrderedSet
. And assuming that the inputs to that FrozenOrderedSet
are deterministic (because they themselves were not produced by iterating over non-Ordered set
s), then you're good to not sort the inputs.
As the author of both
FrozenDict
andFrozenOrderedSet
, I think I was misguided in thinking that deterministic ordering actually matters for the engine. Per https://github.com/pantsbuild/pants/pull/14715#discussion_r820997077, I believe that all we need is a deterministic__eq__
and__hash__
- the underlying ordering of the elements should be irrelevant and abstracted from the engine.The only place I think ordering should matter is things like error messages, if we want to alphabetize for example. But we can call
sorted()
at the callsites.--
Because both
FrozenDict
andFrozenOrderedSet
consider the order of elements in their__eq__
, we have to eagerly callsort
in lots of places, which is wasted computation. For example:https://github.com/pantsbuild/pants/blob/94921a8edb9b142dff471784aef9da22b58cd0ab/src/python/pants/backend/python/dependency_inference/module_mapper.py#L136-L139
We must still use
FrozenDict
rather thandict
so that it's hashable, but I don't think we need to care anymore about ordering.I wonder if we even need
OrderedSet
andFrozenOrderedSet
...can we useset
andfrozenset
? Even if we keep those classes, can we stop using them use much and do the simpler + faster thing of stdlib?--
To implement this change, I recommend:
__eq__
implementations forFrozenDict
and{Frozen,}OrderedSet
sorted()
and the class propertysort_input
, remove when possible.