Closed duynhaaa closed 2 years ago
Nice, thanks!
I think I totally agree that HWAddress classes should be hashable.
Before merging this I want to think through the details, a lot of little choices with possibly important ripples in here... I might conclude that we should change exactly what we hash on. I'll share more of my thoughts on it when I have time.
Sure thing, just let me know if there's anything else that needs to be done.
Yeah I'll let you know what I want changed as soon as I know. 👍
Right now my gut feel is to either hash on (self.size, self._address)
or (type(self), self._address)
, but I probably won't have time to write out the reasoning for that, or finish thinking it through, until next week.
Actually, I am going to merge this now as-is, and we can iteratively improve it later.
That way you don't have to have this open unresolved thing waiting around for me.
Plus for something this small, I would be needlessly spending your time and effort, and multiplying our total time and effort, by telling you what changes I want instead of just making them myself.
I think I was initially keeping this unmerged just in case I decided on something you disagreed with, so that you had a better opportunity to tell me "wait no that's a bad change because [...]" before I overwrote the lines you contributed, but now that I think about it that's not really a good enough reason to not just do it the nimble merge-then-iteratively-improve-as-needed way.
Quick question: do you have any thoughts on different classes, or classes with different names, hashing differently vs the same?
I was thinking that the right default is for subclass instances to compare equal to their base class instances by default (they don't right now, despite the __eq__
docstring saying that they do, due to a bug that I haven't gotten around to committing a fix for) and this means that subclasses should probably hash the same way as their parent class.
Couple examples of this:
The MAC
and EUI48
classes. I think the most sensible behavior is that they're effectively aliases, so much so that maybe I should have just done MAC = EUI48
instead of class MAC(EUI48): ...
. However, maybe someone sees a good use-case for them to count as two different members of a set?
MAC
and the MACAllowsTrailingDelimiters
example - this seems like a totally clear-cut case of a situation where the difference of type or even just type name shouldn't cause objects to compare or hash differently. Of course, it was kinda bad on my part to suggest changing formats by subclassing. If I provide and encourage a better way, then this becomes irrelevant.
But the current idea of class-aware equality comparison creates a weird hashing and set membership situation.
Sets and dictionaries and so on consider things the same if they are equal, but they expect and rely on equal things to hash the same, right?
So for example, if we have two subclasses of MAC
, then they both compare equal to MAC
instances with the same address (once the bug I described is fixed) but they compare unequal to each other. Even if we get hashing to be the same, this means we can get different sets purely based on the order of adds to those sets! That's messed up.
For another example, if two HWAddress types compare equal, then you have to know how they hash, and if they hash differently then I think it becomes indeterministic if they get counted as one member or two in the set (deterministic only if we know the underlying implementation details and all operations done, like how many buckets are in the underlying hash table at any given moment and how the hash method's hash is mapped to each bucket).
My current thoughts are:
self
to the class that we want to be equivalent to forward to that equality method. This is consistent with a hash by type and address value. The only reasons right now for classes to do subclassing like this are the MAC/EUI-48 situation (honestly I should probably just alias them), the formatting situation (I should probably expose the relevant logic so people could do it without subclassing - how often is there a good reason for this to actually come up anyway? never?)Of course I should also keep my eyes on the common cases! Those should guide a lot of these choices. The uncommon cases should just be reasonably doable.
The common case has no subclassing beyond what this library does.
The commonest appropriate use case for subclassing a HWAddress
subclass would presumably be to add unrelated enriching behavior, and probably should not effect equality or hashing.
Treating MAC
and EUI48
as equivalent in all ways seems like it is the right behavior more often than not. (In particular, a reasonable situation is that a user writes code like if is instance(something, Mac address.MAC)
, and this should not break if something
was initialized by someone who used macaddress.EUI48
.) So I think I've decided that MAC
and EUI48
should be aliases of each other. This is a minor breaking change that would probably effect no one. The one downside of this is that error messages use the class name, so you could get error messages that refer to the other class name. Maybe this means that MAC
just shouldn't exist as a name exported by this library, but that's a much bigger breaking change.
Alright that concludes this chunk of the thought dump.
Actually, I am going to merge this now as-is, and we can iteratively improve it later.
That way you don't have to have this open unresolved thing waiting around for me.
Plus for something this small, I would be needlessly spending your time and effort, and multiplying our total time and effort, by telling you what changes I want instead of just making them myself.
I think I was initially keeping this unmerged just in case I decided on something you disagreed with, so that you had a better opportunity to tell me "wait no that's a bad change because [...]" before I overwrote the lines you contributed, but now that I think about it that's not really a good enough reason to not just do it the nimble merge-then-iteratively-improve-as-needed way.
First and foremost, I would like to thank you for your consideration towards my little contribution. Regarding of whether or not to "merge-then-iteratively-improve-as-needed", I definitely should have opened an issue first so that we can get some general ideas before (either of us) jumping into coding. Well, what's done is done, and I wholeheartedly appreciate your concerns.
Before going any further, I would like to move our discussion to the issue #3 that I just opened. I believe that it is worth doing so because what you want to achieve (mentioned in your thought dump) may span across multiple pull requests (or commits), which is definitely out of scope of this pull request.
Anyway, I would like to conclude this pull request here and continue the discussion in a more appropriate place -> here #3.
Jotting down one design thought that I never got around to writing anywhere:
The reason why I initially didn't do this was because users could trivially do this:
my_set.add(bytes(mac_address))
# or
my_set.add((type(mac_address), bytes(mac_address))
# of
my_set.add((mac_address.size, bytes(mac_address))
# or
...
based on their needs.
When you have a concrete use-case for putting a MAC address in a set or as a dictionary key, you know exactly what information you do or don't need to determine uniqueness/sameness of an address.
When you're providing a __hash__
function, there's no single answer to what information a user will or won't need, so you're trying to strike a balance that's good for most cases and isn't overly bad for any cases.
I only implemented
__hash__
method to makeHWAddress
and its subclasses hashable. This comes from my need of addingMAC
to aset
.