Closed mentalisttraceur closed 2 years ago
I'm leaning more toward the metaclass option because:
Similar to the ipaddress
, I don't expect users to create any subclass from any of the concrete classes provided by the library.
Even if they do, the possibility of replacing metaclass is very unlikely since the usage of metaclass is already rare.
Correct me if I'm wrong, but using metaclass would allow you to reuse the comparison implementation between concrete classes without worrying about entangling concrete classes that are meant to be differentiable. For example, if there is a concrete class called CHEESE
and it has identical structure to MAC
, but CHEESE
and MAC
are not meant to be compared at all. In this case, it would require using special attributes of each of them, which may or may not exist in other classes. That would be a hassle.
But again, this is only me speaking from a generic and designing point of view of a software developer who does not have a knack for networking. Maybe IEEE won't ever release anything else in another 30 years and it does not worth the effort to consider such extensibility and openess.
I just think that hardware address is a very broad term and the concrete classes should not be able to interchangeable between one and another. A very good example IPv4Address
and IPv6Address
. They are both IP addresses, but they certainly can not be interchangeable when doing some abstract programming (they both share the same parent _BaseAddress
, but at least it's private and users are not encourage to use it for type hinting). You have to know specifically that you're dealing with IPv4 or IPv6 or both of them. It's tedious to manually handle them separately (not always, but often enough), but your code has clarity.
I don't like decorator because it usually messes with IDE's "intellisense" (or I'm just dumb and misused it).
Abstract base class is totally fine, but I just prefer to have my object's "family tree" flat :)
Jotting down some notes on the functional differences:
Metaclass or decorator allows us to run logic at hardware address class definition time, and so we can help users catch subclassing errors much sooner.
Decorator vs metaclass locality: a decorator only effects the thing that you decorate; a metaclass effects every subclass. So a metaclass can have "non-local" action: you can write a subclass which uses the metaclass of the parent without explicitly mentioning (or knowing of) the metaclass anywhere in that code.
Metaclass breaks multiple inheritance, except not even universally but as an invisible surprising landmine: two different base classes with two different metaclasses is not allowed by Python, as I understand it. Which means you can add a metaclass and suddenly stuff breaks in some multiply-inheriting subclass. (Can think of two workarounds which I haven't tested, but those both start by creating a new metaclass which inherits from both of the metaclasses.)
Metaclass ergonomics: class Bar(metaclass=Foo)
vs class Bar(Foo)
. Relevant example is abc.ABC
, which is a base class which has a metaclass which does all the actual adding of abstract base class logic.
So one good question to mix in: what kind of logic can we usefully do at class definition time, and for each of those, is it more useful to have that logic re-run or not re-run at subclass definition time?
For a while I thought of maybe adding size_in_bytes
and size_in_nibbles
(and maybe a more explicit size_in_bits
alias for size
).
And then I thought... why not let people define a class with their choice of size unit? size_in_bytes = 12
can be a lot more mentally ergonomic than size_in_bits = 96
, especially when reading the code.
If I wanted to allow users the choice of defining hardware address classes in different size units, then a metaclass or decorator could do that, while a normal or abstract base class couldn't.
But!
Nowadays I think all of this is better solved with either:
a separate library that provides numerical types with units, so people could do size = Bytes(8)
where Bytes
is defined somewhere as a unit with a factor of 8 relative to the base unit of a bit, or
a couple conversion functions: size = bits_from_bytes(8)
and bytes_from_bits(mac.size)
, or
some mix of those two.
Because that's obviously the more composable and decoupled solution.
So I don't think this exact example possibility needs to influence anything, just part of my my whole design thought process.
Notably, abc.ABC
is (indirectly) a metaclass, and so the metaclass subclassing could bite people if they want to express something like:
I have a couple different address types, but they all have a certain common behavior. So I'll put the common behavior into a subclass of HWAddress
, and I'll indicate that it's an ABC by multiply inheriting from abc.ABC
, and... oh metaclass conflict. :(
This ties into the metaclass ergonomics thing from a couple comments ago, because what happens if class Foo(BarWhichHasMetaclass, metaclass=Qux)
.
Noting that decorator and meta class don't offer any optimization or functional benefit besides the fail fast error detection thing, as far as I can tell.
I say this because my mind sometimes reaches for decorators or meta classes as the way to compute some information once from the information that the user specified when defining the class, and then have that information ready to go whenever it is desired. But all that can be computed on use, and cached on the instance or the class after first use if needed.
@ahnyud Thank you for your comments on this by the way! Sorry it took me like three months to get back around to commenting on this. I did read your comments and appreciated them - they're definitely being factored into this whole design thought in my mind, and I do benefit from hearing them.
The metaclass vs ABC problems are largely solved by having the metaclass inherit from abc.ABCMeta
.
Then multiply inheriting from abc.ABC
and HWAddress
works, I think, but even if it doesn't the solution is to just inherit from HWAddress
, and it would be pretty clearly documented (or at least obvious in an IDE or when looking at the source ) that HWAddress
is already an ABC, so you can just inherit from it if you're making an abstract base (sub)class of hardware addresses.
@ahnyud I've concluded I agree with you against the decorator approach.
Now that I've had some time to thoroughly digest the problem space, I'm not really seeing (or not remembering?) any good-enough benefit to doing a decorator, which isn't better met with a metaclass or ABC.
The one advantage was locality of effect, but it's definitely weird to turn classes into hardware address types with a decorator. (Then again there's an analog in dataclass
.) More importantly than weird, it's just... not utilizing the type system optimally. The type system is there for us to use, and one great way to use it is to actually express what (type of thing) something is.
So if the decorator isn't programmatically making the decorated type a subclass of some sort of common type, then all the decorated types are potentially in their own separate taxonomies, and that starts to really mess with my model of how these types interact (#3, #8, and #7 all conceptually converge for me on code which expects stuff to share a HWAddress
base class).
And that's the only way I'd have it be a decorator: doing checking and mutating passthrough - seeing if the necessary attributes are there and adding methods, but then returning the class object it already got. I wouldn't really want it to always generate some sort of wrapper subclass. Maybe there's a way to take an existing class and mutate it to have a subclass, but even if there was, I think it's actually weird and misleading if a decorator adds something to your inheritance hierarchy - if I wasn't very familiar with a library being used in some code and it did that, I would be annoyed, because it would initially mislead me by causing a different first impression of what it might be doing.
I think it's also a good point that most users won't be subclassing concrete classes, just as ipaddress
users won't be.
(The one caveat is that subclassing is currently the only way to parse an address with custom formats. And there's at least one piece of code I've seen on GitHub that actually uses this capability. But that use-case would still work fine even if I used a metaclass or whatever.)
One of the biggest arguments against definition-time checks is that size should be optional for intermediate abstract classes which just want to define some behavior, and formats should be optional for quick-and-dirty stuff.
One of the biggest arguments for definition-time checks is making sure that if size and formats are both specified, they should agree with each other: I would love it if a hardware address class with size = 8
and formats = ('x',)
raised an error about how the size and format don't match in the number of bits they represent.
Continuing from the comment in #18 where I just mentioned this:
The biggest good I see for doing a metaclass is enforcing that the number of x
characters in each entry of formats
matches the number of nibbles as implied by size
.
So now here's a question? Is there ever a good reason for a subclass to have non-matching format strings? No.
In fact it is actually really bad because this is an error that disappears silently right now. Imagine I messed up and implemented EUI48
like this:
class EUI48(_StartsWithOUI):
"""48-Bit Extended Unique Identifier (EUI-48).
EUI-48 is also the modern official name for what
many people are used to calling a "MAC address".
"""
__slots__ = ()
size = 48
formats = (
'xx-xx-xx-xx-xx-xx',
'xx:xx:xx:xx:xx:xx',
'xxxx.xxxx.xxxx',
'xxxxxxxxxxx',
)
Do you see the error?
I probably would never spot it:
>>> EUI48('01-23-45-67-89-ab')
EUI48('01-23-45-67-89-AB')
>>> EUI48('01:23:45:67:89:ab')
EUI48('01-23-45-67-89-AB')
>>> EUI48('0123.4567.89ab')
EUI48('01-23-45-67-89-AB')
>>> EUI48('0123456789ab')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/python3/lib/python3.10/site-packages/macaddress.py", line 100, in __init__
self._address, _ = _parse(address, type(self))
File "/home/user/python3/lib/python3.10/site-packages/macaddress.py", line 399, in _parse
raise _value_error(string, 'cannot be parsed as', *classes)
ValueError: '0123456789ab' cannot be parsed as EUI48
Surprise!
>>> EUI48('0123456789a')
EUI48('00-12-34-56-78-9A')
Bad string parses, good string doesn't parse, and you don't know until it either suddenly and non-obviously errors out on good input or worse, silently wrongly parses wrong input.
I don't see any good reason to ever have formats that don't match the nibble size (the closest to a good reason which I can see looks more like a situation where it would be better if this library supported bitwise operators and/or prefix-cast-construction).
So I'm tempted to say that having a metaclass automatically check this is a great idea.
One abstract base class use-case that's been rattling around in my mind is to programmatically "document"/enforce clearly that a concrete hardware address class needs to provide a size
.
Of course I can do this anyway in the __init__
provided by HWAddress
, by raising a TypeError
with a message like "hardware address class without size cannot be initialized" (although that's a little longer than I'd like it to be).
But ultimately the whole point of an ABC is precisely to express stuff like that. I could do it myself, but the Python ecosystem has standard language for it, and that language is an ABC.
Although the default error text is somewhat annoying:
TypeError: Can't instantiate abstract class Foo with abstract method size
It's not an abstract method, you know? It's an abstract property.
And if I'm going to be wrangling with abc.ABC
to get it to use a slightly better error message, I almost might as well just skip using abc.ABC
entirely.
Then again caring so much about the error text is probably the wrong move. In principle, it's probably better to delegate/abstract that to the abstract property implementation - trust that one day, Python will use a more appropriate adjusted error message in abstract property cases.
So if I was doing an abstract base class, I'd also do a metaclass, and I think my ideal looks something like this:
from abc import ABCMeta as _ABCMeta, abstractproperty as _abstractproperty
class HWAddressMeta(_ABCMeta):
def __new__(metaclass, name, bases, namespace):
dummy_class = type.__new__(metaclass, name, bases, namespace)
if isinstance(dummy_class.size, _abstractproperty):
formats = dummy_class.formats
if len(formats):
nibbles = sum(c == 'x' for c in formats[0])
namespace['size'] = nibbles * 4
cls = super().__new__(metaclass, name, bases, namespace)
if isinstance(cls.size, _abstractproperty):
return cls
nibbles_in_address = (cls.size + 3) >> 2
for format_ in cls.formats:
nibbles_in_format = sum(c == 'x' for c in format_)
if nibbles_in_format != nibbles_in_address:
raise ValueError('formats size mismatch')
return cls
and then update HWAddress
like so:
class HWAddress(metaclass=HWAddressMeta):
...
@_abstractproperty
@classmethod
def size(cls):
"""The size, in bits, of the hardware address class."""
return None
...
What a mess! (Particularly the first half of __new__
in the metaclass.)
(I also haven't decided how I feel about abstractproperty
vs the composition of property
and abstractmethod
- I feel like the first one has a more appropriate name, and makes way more sense in an isinstance
check. Which, while we're at it, is a great example of why it's sometimes very useful to use non-flat class hierachies to communicate distinctions which might be useful in some cases.)
So out of all that dense and convoluted and hacky metaclass business, what do we actually get?
formats
strings has inconsistent size, error.size
but there is formats
, infer size
from formats
(if there are formats of different size, that will be caught by the previous point).HWAddress
subclasses with neither size
nor formats
refuse to instantiate.HWAddress
is an ABC, not a concrete class, and so is any subclass that doesn't define size
. (Although if I derive size
from formats
, it's probably not obvious to stuff like static analysis tools that a class which defines formats
and not size
actually implements `size. So maybe I should strip that feature.)Worth it? I don't know.
Notably, as of Python 3.6, we can use __init_subclass__
instead of metaclasses, to achieve all of the metaclass stuff.
With macaddress
, I'm pretty fine with having features like these as progressive-enhancement non-EOL'ed Pythons, so I'm actually pretty okay with using __init_subclass__
.
If we remove the "derive size
from formats
" convenience feature from the metaclass, it simplifies a lot:
class HWAddressMeta(_ABCMeta):
def __new__(metaclass, name, bases, namespace):
cls = super().__new__(metaclass, name, bases, namespace)
if isinstance(cls.size, _abstractproperty):
return cls
nibbles_in_address = (cls.size + 3) >> 2
for format_ in cls.formats:
nibbles_in_format = sum(c == 'x' for c in format_)
if nibbles_in_format != nibbles_in_address:
raise ValueError('formats size mismatch')
return cls
One gimmicky thing you can do if I keep allowing formats and size to mismatch:
class EUI48LowHalf(EUI48):
formats = ('xx-xx-xx', 'xx:xx:xx', 'xxxxxx')
suffix = EUI48LowHalf('78-90-AB')
When would this ever be useful? Approximately never I think.
But imagine I also add bitwise operators, which is something I have been thinking about doing for at least half a year:
prefix = OUI('12-34-56')
eui48 = prefix | suffix
eui48 == EUI48('12-34-56-78-90-AB')
Boom, absolutely useless for the vast majority of developers using this library, but it's a neat emergent property, and I can imagine this being useful in a program managing MAC address assignments within a company or whatever.
Notably this scales for incremental parsing:
class IAB(HWAddress):
size = 36
class IABOwner(IAB):
formats = ('-xx-x',)
class DeviceID(EUI48):
formats = ('x-xx',)
string = '01-23-45-67-89-AB'
oui = OUI(string[:8])
iab_owner = IABOwner(string[8:13])
device_id = DeviceID(string[13:])
eui48 = oui | iab_owner | device_id
But of course, first of all, why? When is this a thing that anyone would want to do?
Especially when we can just do this, and avoid two spurious weird classes which only exist to play weird formatting tricks:
string = '01-23-45-67-89-AB'
eui48 = EUI48(string)
iab = IAB(eui48)
oui = OUI(eui48)
iab_owner = iab ^ oui
device_id = eui48 ^ iab
(This is all assuming #8 is implemented, and bitwise operator overloads are implemented and cast their operands to the wider of the two classes as if the smaller one is a prefix of the larger one, mirroring how #8 works.)
So I think the only time this feature would be useful is if someone was trying to implement incremental parsing, as discussed in #19. But I think I should almost certainly do some kind of YAGNI dismissal with that possibility until I actually see the need.
One other annoying thing with the metaclass approach is that if I don't keep a base class too (whether concretely explicitly abstract by inheriting from abc.ABC
, or just abstractly implicitly abstract by being unusable as-is), I'm going to have to make that metaclass add all the methods that HWAddress
currently provides. The boilerplate ends up being worse I think, and in any case it would be less obvious/intuitive/familiar for readers of the source what's going on.
In other news, looks like Python 3.11 has decided to break classmethod
and property
interaction.
This was already an only-recently-working sort-of-thing (in 3.9, classmethod
finally learned how to play nice and transparently with the descriptor protocol - prior to that I don't even know what you could do, if anything), and now it's going away again.
Actually I had forgotten that classmethod-ness, abstract-ness, and property-ness only became composable so recently.
But in any case, I'm starting to feel that CPython is just not a stable language on the edges like it used to be. This isn't the first thing they've jiggled back and forth, in a way that just punishes early adoption trust. (The other one I can think of off the top of my head is type annotations, which changed in breaking ways in 3.10, and then there came lots of discussion of changing it yet again.) This is a super concerning feeling to have.
Okay, so I think anything involving class properties is out.
I could write my own custom read-only class property descriptor implementation - that isn't too hard:
class _classproperty:
__slots__ = ('_method',)
def __init__(self, method):
self._method = method
def __get__(self, obj, objtype=None):
return self._method(objtype)
But then there's probably other downsides I'm not seeing yet, like static typing tools getting confused and all that.
This rules out something I was thinking about, which basically looked like making HWAddress.size
an actually useful class property.
But that's probably for the best, because the only version of that which I'm happy with also included still using a metaclass like above to enforce some consistency checks on formats
, and if I did use a class property for HWAddress.size
then there would be non-local interplay to ensure correctness between that and HWAddressMeta.__new__
.
My verdict so far is: either it stays a plain class as-is, or I add __init_subclass__
.
My emotional temptation is to just bury this whole idea for another year. Leave it as-is, with HWAddress
as a normal class, and move on with my life.
This is currently holding me up pretty severely.
Here's the final little spurt of design exploration I'm willing to give this:
The most minimal version that achieves just the one biggest improvement - if formats
has a format that doesn't match size
, error out:
class HWAddress:
def __init_subclass__(cls):
errors = []
for format_ in cls.formats:
nibbles_in_address = (cls.size + 3) >> 2
nibbles_in_format = sum(c == 'x' for c in format_)
if nibbles_in_format < nibbles_in_address:
errors.append(repr(format_) + ' is too short')
elif nibbles_in_format > nibbles_in_address:
errors.append(repr(format_) + ' is too long')
if errors:
name = cls.__name__
details = '; '.join(errors)
raise ValueError('bad formats for ' + name + ': ' + details)
Tentative verdict: that's going in.
Notably, this logic in __str__
is coupled with .formats[0]
having the same number of x
characters as the address has nibbles:
...
offset = (4 - type(self).size) & 3
unconsumed_address_value = self._address << offset
for character in reversed(formats[0]):
...
...
If .formats[0]
has fewer nibbles than the address, this would truncate the value; if has more nibbles, this would add trailing zeroes.
Seems like the sort of thing that's more likely to mask an error by silently doing the wrong thing than to be what the user wanted.
So having a "do formats match size?" check is valuable in that way too.
Reaffirming I'm fairly confidently settled into:
__init_subclass__
for nice-to-have checks for developer errors (defining subclasses badly),HWAddress
as an implicitly abstract base class (it has always been semantically abstract, and it can't be normally instantiated because __init__
always raises an error for it, nor used much even if someone bypasses that, because it doesn't define a size
), andabc
machinery.With that settled, there's two things I currently have in mind for __init_subclass__
:
formats
matches the number of nibbles that are needed for size
bits (example implementation in earlier comment), andHWAddress
subclass defines with a size
, subclasses of that subclass can't define a different size
(if size
is set on the sub-subclass, should be ==
, possibly after doing an int(...)
on both, to the size
on the concrete base subclass).I remember having three things in mind but I can't remember or reconstruct the third.
Actually, I feel enough design pressure to overcome the "oh no __init_subclass__
doesn't work on [ancient or obscure and incomplete Python]" hangup.
__init_subclass__
was added in 3.6, which came out in 2016! so long ago that already 3.7 is the oldest supported CPython. And I never supported Python 2 in this library.
I need to really ask myself if my portability obsession is a good reason to have a worse API, because __init_subclass__
is basically ideal otherwise.
It's not unreasonable to ask the handful of people who might use this library on Pythons with non-working __init_subclass__
support to deal with that problem. I can even help them deal with that problem by giving them a decorator, so that if they want that compatibility, they literally just need to slap the decorator on top of their subclasses. The README section for defining a subclass can just say "if you need to be portable to Pythons where __init_subclass__
does not work, decorate the subclass with @macaddress.hwaddress_init_subclass
". In the rare case that portability matters that much, this is enough.
Well, it would suck for transitive users module foo
defined some HWAddress subclass for some hardware without caring about the portability, module bar
uses foo
on MicroPython.
But it's still fair to say that that's foo
's fault.
Should the
HWAddress
base class be changed to an abstract base class, a metaclass, or a decorator? (Edit: or get an__init_subclass__
method?)