mentalisttraceur / python-macaddress

BSD Zero Clause License
19 stars 5 forks source link

Class vs ABC vs metaclass vs decorator vs `__init_subclass__` #4

Closed mentalisttraceur closed 2 years ago

mentalisttraceur commented 2 years ago

Should the HWAddress base class be changed to an abstract base class, a metaclass, or a decorator? (Edit: or get an __init_subclass__ method?)

duynhaaa commented 2 years ago

I'm leaning more toward the metaclass option because:

  1. Similar to the ipaddress, I don't expect users to create any subclass from any of the concrete classes provided by the library. Even if they do, the possibility of replacing metaclass is very unlikely since the usage of metaclass is already rare.

  2. Correct me if I'm wrong, but using metaclass would allow you to reuse the comparison implementation between concrete classes without worrying about entangling concrete classes that are meant to be differentiable. For example, if there is a concrete class called CHEESE and it has identical structure to MAC, but CHEESE and MAC are not meant to be compared at all. In this case, it would require using special attributes of each of them, which may or may not exist in other classes. That would be a hassle.

    But again, this is only me speaking from a generic and designing point of view of a software developer who does not have a knack for networking. Maybe IEEE won't ever release anything else in another 30 years and it does not worth the effort to consider such extensibility and openess.

  3. I just think that hardware address is a very broad term and the concrete classes should not be able to interchangeable between one and another. A very good example IPv4Address and IPv6Address. They are both IP addresses, but they certainly can not be interchangeable when doing some abstract programming (they both share the same parent _BaseAddress, but at least it's private and users are not encourage to use it for type hinting). You have to know specifically that you're dealing with IPv4 or IPv6 or both of them. It's tedious to manually handle them separately (not always, but often enough), but your code has clarity.

I don't like decorator because it usually messes with IDE's "intellisense" (or I'm just dumb and misused it).

Abstract base class is totally fine, but I just prefer to have my object's "family tree" flat :)

mentalisttraceur commented 2 years ago

Jotting down some notes on the functional differences:

  1. Metaclass or decorator allows us to run logic at hardware address class definition time, and so we can help users catch subclassing errors much sooner.

  2. Decorator vs metaclass locality: a decorator only effects the thing that you decorate; a metaclass effects every subclass. So a metaclass can have "non-local" action: you can write a subclass which uses the metaclass of the parent without explicitly mentioning (or knowing of) the metaclass anywhere in that code.

  3. Metaclass breaks multiple inheritance, except not even universally but as an invisible surprising landmine: two different base classes with two different metaclasses is not allowed by Python, as I understand it. Which means you can add a metaclass and suddenly stuff breaks in some multiply-inheriting subclass. (Can think of two workarounds which I haven't tested, but those both start by creating a new metaclass which inherits from both of the metaclasses.)

  4. Metaclass ergonomics: class Bar(metaclass=Foo) vs class Bar(Foo). Relevant example is abc.ABC, which is a base class which has a metaclass which does all the actual adding of abstract base class logic.

mentalisttraceur commented 2 years ago

So one good question to mix in: what kind of logic can we usefully do at class definition time, and for each of those, is it more useful to have that logic re-run or not re-run at subclass definition time?

mentalisttraceur commented 2 years ago

For a while I thought of maybe adding size_in_bytes and size_in_nibbles (and maybe a more explicit size_in_bits alias for size).

And then I thought... why not let people define a class with their choice of size unit? size_in_bytes = 12 can be a lot more mentally ergonomic than size_in_bits = 96, especially when reading the code.

If I wanted to allow users the choice of defining hardware address classes in different size units, then a metaclass or decorator could do that, while a normal or abstract base class couldn't.

But!

Nowadays I think all of this is better solved with either:

  1. a separate library that provides numerical types with units, so people could do size = Bytes(8) where Bytes is defined somewhere as a unit with a factor of 8 relative to the base unit of a bit, or

  2. a couple conversion functions: size = bits_from_bytes(8) and bytes_from_bits(mac.size), or

  3. some mix of those two.

Because that's obviously the more composable and decoupled solution.

So I don't think this exact example possibility needs to influence anything, just part of my my whole design thought process.

mentalisttraceur commented 2 years ago

Notably, abc.ABC is (indirectly) a metaclass, and so the metaclass subclassing could bite people if they want to express something like:

I have a couple different address types, but they all have a certain common behavior. So I'll put the common behavior into a subclass of HWAddress, and I'll indicate that it's an ABC by multiply inheriting from abc.ABC, and... oh metaclass conflict. :(

This ties into the metaclass ergonomics thing from a couple comments ago, because what happens if class Foo(BarWhichHasMetaclass, metaclass=Qux).

mentalisttraceur commented 2 years ago

Noting that decorator and meta class don't offer any optimization or functional benefit besides the fail fast error detection thing, as far as I can tell.

I say this because my mind sometimes reaches for decorators or meta classes as the way to compute some information once from the information that the user specified when defining the class, and then have that information ready to go whenever it is desired. But all that can be computed on use, and cached on the instance or the class after first use if needed.

mentalisttraceur commented 2 years ago

@ahnyud Thank you for your comments on this by the way! Sorry it took me like three months to get back around to commenting on this. I did read your comments and appreciated them - they're definitely being factored into this whole design thought in my mind, and I do benefit from hearing them.

mentalisttraceur commented 2 years ago

The metaclass vs ABC problems are largely solved by having the metaclass inherit from abc.ABCMeta.

Then multiply inheriting from abc.ABC and HWAddress works, I think, but even if it doesn't the solution is to just inherit from HWAddress, and it would be pretty clearly documented (or at least obvious in an IDE or when looking at the source ) that HWAddress is already an ABC, so you can just inherit from it if you're making an abstract base (sub)class of hardware addresses.

mentalisttraceur commented 2 years ago

@ahnyud I've concluded I agree with you against the decorator approach.

Now that I've had some time to thoroughly digest the problem space, I'm not really seeing (or not remembering?) any good-enough benefit to doing a decorator, which isn't better met with a metaclass or ABC.

The one advantage was locality of effect, but it's definitely weird to turn classes into hardware address types with a decorator. (Then again there's an analog in dataclass.) More importantly than weird, it's just... not utilizing the type system optimally. The type system is there for us to use, and one great way to use it is to actually express what (type of thing) something is.

So if the decorator isn't programmatically making the decorated type a subclass of some sort of common type, then all the decorated types are potentially in their own separate taxonomies, and that starts to really mess with my model of how these types interact (#3, #8, and #7 all conceptually converge for me on code which expects stuff to share a HWAddress base class).

And that's the only way I'd have it be a decorator: doing checking and mutating passthrough - seeing if the necessary attributes are there and adding methods, but then returning the class object it already got. I wouldn't really want it to always generate some sort of wrapper subclass. Maybe there's a way to take an existing class and mutate it to have a subclass, but even if there was, I think it's actually weird and misleading if a decorator adds something to your inheritance hierarchy - if I wasn't very familiar with a library being used in some code and it did that, I would be annoyed, because it would initially mislead me by causing a different first impression of what it might be doing.

mentalisttraceur commented 2 years ago

I think it's also a good point that most users won't be subclassing concrete classes, just as ipaddress users won't be.

(The one caveat is that subclassing is currently the only way to parse an address with custom formats. And there's at least one piece of code I've seen on GitHub that actually uses this capability. But that use-case would still work fine even if I used a metaclass or whatever.)

mentalisttraceur commented 2 years ago

One of the biggest arguments against definition-time checks is that size should be optional for intermediate abstract classes which just want to define some behavior, and formats should be optional for quick-and-dirty stuff.

One of the biggest arguments for definition-time checks is making sure that if size and formats are both specified, they should agree with each other: I would love it if a hardware address class with size = 8 and formats = ('x',) raised an error about how the size and format don't match in the number of bits they represent.

mentalisttraceur commented 2 years ago

Continuing from the comment in #18 where I just mentioned this:

The biggest good I see for doing a metaclass is enforcing that the number of x characters in each entry of formats matches the number of nibbles as implied by size.

So now here's a question? Is there ever a good reason for a subclass to have non-matching format strings? No.

In fact it is actually really bad because this is an error that disappears silently right now. Imagine I messed up and implemented EUI48 like this:

class EUI48(_StartsWithOUI):
    """48-Bit Extended Unique Identifier (EUI-48).
    EUI-48 is also the modern official name for what
    many people are used to calling a "MAC address".
    """

    __slots__ = ()

    size = 48

    formats = (
        'xx-xx-xx-xx-xx-xx',
        'xx:xx:xx:xx:xx:xx',
        'xxxx.xxxx.xxxx',
        'xxxxxxxxxxx',
    )

Do you see the error?

I probably would never spot it:

>>> EUI48('01-23-45-67-89-ab')
EUI48('01-23-45-67-89-AB')
>>> EUI48('01:23:45:67:89:ab')
EUI48('01-23-45-67-89-AB')
>>> EUI48('0123.4567.89ab')
EUI48('01-23-45-67-89-AB')
>>> EUI48('0123456789ab')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/python3/lib/python3.10/site-packages/macaddress.py", line 100, in __init__
    self._address, _ = _parse(address, type(self))
  File "/home/user/python3/lib/python3.10/site-packages/macaddress.py", line 399, in _parse
    raise _value_error(string, 'cannot be parsed as', *classes)
ValueError: '0123456789ab' cannot be parsed as EUI48

Surprise!

>>> EUI48('0123456789a')
EUI48('00-12-34-56-78-9A')

Bad string parses, good string doesn't parse, and you don't know until it either suddenly and non-obviously errors out on good input or worse, silently wrongly parses wrong input.

mentalisttraceur commented 2 years ago

I don't see any good reason to ever have formats that don't match the nibble size (the closest to a good reason which I can see looks more like a situation where it would be better if this library supported bitwise operators and/or prefix-cast-construction).

So I'm tempted to say that having a metaclass automatically check this is a great idea.

mentalisttraceur commented 2 years ago

One abstract base class use-case that's been rattling around in my mind is to programmatically "document"/enforce clearly that a concrete hardware address class needs to provide a size.

Of course I can do this anyway in the __init__ provided by HWAddress, by raising a TypeError with a message like "hardware address class without size cannot be initialized" (although that's a little longer than I'd like it to be).

But ultimately the whole point of an ABC is precisely to express stuff like that. I could do it myself, but the Python ecosystem has standard language for it, and that language is an ABC.

mentalisttraceur commented 2 years ago

Although the default error text is somewhat annoying:

TypeError: Can't instantiate abstract class Foo with abstract method size

It's not an abstract method, you know? It's an abstract property.

And if I'm going to be wrangling with abc.ABC to get it to use a slightly better error message, I almost might as well just skip using abc.ABC entirely.

mentalisttraceur commented 2 years ago

Then again caring so much about the error text is probably the wrong move. In principle, it's probably better to delegate/abstract that to the abstract property implementation - trust that one day, Python will use a more appropriate adjusted error message in abstract property cases.

mentalisttraceur commented 2 years ago

So if I was doing an abstract base class, I'd also do a metaclass, and I think my ideal looks something like this:

from abc import ABCMeta as _ABCMeta, abstractproperty as _abstractproperty

class HWAddressMeta(_ABCMeta):
    def __new__(metaclass, name, bases, namespace):
        dummy_class = type.__new__(metaclass, name, bases, namespace)
        if isinstance(dummy_class.size, _abstractproperty):
            formats = dummy_class.formats
            if len(formats):
                nibbles = sum(c == 'x' for c in formats[0])
                namespace['size'] = nibbles * 4
        cls = super().__new__(metaclass, name, bases, namespace)
        if isinstance(cls.size, _abstractproperty):
            return cls
        nibbles_in_address = (cls.size + 3) >> 2
        for format_ in cls.formats:
            nibbles_in_format = sum(c == 'x' for c in format_)
            if nibbles_in_format != nibbles_in_address:
                raise ValueError('formats size mismatch')
        return cls

and then update HWAddress like so:

class HWAddress(metaclass=HWAddressMeta):
     ...
    @_abstractproperty
    @classmethod
    def size(cls):
        """The size, in bits, of the hardware address class."""
        return None
    ...

What a mess! (Particularly the first half of __new__ in the metaclass.)

mentalisttraceur commented 2 years ago

(I also haven't decided how I feel about abstractproperty vs the composition of property and abstractmethod - I feel like the first one has a more appropriate name, and makes way more sense in an isinstance check. Which, while we're at it, is a great example of why it's sometimes very useful to use non-flat class hierachies to communicate distinctions which might be useful in some cases.)

mentalisttraceur commented 2 years ago

So out of all that dense and convoluted and hacky metaclass business, what do we actually get?

  1. If any of the formats strings has inconsistent size, error.
  2. If there is no size but there is formats, infer size from formats (if there are formats of different size, that will be caught by the previous point).
  3. HWAddress subclasses with neither size nor formats refuse to instantiate.
  4. The code "natively" self-describes that HWAddress is an ABC, not a concrete class, and so is any subclass that doesn't define size. (Although if I derive size from formats, it's probably not obvious to stuff like static analysis tools that a class which defines formats and not size actually implements `size. So maybe I should strip that feature.)

Worth it? I don't know.

mentalisttraceur commented 2 years ago

Notably, as of Python 3.6, we can use __init_subclass__ instead of metaclasses, to achieve all of the metaclass stuff.

With macaddress, I'm pretty fine with having features like these as progressive-enhancement non-EOL'ed Pythons, so I'm actually pretty okay with using __init_subclass__.

mentalisttraceur commented 2 years ago

If we remove the "derive size from formats" convenience feature from the metaclass, it simplifies a lot:

class HWAddressMeta(_ABCMeta):
    def __new__(metaclass, name, bases, namespace):
        cls = super().__new__(metaclass, name, bases, namespace)
        if isinstance(cls.size, _abstractproperty):
            return cls
        nibbles_in_address = (cls.size + 3) >> 2
        for format_ in cls.formats:
            nibbles_in_format = sum(c == 'x' for c in format_)
            if nibbles_in_format != nibbles_in_address:
                raise ValueError('formats size mismatch')
        return cls
mentalisttraceur commented 2 years ago

One gimmicky thing you can do if I keep allowing formats and size to mismatch:

class EUI48LowHalf(EUI48):
    formats = ('xx-xx-xx', 'xx:xx:xx', 'xxxxxx')

suffix = EUI48LowHalf('78-90-AB')

When would this ever be useful? Approximately never I think.

But imagine I also add bitwise operators, which is something I have been thinking about doing for at least half a year:

prefix = OUI('12-34-56')

eui48 = prefix | suffix
eui48 == EUI48('12-34-56-78-90-AB')

Boom, absolutely useless for the vast majority of developers using this library, but it's a neat emergent property, and I can imagine this being useful in a program managing MAC address assignments within a company or whatever.

Notably this scales for incremental parsing:

class IAB(HWAddress):
    size = 36
class IABOwner(IAB):
    formats = ('-xx-x',)
class DeviceID(EUI48):
    formats = ('x-xx',)

string = '01-23-45-67-89-AB'

oui = OUI(string[:8])
iab_owner = IABOwner(string[8:13])
device_id = DeviceID(string[13:])

eui48 = oui | iab_owner | device_id

But of course, first of all, why? When is this a thing that anyone would want to do?

Especially when we can just do this, and avoid two spurious weird classes which only exist to play weird formatting tricks:

string = '01-23-45-67-89-AB'
eui48 = EUI48(string)
iab = IAB(eui48)
oui = OUI(eui48)
iab_owner = iab ^ oui
device_id = eui48 ^ iab

(This is all assuming #8 is implemented, and bitwise operator overloads are implemented and cast their operands to the wider of the two classes as if the smaller one is a prefix of the larger one, mirroring how #8 works.)

So I think the only time this feature would be useful is if someone was trying to implement incremental parsing, as discussed in #19. But I think I should almost certainly do some kind of YAGNI dismissal with that possibility until I actually see the need.

mentalisttraceur commented 2 years ago

One other annoying thing with the metaclass approach is that if I don't keep a base class too (whether concretely explicitly abstract by inheriting from abc.ABC, or just abstractly implicitly abstract by being unusable as-is), I'm going to have to make that metaclass add all the methods that HWAddress currently provides. The boilerplate ends up being worse I think, and in any case it would be less obvious/intuitive/familiar for readers of the source what's going on.

mentalisttraceur commented 2 years ago

In other news, looks like Python 3.11 has decided to break classmethod and property interaction.

This was already an only-recently-working sort-of-thing (in 3.9, classmethod finally learned how to play nice and transparently with the descriptor protocol - prior to that I don't even know what you could do, if anything), and now it's going away again.

Actually I had forgotten that classmethod-ness, abstract-ness, and property-ness only became composable so recently.

But in any case, I'm starting to feel that CPython is just not a stable language on the edges like it used to be. This isn't the first thing they've jiggled back and forth, in a way that just punishes early adoption trust. (The other one I can think of off the top of my head is type annotations, which changed in breaking ways in 3.10, and then there came lots of discussion of changing it yet again.) This is a super concerning feeling to have.

mentalisttraceur commented 2 years ago

Okay, so I think anything involving class properties is out.

I could write my own custom read-only class property descriptor implementation - that isn't too hard:

class _classproperty:
    __slots__ = ('_method',)
    def __init__(self, method):
        self._method = method
    def __get__(self, obj, objtype=None):
        return self._method(objtype)

But then there's probably other downsides I'm not seeing yet, like static typing tools getting confused and all that.

mentalisttraceur commented 2 years ago

This rules out something I was thinking about, which basically looked like making HWAddress.size an actually useful class property.

But that's probably for the best, because the only version of that which I'm happy with also included still using a metaclass like above to enforce some consistency checks on formats, and if I did use a class property for HWAddress.size then there would be non-local interplay to ensure correctness between that and HWAddressMeta.__new__.

mentalisttraceur commented 2 years ago

My verdict so far is: either it stays a plain class as-is, or I add __init_subclass__.

My emotional temptation is to just bury this whole idea for another year. Leave it as-is, with HWAddress as a normal class, and move on with my life.

This is currently holding me up pretty severely.

mentalisttraceur commented 2 years ago

Here's the final little spurt of design exploration I'm willing to give this:

The most minimal version that achieves just the one biggest improvement - if formats has a format that doesn't match size, error out:

class HWAddress:
    def __init_subclass__(cls):
        errors = []
        for format_ in cls.formats:
            nibbles_in_address = (cls.size + 3) >> 2
            nibbles_in_format = sum(c == 'x' for c in format_)
            if nibbles_in_format < nibbles_in_address:
                errors.append(repr(format_) + ' is too short')
            elif nibbles_in_format > nibbles_in_address:
                errors.append(repr(format_) + ' is too long')
        if errors:
            name = cls.__name__
            details = '; '.join(errors)
            raise ValueError('bad formats for ' + name + ': ' + details)

Tentative verdict: that's going in.

mentalisttraceur commented 2 years ago

Notably, this logic in __str__ is coupled with .formats[0] having the same number of x characters as the address has nibbles:

...
offset = (4 - type(self).size) & 3
unconsumed_address_value = self._address << offset
for character in reversed(formats[0]):
    ...
...

If .formats[0] has fewer nibbles than the address, this would truncate the value; if has more nibbles, this would add trailing zeroes.

Seems like the sort of thing that's more likely to mask an error by silently doing the wrong thing than to be what the user wanted.

So having a "do formats match size?" check is valuable in that way too.

mentalisttraceur commented 2 years ago

Reaffirming I'm fairly confidently settled into:

  1. use __init_subclass__ for nice-to-have checks for developer errors (defining subclasses badly),
  2. keep HWAddress as an implicitly abstract base class (it has always been semantically abstract, and it can't be normally instantiated because __init__ always raises an error for it, nor used much even if someone bypasses that, because it doesn't define a size), and
  3. give up on metaclasses and the abc machinery.
mentalisttraceur commented 2 years ago

With that settled, there's two things I currently have in mind for __init_subclass__:

  1. verify that the number of "x" characters in formats matches the number of nibbles that are needed for size bits (example implementation in earlier comment), and
  2. verify that once a HWAddress subclass defines with a size, subclasses of that subclass can't define a different size (if size is set on the sub-subclass, should be ==, possibly after doing an int(...) on both, to the size on the concrete base subclass).

I remember having three things in mind but I can't remember or reconstruct the third.

mentalisttraceur commented 2 years ago

Actually, I feel enough design pressure to overcome the "oh no __init_subclass__ doesn't work on [ancient or obscure and incomplete Python]" hangup.

__init_subclass__ was added in 3.6, which came out in 2016! so long ago that already 3.7 is the oldest supported CPython. And I never supported Python 2 in this library.

I need to really ask myself if my portability obsession is a good reason to have a worse API, because __init_subclass__ is basically ideal otherwise.

It's not unreasonable to ask the handful of people who might use this library on Pythons with non-working __init_subclass__ support to deal with that problem. I can even help them deal with that problem by giving them a decorator, so that if they want that compatibility, they literally just need to slap the decorator on top of their subclasses. The README section for defining a subclass can just say "if you need to be portable to Pythons where __init_subclass__ does not work, decorate the subclass with @macaddress.hwaddress_init_subclass". In the rare case that portability matters that much, this is enough.

mentalisttraceur commented 2 years ago

Well, it would suck for transitive users module foo defined some HWAddress subclass for some hardware without caring about the portability, module bar uses foo on MicroPython.

But it's still fair to say that that's foo's fault.