python-attrs / attrs

Python Classes Without Boilerplate
https://www.attrs.org/
MIT License

Order of attributes in an attr.s class is dependent upon the order the attributes were created #643

Open hansonap opened 4 years ago

hansonap commented 4 years ago

This is related to #300, #339, and #343, but this issue doesn't occur only with attr.make_class; it also affects normal classes defined across modules. I am submitting this issue not necessarily to prompt a change to how attributes are ordered, but to document the situation, discuss a documentation change, and help others who have had this issue.

Take, for example, module_a, which defines a collection of available attributes that can be queried across an interface. module_a also defines ClassA, which contains the minimum required attributes for working across the interface. Three types of attributes exist: required, optional, and deferred, but deferred attributes are not defined in module_a.

module_b (and dozens of other modules) defines classes, such as ClassB, that inherit from ClassA and also include a varying number of attributes (depending on the use case). Some of these attributes are deferred attributes (default=None and metadata={'attr_type': AttributeType.Deferred}). Deferred attributes are ignored when retrieving data across the interface. When retrieving data, a function takes ClassB and builds a list of required attributes and a dictionary of optional attributes (mapping attribute init position to attribute name); deferred attributes are ignored. The list and dictionary are sent across the interface, a massive amount of data is fetched, and a list of ClassB instances is initialized. Deferred attributes are either derived from required / optional attributes, or must be gathered through another interface in a second step and serve as a placeholder.

import attr
from typing import List

# AttributeType, ProductSource, convert_product_source, and view_limits
# are defined elsewhere in the application.

@attr.s
class ClassA:
    index = attr.ib(type=int, metadata={'required': True, 'attr_type': AttributeType.Required})
    tree_position = attr.ib(type=str, metadata={'required': True, 'attr_type': AttributeType.Required})
    ...

SourceAttr = attr.ib(type=ProductSource,
                     converter=convert_product_source,
                     metadata={'attr_type': AttributeType.Optional})

RevisionAttr = attr.ib(type=str, metadata={'attr_type': AttributeType.Optional})
DefinitionAttr = attr.ib(type=str, metadata={'attr_type': AttributeType.Optional})
# hundreds more

@attr.s
class ClassB(ClassA):
    revision = RevisionAttr
    source = SourceAttr
    ...
    # deferred attributes
    other_info = attr.ib(type=List[view_limits.OtherInfo], default=attr.Factory(list),
                         metadata={'attr_type': AttributeType.Deferred})

This generally works great. A class always inherits the required attributes, then the optional attributes are listed (source, revision), and then any number of deferred attributes are listed. Even if the order of the optional attributes in ClassB is different from how they were defined in module_a, this pattern works, because a dictionary mapping attribute index to name is provided to the interface, so the data is always returned in the correct order.
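The pattern described above can be sketched roughly as follows. Note that AttributeType, Example, and split_attrs here are hypothetical stand-ins for the actual definitions, which aren't shown in the post:

```python
import enum
import attr

# Hypothetical stand-in for the AttributeType enum used in the post.
class AttributeType(enum.Enum):
    Required = 1
    Optional = 2
    Deferred = 3

@attr.s
class Example:
    index = attr.ib(type=int, metadata={'attr_type': AttributeType.Required})
    revision = attr.ib(type=str, default='',
                       metadata={'attr_type': AttributeType.Optional})
    other_info = attr.ib(default=None,
                         metadata={'attr_type': AttributeType.Deferred})

def split_attrs(cls):
    """Build the required-attribute list and the {init position: name}
    optional-attribute map, skipping deferred attributes."""
    required, optional = [], {}
    for pos, field in enumerate(attr.fields(cls)):
        kind = field.metadata.get('attr_type')
        if kind is AttributeType.Required:
            required.append(field.name)
        elif kind is AttributeType.Optional:
            optional[pos] = field.name
    return required, optional

print(split_attrs(Example))  # (['index'], {1: 'revision'})
```

Because the optional map keys on the init position reported by attr.fields, it only stays dense if attrs orders the fields the way the class body reads, which is where the problem below comes in.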

Now, where this fails is when:

  1. I have a deferred attribute that is too complex to define in ClassB and is defined as its own class and / or I reuse it from another module AND
  2. I define an optional attribute in ClassB

Take for example:

ComplexAttr = attr.ib(type=List[view_limits.OtherInfo], default=attr.Factory(list),
                      metadata={'attr_type': AttributeType.Deferred})

@attr.s
class ClassB(ClassA):
    revision = RevisionAttr
    source = SourceAttr
    document_name = attr.ib(type=str, metadata={'attr_type': AttributeType.Optional})

    # deferred attributes
    complex = ComplexAttr

The resulting dictionary is shown below. Attribute slots 0 and 1 are reserved for the required attributes, so the optional indexes should be 2, 3, and 4 (not 5). A failure occurs across the interface when it internally attempts to set the element at index 5, because its list was allocated with a length of 5.

{2: 'revision', 3: 'source', 5: 'document_name'}
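A minimal reproduction of the underlying rule (the names here are illustrative): in classic @attr.s mode, field order follows the internal counter assigned when each attr.ib() is created, not the assignment order in the class body:

```python
import attr

shared = attr.ib(default=1)     # created first -> lowest internal counter

@attr.s
class C:
    local = attr.ib(default=2)  # created later than `shared`
    reused = shared             # assigned last in the class body

# Field order follows creation order of the attr.ib() calls,
# so `reused` (the earlier-created attr.ib) comes first.
print([f.name for f in attr.fields(C)])  # ['reused', 'local']
```

This is why ComplexAttr, created at module level before document_name's attr.ib(), pushes document_name to a later position even though it is assigned last in ClassB.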

Three discussion points:

  1. If you are programmatically analyzing the attribute order of a class with attr.fields_dict (or similar), the order is not defined by the assignment order in the class body, but by the order in which the attr.ib()s were created. If you know this and need to inspect the order programmatically, set up a pattern and follow it.
  2. The mechanism by which the order is defined should be clearly documented. That would have been a huge benefit here.
  3. Intuitively, I would expect to be able to initialize the above as it is written: ClassB(index, tree_position, revision, source, document_name, complex). What, technically, is keeping it from working this way? From other posts I've read and from the documentation itself, I'm guessing some eyebrows are being raised over the reuse of attributes and the class inheritance. The examples above are simplified; the actual attributes can have complex definitions that may change over time, and defining them in each class where they are used would be impossible to maintain. Similarly for the class inheritance: I have many complex usages of this data, but the data is expensive to retrieve, so certain modules define the minimum required attributes in a class. Depending on the scope of the work being performed, several modules may be needed to post-process the data, so a class is first constructed using ClassB and ClassC.
hynek commented 4 years ago

First, thank you for your elaborate summary, and sorry for my late response. I was gonna respond but kept waiting until I'd “have time” to do it “properly”. Well, it went as expected.

I’m intrigued by your approach of sharable attributes and this wild meta programming!

Since you’re using type annotations anyway, there’s actually a workaround for your problem: attrs won’t use the counters in auto_attribs=True mode:

>>> from typing import Any
>>> import attr
>>> x = attr.ib()
>>> y = attr.ib()
>>> x
_CountingAttr(counter=20, _default=NOTHING, repr=True, eq=True, order=True, hash=None, init=True, metadata={})
>>> y
_CountingAttr(counter=21, _default=NOTHING, repr=True, eq=True, order=True, hash=None, init=True, metadata={})
>>> @attr.s(auto_attribs=True)
... class C:
...     y: Any = y
...     x: Any = x
>>> C.__attrs_attrs__
(Attribute(name='y', default=NOTHING, validator=None, repr=True, eq=True, order=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Any, converter=None, kw_only=False, inherited=False),
 Attribute(name='x', default=NOTHING, validator=None, repr=True, eq=True, order=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Any, converter=None, kw_only=False, inherited=False))
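The same workaround as a runnable sketch (attribute names are illustrative): with auto_attribs=True, the field order comes from the order of the type annotations in the class body, so reused module-level attr.ibs no longer reorder the class:

```python
from typing import Any
import attr

# Module-level attributes, deliberately created in the "wrong" order.
x = attr.ib(default=0)
y = attr.ib(default=0)

@attr.s(auto_attribs=True)
class C:
    y: Any = y
    x: Any = x

# Annotation order wins, regardless of when the attr.ibs were created.
print([f.name for f in attr.fields(C)])  # ['y', 'x']
```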
hansonap commented 4 years ago

Great! This looks like exactly what I need.

To give a little more background on my wild meta programming 😃, I am working with a COM API and, unfortunately, each COM call is relatively time-prohibitive (a COM issue, not Python). Because of this, I have been driven to create a framework, on both the Python side and the VBA side of the application, which allows hundreds of thousands of COM calls to be condensed into one, resulting in a time decrease of many orders of magnitude.

attr is great, as it allows easy definition of what information is to be gathered on the VBA side, as well as how and when. Being able to vary which attr.ibs are gathered, and when, allows each batch call to be as efficient as possible.