Open atfrase opened 3 years ago
hi -
the specific request, "I suggest that Column.__init__()
should allow self.key to remain None if unspecified in kwargs rather than defaulting to self.name" , would be a major change that may have unknown implications for existing code and as we are well into the 1.4 series where new regressions are being reported daily, I don't know that it's an option to change this; during the beta period for 1.4 would have been a much better time to suggest a significant change like this but it would have required a backwards-compatible path as well. The Table creation process does appear to apply .key to the Column at table construction time if not set, however the change as given , which would be:
diff --git a/lib/sqlalchemy/sql/schema.py b/lib/sqlalchemy/sql/schema.py
index 6ab58c3013..050488237d 100644
--- a/lib/sqlalchemy/sql/schema.py
+++ b/lib/sqlalchemy/sql/schema.py
@@ -1562,7 +1562,7 @@ class Column(DialectKWArgs, SchemaItem, ColumnClause):
)
super(Column, self).__init__(name, type_)
- self.key = kwargs.pop("key", name)
+ self.key = kwargs.pop("key", None)
self.primary_key = primary_key = kwargs.pop("primary_key", False)
self._user_defined_nullable = udn = kwargs.pop(
breaks a lot of existing reflection tests, as well as declarative tests, and such and we are not in any place to be changing something that fundamental right now. Additionally, the change would immediately change the naming scheme used by declarative mappings which also would be a major behavioral change and again this can't be made within the 1.4 series; realistically it would have to be a 2.1 kind of thing as the goal of 2.0 is to be a clean switch from 1.4.
to summarize the current behavior, think of declarative as doing this imperative mapping:
mapper(cls, table, properties={
"id": table.c.id,
"name": table.c.name
})
that is, the assumption "because the attribute key will always equal the column key, even if the SQL name differs" is not actually accurate as mapped attribute name and column key have always been completely independent long before declarative was introduced:
mapper(cls, table, properties={
"id": table.c.SomeId,
"name": table.c.AnyName
})
So all of that said, whether or not we made such a change, it would need to forever be configurable in any case, including that it could work the old way if we did default Column(key=None). So I would propose a new API that allows the developer a transparent hook to set it whichever way they'd prefer, this is more feasible to be added within 1.4 so that this customization in either direction, as well as based on more complex schemes, is available:
from sqlalchemy.orm import registry
def undefer_column_name(attribute_name, column):
column.key = attribute_name
reg = registry(undefer_column_name=undefer_column_name)
Base = reg.generate_base()
class User(Base):
__tablename__ = 'user'
id = Column('user_id', Integer, primary_key=True)
name = Column('user_name', String(50))
I guess what I meant by self.key = kwargs.pop("key", None)
was not just to change that one line of code, I didn't expect that would work by itself; I meant to imply that it would also have to include whatever changes were required in the other parts of the code which expect Column.key
to be defined prior to class mapping (at which time the attribute key can finally be known).
That said, I certainly understand the backwards compatibility argument, and I'd be perfectly happy with having to explicitly instruct the registry/mapper/whatever to change the column key during mapping, as in your example.
That raises the question, though, is it even possible right now to change a column key after adding the Column
to a Table
? I assume simply doing MyClass.__table__.c.colkey.key = 'attrkey'
would not work, unless there are setattr hooks somewhere I haven't noticed yet. But is there some other sequence of calls that could accomplish this cleanly, which would take care of making the appropriate changes to all the various dictionaries and properties that contained 'colkey'?
I guess what I meant by
self.key = kwargs.pop("key", None)
was not just to change that one line of code, I didn't expect that would work by itself; I meant to imply that it would also have to include whatever changes were required in the other parts of the code which expectColumn.key
to be defined prior to class mapping (at which time the attribute key can finally be known).
yes of course, my point was, that's a lot of changes as well as a lot of assumptions being changed, not just in ORM but in Core. We're not in a place to change fundamental assumptions at the moment as the 1.4 release already changed a lot of fundamental assumptions already which of we are still working through the effects.
That said, I certainly understand the backwards compatibility argument, and I'd be perfectly happy with having to explicitly instruct the registry/mapper/whatever to change the column key during mapping, as in your example.
That raises the question, though, is it even possible right now to change a column key after adding the
Column
to aTable
?
the hook I proposed would be intercepted by declarative when it receives the Column object but before it has assembled it to a Table, because it's not feasible to change the .key once the Column is assembled into the Table.c collection. that is, the hook would receieve non-table-bound columns only and it would be called at https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/orm/decl_base.py#L736 .
I assume simply doing
MyClass.__table__.c.colkey.key = 'attrkey'
would not work, unless there are setattr hooks somewhere I haven't noticed yet. But is there some other sequence of calls that could accomplish this cleanly, which would take care of making the appropriate changes to all the various dictionaries and properties that contained 'colkey'?
You'd have to rebuild the whole .c collection on the Table, and anything that was built from it including mappers, since there are likely places in the code that assume table.c[key].key == key.
That all makes sense.
I wonder, would it work to make this possible using the existing events system?
There are already events that fire before or after all classes are instrumented, and before or after each individual class is instrumented, and an event that fires after an individual attribute is instrumented, so maybe all that's needed is to add one more event hook before an attribute is instrumented.
For example:
@event.listens_for(Base, 'before_attribute_instrument')
def receive_before_attribute_instrument(cls, key, obj):
if isinstance(obj, Column):
obj.key = key
That all makes sense.
I wonder, would it work to make this possible using the existing events system?
yes this could also be an event. it would just be a very specific event hook in just one place in declarative that makes it a little unusual, but there's no issue with it working that way.
There are already events that fire before or after all classes are instrumented, and before or after each individual class is instrumented, and an event that fires after an individual attribute is instrumented, so maybe all that's needed is to add one more event hook before an attribute is instrumented.
For example:
@event.listens_for(Base, 'before_attribute_instrument') def receive_before_attribute_instrument(cls, key, obj): if isinstance(obj, Column): obj.key = key
yes, something like that but it would be very specific to Column objects that aren't attached to a Table yet. it might be more useful as a general hook for all kinds of attributes that are extracted by declarative, like SQL expressions, relationship() objects, etc.
When a declarative class attribute is assigned a
Column
with aname
that differs from the class attribute name, but with itskey
left unspecified, I expect to find thatColumn
inMyClass.__table__.c
under the same key as it was given in the class itself (which the mapper then reassigns to anInstrumentedAttribute
). Instead, its key under theTable
object defaults to its SQLname
, which may not even be a valid Python identifier.I suggest that
Column.__init__()
should allowself.key
to remainNone
if unspecified inkwargs
rather than defaulting toself.name
, and when the declarative mapper finds aColumn
object assigned to a class attribute, it should check forcolumn.key == None
and update it to match the attribute key.The long version:
As I understand it, each field of an ORM model is associated with three potentially distinct identifiers:
MyClass.attrkey # <sqlalchemy.orm.attributes.InstrumentedAttribute object ...>
MyClass.__table__.c.colkey # Column(...)
str(select(MyClass.attrkey)) # SELECT mytable.sqlname ...
Under declarative mapping, all three identifiers can be explicitly defined:
The attribute key cannot be omitted, but if
key=None
it will default to the name argument ("sqlname"), and ifname=None
it will default to the attribute key ("attrkey").An analogous imperative mapping might look like this:
This time it is the SQL name which cannot be omitted, but if
key=None
it will again default to the name argument ("sqlname"). This time the attribute key cannot be specified, and will always default to the key argument ("colkey"); this means supplyingColumn(key=key)
under the imperative style effectively specifies both the column key and the attribute key.Taken separately these behaviors seem eminently reasonable, but they lead to a subtle inconsistency:
Column(key=None)
will set it equal to the SQL name, even if the attribute key differs)Between these two behaviors, the former seems more intuitive to me since it makes sense to prefer that both Python-side identifiers match (
MyClass.field
<=>MyClass.__table__.c.field
) unless explicitly overridden by the user.But it may also be worth considering if there's a compelling use case for even allowing the user to override the column key to differ from the attribute key (which is only even possible with declarative mapping, AFAIK it cannot be achieved with imperative mapping). It makes sense to support distinct SQL names since existing database schemas may contain identifiers that are inconvenient or illegal as Python identifiers, but since both the attribute and column keys are only used on the Python side and are subject to exactly the same naming constraints and conventions, what purpose does it serve to ever set them differently?