Status: Closed (mesozoic closed this issue 1 year ago)
@mesozoic, could you please clarify if there will also be a provision for generating the Pydantic schema for the table metadata?

> @mesozoic, could you please clarify if there will also be a provision for generating the Pydantic schema for the table metadata?
My first thought here is just to use Pydantic to represent all the complex metadata we get back from the API (schemas, webhooks, etc). Building a Pydantic model to reflect a table's data is an interesting idea, but I'm not sure how useful it will be (since the ORM module does not use Pydantic under the hood).
If I'm not quite getting your meaning, perhaps you could clarify your use case?
Here is how I'm using pyairtable with Pydantic at the moment (I'm fairly new to Pydantic).

Base class for any Airtable data:
Note that `record_id` and `record_created_time` are excluded from serialization. This way we can get a record, modify it, and call `table.update()` on the same object.
```python
from pydantic import BaseModel, Field

class AirTableRow(BaseModel):
    record_id: str = Field(None, alias="id", exclude=True)
    record_created_time: str = Field(None, alias="createdTime", exclude=True)

    class Config:
        extra = "forbid"  # Catch typos and field name changes
        allow_population_by_field_name = True

    @classmethod
    def from_dict(cls, d):
        return cls(record_id=d["id"], record_created_time=d["createdTime"], **d["fields"])
```
Now for each table, we need to define the schema.
Note that the calculated field, again, is excluded from serialization, as it can't be part of an update.
```python
class Data(AirTableRow):
    field1: str = Field(None, alias="Field 1")
    field2: int = Field(None, alias="Field 2")
    calculated_field: int = Field(None, alias="Calculated Field", exclude=True)
```
Now using the class with pyairtable:

```python
>>> table = pyairtable.Table(AT_API_KEY, "appDsQdcFsh1bJlGE", "Test")
>>> data = [Data.from_dict(d) for d in table.all()]
>>> data
[Data(record_id='rec4yN9Jr6cjH6zbW', record_created_time='2023-07-09T07:39:40.000Z', field1='Hello there.', field2=345, calculated_field=357),
 Data(record_id='recJSJdoiBdNOqeTP', record_created_time='2023-07-09T07:39:40.000Z', field1='General Kenobi!', field2=123, calculated_field=138)]
```
Here is what happens if we serialize the data:
```python
>>> [d.dict(by_alias=True, exclude_unset=True) for d in data]
[{'Field 1': 'Hello there.', 'Field 2': 345},
 {'Field 1': 'General Kenobi!', 'Field 2': 123}]
```
So to update a record, we can do:

```python
>>> data[1].field2 = 4321
>>> table.update(data[1].record_id, data[1].dict(by_alias=True, exclude_unset=True))
{'id': 'recJSJdoiBdNOqeTP',
 'createdTime': '2023-07-09T07:39:40.000Z',
 'fields': {'Field 1': 'General Kenobi!', 'Field 2': 4321, 'Calculated Field': 4336}}
```
Or with a new `Data` object (split into two lines for readability):

```python
>>> update = Data(field1="Lalalala")
>>> table.update(data[0].record_id, update.dict(by_alias=True, exclude_unset=True))
{'id': 'rec4yN9Jr6cjH6zbW',
 'createdTime': '2023-07-09T07:39:40.000Z',
 'fields': {'Field 1': 'Lalalala', 'Field 2': 345, 'Calculated Field': 353}}
```
This provides at least some type safety and working IntelliSense. There is definitely room for improvement; for example, we could have a way for `data = Data.all()` to return a `List[Data]`.
What I'm suggesting is: let's have an official tool that takes the table schema from Airtable and generates the Pydantic boilerplate. Would that make sense? Does the API provide enough information to generate it automatically?
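I believe the base schema endpoint returns each field's name and type, so a generator seems feasible in principle. A rough, hypothetical sketch (the type map is deliberately partial, and every name here is illustrative, not an actual pyairtable API):

```python
# Partial map from Airtable field types to Python annotations.
TYPE_MAP = {
    "singleLineText": "str",
    "multilineText": "str",
    "number": "float",
    "checkbox": "bool",
}

def to_identifier(name: str) -> str:
    # Turn a field label like "Field 1" into a Python name like "field_1".
    return "".join(c if c.isalnum() else "_" for c in name).lower().strip("_")

def generate_model(table_schema: dict) -> str:
    # table_schema: one entry from the "tables" list of a base schema response.
    lines = [f"class {table_schema['name']}(AirTableRow):"]
    for field in table_schema["fields"]:
        py_type = TYPE_MAP.get(field["type"], "str")  # fall back to str
        name = field["name"]
        lines.append(f'    {to_identifier(name)}: {py_type} = Field(None, alias="{name}")')
    return "\n".join(lines)
```

The output would be pasted (or written) into a module alongside the `AirTableRow` base class above.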
@xl0 What you're describing seems like an interesting approach to consider when we get around to autogenerating ORM classes from table schemas (probably 3.0; see roadmap in #249). I think we'll need to weigh whatever advantages or new features it provides against whatever ways it might break backwards-compatibility with the current ORM module.
This thread was intended solely to suggest using Pydantic (vs. plain old dicts) for metadata like schemas, webhooks, etc. Seems like that's probably acceptable, since this is not the only thread where I've heard general enthusiasm for using Pydantic in more places :)
This is a proposal for how to represent complex nested data structures from the Airtable API. This proposal would benefit from, but does not strictly require, removing ApiAbstract (see #257).
tl;dr
I'd like to add pydantic as a dependency and use that to serialize and deserialize Airtable models from their metadata APIs.
Rationale
Most Python developers these days use development tools that provide type hints, autocomplete, and more. Developers who need to interact with the nested data structures returned by the Airtable API would benefit from being able to navigate those within their code editors' tooling.
Some projects using this library might also want to enforce strict typing, and today there's no common way for them to ensure that the properties they reference on pyairtable's return types actually exist.
Design
The module layout is very open to discussion, but it could be something like this:
- `pyairtable/metadata/` (new)
  - `field_schema.py`: Models for each type of field config.
  - `base_schema.py`: Models for base and table schemas, permissions, and invites.
  - `webhooks.py`: Models for interacting with webhooks.
- `pyairtable/api/`
  - `api.py`: We might add some utility methods for accessing metadata APIs.
  - `enterprise.py` (new): Define an `Enterprise` class and methods for managing users and retrieving audit logs.
  - `workspace.py` (new): Define a `Workspace` class and methods for managing workspace permissions.
  - `base.py`: Add `Base` methods for retrieving and managing base schemas, permissions, and webhooks.
  - `table.py`: Add `Table` methods for retrieving and managing table/field schemas.

Example
From a user's perspective, this will be relatively transparent. They will call methods that we expose on classes defined in `pyairtable.api`, and retrieve normal Python data structures that they can interact with. For now I'm not envisioning these data structures knowing how to call the API or save modifications to themselves. We can probably start with bespoke methods for each type of modification.
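To make that concrete, here is a rough sketch of what such metadata models and their usage might look like. Every class and method name below is a placeholder, not the proposed API:

```python
from typing import List
from pydantic import BaseModel

class FieldSchema(BaseModel):
    # Mirrors one entry in a table's "fields" list from the metadata API.
    id: str
    name: str
    type: str

class TableSchema(BaseModel):
    # Mirrors one entry in the base schema's "tables" list.
    id: str
    name: str
    fields: List[FieldSchema]

# A caller might then do something like:
#   schema = table.schema()       # hypothetical method returning a TableSchema
#   schema.fields[0].name         # autocomplete and type checking work here
```

The payoff is that editors and type checkers can see the shape of the response, instead of treating it as an opaque dict.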
I haven't taken the time to think through the exact names/signatures of every method we'd add, but I think we can probably consider those as we go.
Deprecation
We would mark the existing functions in `pyairtable.metadata` as deprecated, for removal in 3.0.0. Alternatively, we could mark them as deprecated in a point release (1.5.1) and then remove them in 2.0.0. My instinct is to err toward compatibility.

Future
A couple other ideas that I haven't explored much:
- It is possible we could make ORM-like features with these objects, such as manipulating their state and calling `.save()` directly. For now I've not contemplated this too deeply, as I am mostly focused on being able to read state from the API.
- We could have a dict-like (backwards-compatible) `Record` dataclass that defines `id`, `created_time`, and `fields`. I consider that out of scope for this proposal because it's data and not metadata; I think `pyairtable.orm` is a better pattern to follow.

Alternatives considered
- `Dict[str, Any]`. Sure, it works, but where's the fun in that? :grin:
- `TypedDict`s. The number of TypedDict definitions to create and maintain would make this alternative no less complex or burdensome for the package's maintainers, but it would provide significantly less functionality for developers who use this library.

Thoughts?
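For comparison, a minimal sketch of the TypedDict alternative for a single field schema (names illustrative). A static checker can verify key names and value types, but nothing is enforced at runtime:

```python
from typing import TypedDict

class FieldSchemaDict(TypedDict):
    id: str
    name: str
    type: str

# A type checker flags a missing or misspelled key here, but at runtime
# this is a plain dict: no validation, no aliases, no (de)serialization.
schema: FieldSchemaDict = {"id": "fld1", "name": "Field 1", "type": "singleLineText"}
```

That runtime gap, plus the lack of aliasing and serialization helpers, is the functionality difference described above.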