meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
Apache License 2.0
87 stars 64 forks

Support defining configuration and stream schemas using Pydantic #110

Open MeltyBot opened 3 years ago

MeltyBot commented 3 years ago

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/110

Originally created by @edgarrmondragon on 2021-04-20 22:19:33


Pydantic is the most popular data validation and serialization library for Python at the moment. It is used by the new-ish and increasingly popular web framework FastAPI and by other frameworks wanting to support reliable data validation and serialization (like odmantic for MongoDB and pydantic-sqlalchemy).

Alternatives include mashumaro, marshmallow and dataclasses-json. Each of these supports only a subset of Pydantic's features (for example, no validation, or serialization without deserialization).

I propose leveraging Pydantic to allow the SDK user to define a plugin's configuration and inline stream schemas using a known, powerful and well-documented library. The implementation would look something like the following:

from pydantic import BaseModel

class BaseSchema(BaseModel):
    # Created by the SDK developers to support the specifics of Singer schemas
    ...

class TapTestConfig(BaseSchema):
    # Plugin config created by the SDK user
    ...

class ExampleStreamSchema(BaseSchema):
    # Stream schema created by the SDK user
    ...
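
To make the outline above more concrete, here is a minimal sketch of how such models might yield JSON Schema documents for a plugin's config and a stream. It is an illustration under assumptions, not the SDK's actual API: the `to_json_schema` helper and all field names (`api_url`, `api_key`, `page_size`, `id`, `name`, `email`) are hypothetical. The helper only bridges the method names used by Pydantic v2 (`model_json_schema`) and v1 (`schema`).

```python
from typing import Optional

from pydantic import BaseModel


class BaseSchema(BaseModel):
    """Hypothetical SDK-provided base class (assumption, not an existing SDK class)."""

    @classmethod
    def to_json_schema(cls) -> dict:
        # Pydantic v2 exposes model_json_schema(); Pydantic v1 exposes schema().
        if hasattr(cls, "model_json_schema"):
            return cls.model_json_schema()
        return cls.schema()


class TapTestConfig(BaseSchema):
    """Plugin config created by the SDK user (field names are illustrative)."""

    api_url: str
    api_key: str
    page_size: int = 100  # has a default, so it is not a required setting


class ExampleStreamSchema(BaseSchema):
    """Stream schema created by the SDK user (field names are illustrative)."""

    id: int
    name: str
    email: Optional[str] = None  # nullable, optional property


# Both results are plain dicts in JSON Schema form.
config_schema = TapTestConfig.to_json_schema()
stream_schema = ExampleStreamSchema.to_json_schema()
```

The resulting dictionaries list each field under `properties` and the fields without defaults under `required`. How nullable fields are rendered differs between Pydantic versions, so a real `BaseSchema` would likely need to normalize the output to whatever Singer targets expect.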

Notes:

MeltyBot commented 2 years ago

View 4 previous comments from the original issue on GitLab

anden-akkio commented 1 month ago

+1

Pydantic is way more friendly towards static type analysis via mypy/pyright and cuts down drastically on "stupid" type errors at scale and/or when running in prod.
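
As a rough illustration of that point, the sketch below validates a raw settings dict once, when the model is constructed, so a bad value fails at startup rather than midway through a sync. The `TapTestConfig` fields and the `load_config` helper are hypothetical and not part of the SDK.

```python
from pydantic import BaseModel, ValidationError


class TapTestConfig(BaseModel):
    """Illustrative config model (field names are assumptions)."""

    api_url: str
    page_size: int = 100


def load_config(raw: dict) -> TapTestConfig:
    # Validation and type coercion happen here, at construction time.
    return TapTestConfig(**raw)


# A numeric string is coerced to int in Pydantic's default (lax) mode.
good = load_config({"api_url": "https://example.com", "page_size": "50"})

# A value that cannot be parsed as an int is rejected up front.
try:
    load_config({"api_url": "https://example.com", "page_size": "fifty"})
    error_caught = False
except ValidationError:
    error_caught = True
```

Because `good.page_size` is annotated as `int`, mypy or pyright can flag misuse of it in the tap's code, which is not possible with an untyped `dict` lookup.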

edgarrmondragon commented 1 month ago

There's a draft MR on GitLab that unfortunately never came through but could serve as inspiration for folks interested in contributing: https://gitlab.com/meltano/sdk/-/merge_requests/84