kayak / pypika

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
http://pypika.readthedocs.io/en/latest/
Apache License 2.0
2.43k stars 293 forks source link

Wrapper to fix typing #749

Open Pyrolistical opened 10 months ago

Pyrolistical commented 10 months ago

Awesome project, but the typing is really poor. I assume this is because this project was created before type hinting was added.

Right now it is impossible to use this library without a screen full of type errors.

I do think this is salvageable. I created wrappers around Query, Table, etc with various simplifications to add type, but at the same time retain the same functionality.

I also snuck in some fixes to gripes I had and the API is much more consistent, but the API more or less the same.

Major changes:

  1. TypedQuery instead of Query and TypedTable instead of Table
  2. ALL CAPS for sql keywords. TypedQuery.FROM instead of Query.from_. This fixes the awkward from_, isin and as_
  3. Removed Table("some name").some_field and Table("some name")["some_field"]. Use TypedTable("some name").field("some_field") instead. This fixes a lot of type errors as the return types of the overridden __getattr__ and __getitem__ are not used.
  4. IN takes a vararg of terms instead of isin that takes in a collection, to be consistent with SELECT
Here is the incomplete WIP ```python from dataclasses import dataclass from pypika import Query, Table, QmarkParameter, Order, Criterion # type: ignore from typing import Any @dataclass class INCriterion: in_criterion: Any @dataclass class Term: term: Any @dataclass class TypedCriterion: criterion: Criterion def __and__(self, other: Any) -> "TypedCriterion": right = None match other: case TypedCriterion(criterion): right = criterion case _: raise NotImplementedError(f"unsupported type {type(other)}") return TypedCriterion(criterion=self.criterion.__and__(right)) @dataclass class TypedField: field: Any def IN(self, *terms: Term | QmarkParameter) -> INCriterion: resolved_terms = [] for term in terms: match term: case Term(term): resolved_terms.append(term) # type: ignore case QmarkParameter() as qmark_parameter: resolved_terms.append(qmark_parameter) # type: ignore return INCriterion(in_criterion=self.field.isin(tuple(resolved_terms))) # type: ignore def AS(self, name: str) -> "TypedField": self.field = self.field.as_(name) return self def __eq__(self, other: Any) -> TypedCriterion: # type: ignore right = None match other: case TypedField(field): right = field case QmarkParameter() as qmark_parameter: right = qmark_parameter case _: raise NotImplementedError(f"unsupported type {type(other)}") return TypedCriterion(criterion=self.field.__eq__(right)) @dataclass class TypedTableStar: star: Any def __init__(self, star: Any): self.star = star @dataclass class TypedTable: table: Table def __init__(self, name: str): self.table = Table(name) @property def star(self) -> TypedTableStar: return TypedTableStar(self.table.star) def prefix_name(self, postfix: str) -> TypedField: return self.field(self.table.get_table_name() + postfix) def field(self, name: str) -> TypedField: return TypedField(field=self.table.field(name)) @dataclass class TypedJoin: parent: "TypedQuery" join: Any def USING(self, column: str) -> "TypedQuery": self.parent.query = self.join.using(column) return self.parent def ON(self, criterion: TypedCriterion) -> "TypedQuery": self.parent.query = self.join.on(criterion.criterion) return self.parent @dataclass class TypedQuery: query: Any = None @staticmethod def FROM(table: TypedTable) -> "TypedQuery": self = TypedQuery() self.query = Query.from_(table.table) return self @staticmethod def UPDATE(table: TypedTable) -> "TypedQuery": self = TypedQuery() self.query = Query.update(table.table) # type:ignore return self def SET(self, field: TypedField, value: Any) -> "TypedQuery": self.query = self.query.set(field.field, value) return self def INNER_JOIN(self, table: TypedTable) -> "TypedJoin": return TypedJoin(parent=self, join=self.query.inner_join(table.table)) def LEFT_JOIN(self, query: "TypedQuery") -> "TypedJoin": return TypedJoin(parent=self, join=self.query.left_join(query.query)) @staticmethod def INTO(table: TypedTable) -> "TypedQuery": self = TypedQuery() self.query = Query.into(table.table) return self def COLUMNS(self, *fields: str) -> "TypedQuery": self.query = self.query.columns(*fields) return self def INSERT(self, *values: Any) -> "TypedQuery": self.query = self.query.insert(*values) return self def ORDER_BY(self, *fields: TypedField, order: Order | None = None) -> "TypedQuery": self.query = self.query.orderby(*(field.field for field in fields), order=order) return self def SELECT(self, *fields: TypedField | TypedTableStar) -> "TypedQuery": resolved_fields = [] for f in fields: match f: case TypedField(field): resolved_fields.append(field) # type: ignore case TypedTableStar(star): resolved_fields.append(star) # type: ignore self.query = self.query.select(*resolved_fields) return self def AS(self, name: str) -> "TypedQuery": self.query = self.query.as_(name) return self def WHERE(self, criterion: TypedCriterion | INCriterion) -> "TypedQuery": resolved_criterion = None match criterion: case TypedCriterion(criterion=c): resolved_criterion = c case INCriterion(in_criterion): resolved_criterion = in_criterion self.query = self.query.where(resolved_criterion) return self def __str__(self) -> str: return str(self.query) ```

Thoughts? Should we add the wrapper as a shim and then rewrite the internals to simplify?

AzisK commented 9 months ago

Hi Pyrolistical. I am glad you like this project. I agree with your ideas for syntax. We could make this change with a major version bump. Would you like to make such a PR?

P.S. I am not sure what does "shim" mean

Pyrolistical commented 9 months ago

Happy to make the PR. If we are going to make a major breaking release, does this mean we should just directly modify Query and forgo the TypedQueue wrapper?