supabase / postgrest-py

PostgREST client for Python. This library provides an ORM interface to PostgREST.
https://postgrest-py.readthedocs.io
MIT License

Add return types #301

Open - Filsommer opened this issue 1 year ago

Filsommer commented 1 year ago

Note: I'm on the latest version: supabase==1.0.3

Is your feature request related to a problem? Please describe. Related to this issue; however, that issue isn't fixed yet.

Describe the solution you'd like. When executing supabase.table("item").select().execute().data, it is crucial for me and other users to get a return type more specific than Any. For this query the return type should be Item, just as it is when using supabase with TypeScript.

Describe alternatives you've considered. I saw that Prisma for Python offers this; e.g. items: List[Item] = await prisma.item.find_many() works as expected.

anand2312 commented 1 year ago

Thanks for the issue @Filsommer. This can be achieved if we have the user define "models" (in ORM-speak) themselves and then pass them on to us. Prisma (which is an excellent library, by the way) generates the Pydantic models for the user from its schema, but we have no such schema.

I had tried an implementation of this, but ultimately I did not merge it as it resulted in a lot of code duplication (almost every request builder class had to be copied - one generic version and one that wasn't). This was because I had to set the type of APIResponse.data to List[dict[str, Any]] in the cases where the user did NOT provide a well-typed model to parse into, and there's no neat way yet in Python to set a default for a type parameter like that. (See PEP 696, which would fix this issue, but it's been a draft for a while now 😦)

I hope that explains this a bit - I think I might just write the old implementation anyways now (it's been a year since the PR I linked), but I'm open to discussion for better ideas!

Edit: a more complete implementation of the PR I linked above - anand2312/postgrest-py#1
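
To make the duplication concrete, here is a minimal sketch of the typing problem (the class names are illustrative, not the library's actual builders): without a default for the type parameter, a generic response class and an untyped fallback have to coexist, and every request builder that produces them ends up copied the same way.

from typing import Any, Generic, TypeVar
from pydantic import BaseModel

_ModelT = TypeVar("_ModelT", bound=BaseModel)

class APIResponse(Generic[_ModelT]):
    # rows parsed into the user-supplied model
    data: list[_ModelT]

class UntypedAPIResponse:
    # fallback when no model is given; PEP 696's default type
    # parameters would let one generic class cover both cases
    data: list[dict[str, Any]]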

Filsommer commented 1 year ago

Thanks for the insights @anand2312. Could you elaborate on what you meant by having the user define "models" (in ORM-speak) themselves and then pass them on to you? How would I do that technically? You mean something like this? data: List[Item] = supabase.table("item").select().execute().data

My linter (Pylance) complains:

Expression of type "List[Dict[str, Any]]" cannot be assigned to declared type "List[Item]"
  "List[Dict[str, Any]]" is incompatible with "List[Item]"
    TypeVar "_T@list" is invariant
      "Dict[str, Any]" is incompatible with "Item"
anand2312 commented 1 year ago

@Filsommer The library doesn't know until runtime what the type of the object returned by the query is, so we are forced to have the user describe the type (by making a class with all the fields that the table has) and then pass that class to the query, indicating that that model should be used for the return type. This would be similar to how "models" work in popular Python ORMs like SQLAlchemy.

Right now the proposed functionality hasn't actually been merged, so sadly there's no easy way for you (the user) to get this working. The only thing I can think of is writing a helper function to parse the query result into pydantic models:

from typing import Any, TypeVar, Type
from pydantic import BaseModel

_ModelT = TypeVar("_ModelT", bound=BaseModel)

class Item(BaseModel):
    ... # fields are defined here

def parse_query(cls: Type[_ModelT], data: list[dict[str, Any]]) -> list[_ModelT]:
    # parse each row dict into an instance of the given model
    return [cls(**row) for row in data]

items = parse_query(Item, supabase.table("item").select().execute().data)
reveal_type(items)  # Type of items is list[Item]
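
(As an aside: pydantic v1, which was current when this thread was written, ships parse_obj_as, which does the same parsing without a custom helper; in v2 it was deprecated in favour of TypeAdapter.)

from typing import List
from pydantic import parse_obj_as

# equivalent to parse_query(Item, ...) above, using pydantic's built-in helper
items = parse_obj_as(List[Item], supabase.table("item").select().execute().data)
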
anand2312 commented 1 year ago

Can you check out the PRs I linked and tell me which of the APIs I've shown in the examples there looks better from a user's perspective? Thanks 😄

Filsommer commented 1 year ago

@anand2312, I read your PR and really like the direction you're taking.

Some remarks: r = client.from_model(Country).select("*,cities(*)").execute() looks like the more elegant solution.

I feel like r = client.from_("countries").select("*,cities(*)").execute(model=Country) should be avoided, as it increases the number of points of failure. E.g. changing the table name from "countries" to "country" would mean changing both the argument to from_("country") and the __table_name__ class field.
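
For instance, a from_model-style helper only ever reads the name from the model class (a hedged sketch, not the PR's actual code):

from typing import ClassVar
from pydantic import BaseModel

class Country(BaseModel):
    __table_name__: ClassVar[str] = "countries"

def from_model(client, model: type[BaseModel]):
    # the table name lives only on the model, so renaming the table
    # means touching a single class attribute
    return client.from_(model.__table_name__)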

I cloned your PR and experimented with some cool ideas.

Firstly, I was able to remove most of your duplication by adding the generics to the original request builders when using from_model, while maintaining backwards compatibility with the from_ and table methods. The result is a list of dicts instead of class instances, but that's fine for now IMO.

Secondly, as a dev I would love to keep my Python models in sync with my DB tables. I wrote a small (still very early-stage!) script to parse the schemas and generate models automatically, and it worked surprisingly well:

from jinja2 import Environment, FileSystemLoader
from pathlib import Path
from httpx import get
import json
from typing import Union

def map_type(type: Union[None, str], format: Union[None, str], description: Union[None, str], relations: dict, class_name: str) -> str:
    # TODO improve this parsing algo
    if description and "Foreign Key" in description:
        # record the relation so the parent model gets a back-reference field
        parent_type = description.split("table='")[1].split("'")[0]
        relations[parent_type] = {"field_name": class_name, "field_type": f"List[{class_name.capitalize()}]"}
    if type == "integer":
        output_type = "int"
    elif type == "string" and format and "timestamp" in format:  # format may be None
        output_type = "datetime"
    elif type == "string":
        output_type = "str"
    elif type == "number":
        output_type = "float"
    elif type == "boolean":
        output_type = "bool"
    else:
        output_type = "None"
    return output_type

# placeholders: substitute your own Supabase project ref and anon key
YOUR_URL = "your-project-ref"
YOUR_KEY = "your-anon-key"

r = get(f"https://{YOUR_URL}.supabase.co/rest/v1/?apikey={YOUR_KEY}")
definitions = json.loads(r.text)["definitions"]
models = {}
relations = {}
for key, value in definitions.items():
    models[key] = []
    for property_key, property_value in value.get("properties", {}).items():
        models[key].append({
            "field_name": property_key,
            "field_type": map_type(
                property_value.get("type"),
                property_value.get("format"),
                property_value.get("description"),
                relations,
                key,
            ),
        })

environment = Environment(
    loader=FileSystemLoader(Path(__file__).parent / "templates"),
    trim_blocks=True,
    lstrip_blocks=True,
)
template = environment.get_template("client.py.jinja")
output = template.render(models=models, relations=relations)

with open(Path(__file__).parent / "models.py", "w") as f:
    f.write(output)
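
The client.py.jinja template isn't included in the comment; a minimal version consistent with the generated output below might look like this (a hypothetical reconstruction, not code from the thread):

{# hypothetical templates/client.py.jinja - not part of the original comment #}
from __future__ import annotations
from typing import ClassVar, List, Optional
from datetime import datetime
from pydantic import BaseModel

{% for table, fields in models.items() %}
class {{ table.capitalize() }}(BaseModel):
    """Represents a {{ table }} record"""
{% for field in fields %}
    {{ field.field_name }}: Optional[{{ field.field_type }}]
{% endfor %}
{% if table in relations %}
    {{ relations[table].field_name }}: Optional[{{ relations[table].field_type }}]
{% endif %}
    __table_name__: ClassVar[str] = "{{ table }}"
{% endfor %}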

This would output models.py which can then be consumed by the from_model method:

from __future__ import annotations
from typing import ClassVar, List, Optional
from datetime import datetime
from pydantic import BaseModel

class Cities(BaseModel):
    """Represents a cities record"""
    id: Optional[int]
    created_at: Optional[datetime]
    country_id: Optional[int]
    city_name: Optional[str]
    __table_name__: ClassVar[str] = "cities"

class Countries(BaseModel):
    """Represents a countries record"""
    id: Optional[int]
    created_at: Optional[datetime]
    country_name: Optional[str]
    cities: Optional[List[Cities]]
    __table_name__: ClassVar[str] = "countries"
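
Consuming the generated file then looks like the from_model example above (hedged: the PR's API was still unmerged at this point):

from models import Countries

# proposed (unmerged) API from the linked PR
r = client.from_model(Countries).select("*,cities(*)").execute()
# r.data would be typed as list[Countries] under the proposal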

Another approach I tried was the Prisma way, e.g. response = client.countries.select("*,cities(*)").execute().data # Type is List[Countries], but I still need to tweak some things to make that work; it feels more hacky for now. Maybe we could chat somewhere (Twitter?) to discuss this further if you want :)

anand2312 commented 1 year ago

Thanks for the feedback @Filsommer! Auto-generating the models had been on my mind too; very interesting to see an actual solution 😄 Although I doubt I'd put that in the main library (probably in an extension library that works alongside supabase-py?)

Are you on Discord? We can chat in the Supabase Discord server - https://discord.gg/kk7UgwWYen (I'm @anand2312 on Discord)

NixBiks commented 1 year ago

I'd like to join the discussions too. One thing: I don't think we should go with pydantic. It's a great library, but it's primarily a data validation library. From PostgREST's Swagger schema we already know the types - there's no need to validate them at runtime, since the models are generated from the schema itself. I suggest using something like attrs. We could take some inspiration from openapi-python-client, which generates a Python client from an OpenAPI 3.x spec.
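
For illustration (an assumption, not code from the thread), a generated attrs model is just a typed class with no runtime validation step:

import attrs

@attrs.define
class City:
    # annotations inform the type checker; attrs does not validate them at runtime
    id: int
    city_name: str
    country_id: int

city = City(id=1, city_name="Lisbon", country_id=2)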

Feel free to ping me on Discord.

NixBiks commented 1 year ago

Another option is to generate the models from the SQL schema by connecting directly to the database, similar to sqlacodegen.
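
A minimal sketch of that route (assumes psycopg 3; the DSN is a placeholder): read column metadata straight from information_schema and feed it into the same model generation step.

import psycopg

# placeholder DSN - point this at your Postgres database
with psycopg.connect("postgresql://user:password@localhost:5432/postgres") as conn:
    rows = conn.execute(
        """
        SELECT table_name, column_name, data_type, is_nullable
        FROM information_schema.columns
        WHERE table_schema = 'public'
        ORDER BY table_name, ordinal_position
        """
    ).fetchall()

for table, column, data_type, nullable in rows:
    print(table, column, data_type, nullable)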

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 365 days with no activity.

Daspy11 commented 3 hours ago

This would be really helpful.