nineinchnick / trino-openapi

Trino connectors for accessing APIs with an OpenAPI spec
Apache License 2.0
22 stars 3 forks source link

Support OpenAPI 3.1 #58

Open LocNguyend opened 5 months ago

LocNguyend commented 5 months ago

I want to use Trino to query from the code snippet using FastAPI in Python to generate an open-api.yaml. I have successfully connected to the catalog and performed queries, but it only works when the query result returns a single row. If I have multiple rows, Trino reports that there are no columns. Is there a solution for this?

Here my code below, if i change response_model to Organ and return result_list[0] it's still working.

class Organ(BaseModel):
    sourceid: str
    organcode: str 
    organtypecode: str

def convert_row_to_base_model(row):
    base_model = Organ(sourceid=row[0], organcode=row[1], organtypecode=row[2])
    return base_model

@prefix_router.get("/trino", response_model=list(Organ))
async def query_trino_and_convert_data():
    conn = trino.dbapi.connect(
        host='192.168.1.123',
        port=8080,
        user='open-api',
        catalog='iceberg',
        schema='fss_rawstage'
    )
    cursor = conn.cursor()
    cursor.execute('SELECT sourceid,organcode,organtypecode FROM raw_stx_cpf_organization limit 5')
    results = cursor.fetchall()
    cursor.close()
    conn.close()
    result_list = []
    for row in results:
        base_model = convert_row_to_base_model(row)
        result_list.append(base_model)
    return result_list
nineinchnick commented 5 months ago

Can you share the OpenAPI spec generated by FastAPI, or better yet, share a link to a repository with the full app and some instructions how to run it?

LocNguyend commented 5 months ago

Here is the complete code of my app.py file. It defines a Python function that retrieves data from a database and converts it into an API using FastAPI. You can easily run it using the bash command "python3 app.py" with Python version 3.10. I follow the best practices of FastAPI for using an auth token because I found that Trino-OpenAPI cannot connect to the catalog of the base URI without a security scheme. You can modify the SQL statement as needed. The use of the prefix "/v3" to resemble OpenAPI-Petstore is optional. You can get the spec by access localhost:8001/openapi.json

import time
from datetime import date
from decimal import Decimal
import pandas as pd
import json

import trino
from trino import dbapi
import concurrent.futures
import uvicorn
from trino.auth import BasicAuthentication

from typing import Annotated
from fastapi.encoders import jsonable_encoder
from fastapi import FastAPI, Depends, HTTPException, status, Body, APIRouter
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from passlib.context import CryptContext
from pydantic import BaseModel
# from trino import TrinoQuery

fake_users_db = {
    "johndoe": {
        "username": "johndoe",
        "full_name": "John Doe",
        "email": "johndoe@example.com",
        "hashed_password": "fakehashedsecret",
        "disabled": False,
    },
    "alice": {
        "username": "alice",
        "full_name": "Alice Wonderson",
        "email": "alice@example.com",
        "hashed_password": "fakehashedsecret2",
        "disabled": True,
    },
}

app = FastAPI()
prefix_router = APIRouter(prefix="/v3")

def fake_hash_password(password: str):
    return "fakehashed" + password

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/oauth/token")

class UserInDB(User):
    hashed_password: str

def get_user(db, username: str):
    if username in db:
        user_dict = db[username]
        return UserInDB(**user_dict)

def fake_decode_token(token):
    # This doesn't provide any security at all
    # Check the next version
    user = get_user(fake_users_db, token)
    return user

async def get_current_user(token: Annotated[str, Depends(oauth2_scheme)]):
    user = fake_decode_token(token)
    if not user:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authentication credentials",
            headers={"WWW-Authenticate": "Bearer"},
        )
    return user

async def get_current_active_user(
    current_user: Annotated[User, Depends(get_current_user)]
):
    if current_user.disabled:
        raise HTTPException(status_code=400, detail="Inactive user")
    return current_user

@app.post("/oauth/token")
async def login(form_data: Annotated[OAuth2PasswordRequestForm, Depends()]):
    user_dict = fake_users_db.get(form_data.username)
    if not user_dict:
        raise HTTPException(status_code=400, detail="Incorrect username or password")
    user = UserInDB(**user_dict)
    hashed_password = fake_hash_password(form_data.password)
    if not hashed_password == user.hashed_password:
        raise HTTPException(status_code=400, detail="Incorrect username or password")

    return {"access_token": user.username, "token_type": "bearer"}

@prefix_router.get("/users/me")
async def read_users_me(
    current_user: Annotated[User, Depends(get_current_active_user)]
):
    return current_user

class Organ(BaseModel):
    sourceid: str
    organcode: str 
    organtypecode: str

def convert_row_to_base_model(row):

    base_model = Organ(sourceid=row[0], organcode=row[1], organtypecode=row[2])
    return base_model

@prefix_router.get("/trino", response_model=Organ)
async def query_trino_and_convert_data():

    conn = trino.dbapi.connect(
        host='192.168.1.123',
        port=8080,
        user='open-api',
        catalog='iceberg',
        schema='fss_rawstage'
    )

    cursor = conn.cursor()

    cursor.execute('SELECT sourceid,organcode,organtypecode FROM raw_stx_cpf_organization limit 5')

    results = cursor.fetchall()

    cursor.close()
    conn.close()

    result_list = []
    for row in results:
        base_model = convert_row_to_base_model(row)
        result_list.append(base_model)

    return result_list[0]

app.include_router(prefix_router)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8001)

And here is catalog trino for the API. The base-uri and spec-location is your localhost after you run your app.py on local.

connector.name=openapi
spec-location=http://192.168.40.134:8001/openapi.json
base-uri=http://192.168.40.134:8001
# authentication.type=none
authentication.type=oauth
authentication.token-endpoint=/oauth/token
authentication.grant-type=password
authentication.username=johndoe
authentication.password=secret
nineinchnick commented 5 months ago

The parser I'm using (swagger-parser) for some reason, in both cases (a single object and an array), generates a spec tjat says the response contains a JsonSchema object. So far, none of the specs I tested used this type at all, it should be either ArraySchema or ObjectSchema.

I looked at the spec and nothing stands out, except that the fields also have a title property. I need to learn more about how FastAPI generates the spec.

nineinchnick commented 5 months ago

Looks like the difference is the OpenAPI version, I'm getting JsonSchema objects when spec is a OpenAPI 3.1, but it's StringSchema (or others) when it's 3.0.

LocNguyend commented 5 months ago

Looks like the difference is the OpenAPI version, I'm getting JsonSchema objects when spec is a OpenAPI 3.1, but it's StringSchema (or others) when it's 3.0.

Let me try to add openapi_version = 3.0 according this guide: https://fastapi.tiangolo.com/how-to/extending-openapi/ I also find this on the fastapi docs: https://fastapi.tiangolo.com/tutorial/schema-extra-example/. Dont know this help It can be fairly certain that the difference in OpenAPI versions. I'm not very familiar with OpenAPI. I use Python for machine learning-related functions, and it can be time-consuming to reimplement them in Java. Thank you for your help

nineinchnick commented 5 months ago

I am interested in adding support for OpenAPI 3.1, I just need to find a good example to use with tests. The first test I added is using the Swagger Petstore, but it is using OpenAPI 3.0. I'm hoping to find a more complete example app using FastAPI. It might take me a few days to complete this.

LocNguyend commented 5 months ago

I have switched to OpenAPI 3.0, and it works perfectly. Thank you for your suggestion

nineinchnick commented 5 months ago

I'll keep this open to add support for OpenAPI 3.1. Switching to OpenAPI 3.0 is a workaround but the core issue remains.

LocNguyend commented 5 months ago

I would like to ask about something unrelated to the current topic. When querying a table with Trino-openapi, if I describe that table and notice that Trino adds a column with a similar name but with the suffix "_req" for columns that have required parameters, how should I query when using "SELECT *" normal cols or cols with "_req"?

nineinchnick commented 5 months ago

The _req suffix is added to fields from the endpoint request, when a field with the same name also exists in the response, but has a different data type. The request fields are only useful for specifying predicates, so a query could look like SELECT * FROM table_with_required_params WHERE some_required_param_req = 'a value'.

The predicate value will be copied into the column when returning the query results.

If you have more questions like this, find me on the Trino Slack, or create a discussion in this repo (I just turned them on): https://github.com/nineinchnick/trino-openapi/discussions

nineinchnick commented 5 months ago

I also have an open issue to add more docs like this: https://github.com/nineinchnick/trino-openapi/issues/18