graphql-python / graphene-sqlalchemy

Graphene SQLAlchemy integration
http://docs.graphene-python.org/projects/sqlalchemy/en/latest/
MIT License
981 stars 228 forks source link

Does graphene_sqlalchemy keep sessions open? #292

Open mRcSchwering opened 4 years ago

mRcSchwering commented 4 years ago

tl;dr

Does graphene_sqlalchemy keep sessions open? I'm having the problem that Postgres doesnt run drop_all() in my test suite.

Explanation

Hi, I have a fastAPI app and I'm using graphene_sqlalchemy to generate an API from database models defined on the declarative base of SQLAlchemy. The database is postgres. For testing I am running some pytests against the API while it's running. So, I start a throw-away postgres container, then I start the API app, then I run pytest.

The tests include a reset_testdata() function which also uses the SQLAlchemy ORM. It looks like:

def reset_testdata():
    close_all_sessions()  # from sqlalchemy.orm.session
    models.Base.metadata.drop_all(bind=engine)
    models.Base.metadata.create_all(bind=engine)

    db.add_all([
        # ... add stuff
    ])

    db.commit()
    db.close()

I noticed, that once reset_testdata() is used, the pytest process hangs. No errors, not able to Ctrl+C, it just waits. I had this issue before and it usually stems from Postgres not allowing stuff like drop_all() if there are still active sessions. After some trying out I found that queries from my app (which are implemented by graphene_sqlalchemy) seem to keep a session open. (I used this example in my app: https://docs.graphene-python.org/projects/sqlalchemy/en/latest/tutorial/#defining-our-models)

I tested this, and basically I can circumvent this problem by adding a middleware that closes all sessions after every request.

@app.middleware('http')
async def close_sessions(request: Request, call_next):
    try:
        response = await call_next(request)
    finally:
        close_all_sessions()
    return response

I wonder whether I am missing something here? Any experience with this issue (if it is even an issue)?

smentek commented 3 years ago

https://www.oddbird.net/2014/06/14/sqlalchemy-postgres-autocommit/

davidroeca commented 3 years ago

I encountered a few issues using fast api + scoped_session in a small app I was working on (I think it was because of a stale select -- restarting the app provided a temporary fix).

Here's what I use for my session

engine = create_engine(SQLALCHEMY_DATABASE_URL)
session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
# ... create Base, as in the example
Base.query = session.query_property() 

Anyway, I found this issue + the following doc to be helpful in debugging it for myself: https://docs.sqlalchemy.org/en/14/orm/contextual.html?highlight=scoped_session#unitofwork-contextual

Straight from the SQLAlchemy docs:

method sqlalchemy.orm.scoping.scoped_session.query_property(query_cls=None):

return a class property which produces a Query object against the class and the current Session when called.

e.g.:

   Session = scoped_session(sessionmaker())

    class MyClass(object):
        query = Session.query_property()

    # after mappers are defined
    result = MyClass.query.filter(MyClass.name=='foo').all()

Produces instances of the session’s configured query class by default. To override and use a custom implementation, provide a query_cls callable. The callable will be invoked with the class’s mapper as a positional argument and a session keyword argument.

There is no limit to the number of query properties placed on a class.

See here

For me, a comparable middleware is what I think is needed:

@app.middleware('http')
async def reset_session_registry(request: Request, call_next):
    try:
        response = await call_next(request)
    finally:
        session.remove()
    return response
dmitrybabanovforreal commented 3 years ago

@davidroeca I've just encountered the same issue with FastAPI and GrapqhQL. As tiangolo mentioned here, it's not a good idea to use a scoped_session.

So I create and pass a new session with each query from the endpoint like that using a dependency:

router = fastapi.APIRouter()

graphql_app = starlette.graphql.GraphQLApp(schema=Schema(query=Query, mutation=Mutation))

@router.post('')
async def query(request: Request,
                db: Session = Depends(deps.get_db),
                current_user: User = Depends(deps.get_current_active_user)):
    request.scope.setdefault('db_session', db)
    request.scope.setdefault('user', current_user)
    return await graphql_app.handle_graphql(request=request)

And then I get it from the info and pass it to the info.context['session'] just like the documentation suggests

class UserGraphQL(SQLAlchemyObjectType):
    courses = Field(List(CourseGraphQL), resolver=resolve_courses, args={'id': Int(), 'all': Boolean()})
    lessons = Field(List(LessonGraphQL), resolver=resolve_lessons, args={'id': Int(), 'all': Boolean()})

    class Meta:
        model = User

class Query(ObjectType):
    users = Field(List(UserGraphQL))

    def resolve_users(self, info, **kwargs):
        try:
        info.context['session'] = info.context['request'].scope['db_session']

        current_user = info.context['request'].user
        query = UserGraphQL.get_query(info)
        return query.filter(User.id == current_user.id).all()
davidroeca commented 3 years ago

Unfortunately, that renders a lot of graphene-sqlalchemy's relationship and queries capabilities useless. My preferred approach is closer to this one within a similar thread.

Along the lines of:

from contextvars import ContextVar
import uuid

_request_id_ctx_var = ContextVar('request_id', default=None)
def get_request_id():
    return _request_id_ctx_var.get()

engine = create_engine(SQLALCHEMY_DATABASE_URL)
session = scoped_session(
    sessionmaker(autocommit=False, autoflush=False, bind=engine),
    scopefunc=get_request_id,
)
# ... create Base, as in the example
Base.query = session.query_property()

@api.middleware("http")
async def set_request_id(request, call_next):
    request_id = str(uuid.uuid4())
    ctx_token = _request_id_ctx_var.set(request_id)

    response = await call_next(request)

    _request_id_ctx_var.reset(ctx_token)
    return response
dmitrybabanovforreal commented 3 years ago

@davidroeca I don't know it works well for me. I have lots of one-to-many relationships between objects that render correctly in nested responses And I use SQL Alchemy models as meta classes in GraphQL SQL Alchemy classes. So I pretty much use the most important features without any problems.

I updated the code in my message above a bit because the previous version caused errors when I query objects along with their child objects. This one works perfectly well and I built it in accordance with FastAPI docs best practices and GraphQL SQL Alchemy recommendation about using a session

davidroeca commented 3 years ago

@dmitrybabanovforreal did you read this doc (in particular, this section)?

I don't think avoiding scoped_session is necessarily a "best practice" but rather a recommendation from tiangolo to reduce confusion for new users. In this case, avoiding scoped_session requires you to write resolver functions for every relationship. Using scoped_session in the way SQLAlchemy recommends (one session per request) will enable you to use graphene-sqlalchemy with its original design while not having to write custom code if you don't have to.

Now, if you're stuck writing your own resolvers anyway (due to additional custom logic, or something else), then your approach is still a fine solution. The more models/relationships, the more painful it gets though.