albireox closed this 3 months ago
This does seem to fix it. I've tested a few valis routes, the cone search and db info search, and they worked. I also tested the zora search submission, which works. However, the target page from zora doesn't work and throws the same error. This may be related to the concurrent queries from the Target page. I'd like to preserve those if possible since they speed up page loading.
As for the `ContextVar` thing, we may not need this anymore. I was originally following this FastAPI guide on how to implement peewee with async. Since then, they've deprecated support for peewee because it doesn't play well with async, and they suggest using sqlalchemy. https://fastapi.tiangolo.com/how-to/sql-databases-peewee/?h=peewee
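For reference, a minimal stdlib-only sketch of the idea behind that pattern: per-request state lives in a `ContextVar`, and a reset step gives each request its own copy. The names `DEFAULT_STATE`, `db_state` and `handle_request` are made up for illustration and are not the valis or peewee code.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the per-request connection state of the database object.
DEFAULT_STATE = {"closed": True, "conn": None}
db_state: ContextVar[dict] = ContextVar("db_state", default=DEFAULT_STATE.copy())


async def reset_db_state() -> None:
    """Give the current task a fresh state dict instead of the shared default."""
    db_state.set(DEFAULT_STATE.copy())


async def handle_request(name: str) -> dict:
    await reset_db_state()
    # Pretend to open a connection that belongs to this request only.
    db_state.get().update(closed=False, conn=f"connection-for-{name}")
    await asyncio.sleep(0)  # let the other request run in between
    return dict(db_state.get())


async def main() -> list:
    # gather() wraps each coroutine in its own task; each task runs in a copy of the context.
    return await asyncio.gather(handle_request("a"), handle_request("b"))


results = asyncio.run(main())
print(results)
```

Each simulated request only ever sees the state it set itself, which is the property the guide relies on.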
Another question: should we just make the default `db_reset: false`? This would also fix the concurrent issue, but would, I think, essentially create a global db connection, which I'm not sure is what we want. FastAPI recommends creating a new connection per request, thus wrapping it up into a Dependency, but maybe we don't care about that?
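To make the "new connection per request" option concrete, here is a rough sketch of a generator-style dependency. It uses `sqlite3` from the standard library as a stand-in for the real database, and `get_db` is a hypothetical name; in FastAPI the generator would be passed to `Depends` rather than driven by hand.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


def get_db() -> Iterator[sqlite3.Connection]:
    """Open a connection for one request and close it when the request is done."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    try:
        yield conn
    finally:
        conn.close()


# FastAPI would drive the generator through Depends(get_db); here we drive it by hand.
with contextmanager(get_db)() as conn:
    answer = conn.execute("SELECT 1 + 1").fetchone()[0]

# After the block exits the connection is closed, so further use raises an error.
try:
    conn.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True

print(answer, closed)
```

The trade-off is that whatever the connection does on creation is paid on every request.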
Ok, I'll have a look at that issue. I think right now if `db_reset=True` the code should be almost identical to before the original change, but I must have missed something.
I think by default `db_reset` should always be True, see https://github.com/sdss/valis/blob/3468f2e95ea1fa1b935140551c10f40e178e9e47/python/valis/settings.py#L42 We can discuss whether we want to change that. I think it's safer to create a new connection for each request; the only problem is that creating a new connection triggers a new reflection of the `catalogdb` tables, which causes a non-negligible overhead. That could make quick queries take several times longer than they should because of the reflection, which is not needed anyway. But I think there are ways to prevent that (or we can add options to `sdssdb`).
OK, can you give this one more try after pulling? Note that I have also updated sdssdb to 0.12.3. This is not strictly necessary for this issue but 0.12.2 would cause reflection of `catalogdb` in `pipelines` to be extremely slow because of all the missing tables.
I cannot say I fully understand the issue yet, but there seemed to be two additional problems:

- The `reset_db_state` dependency does need to be an `async` function or the `ContextVar` doesn't seem to work properly.
- With `db_reset=True`, if Zora requests multiple routes concurrently when the DB has not yet been initialised, there would be a race condition error because multiple instances would try to initialise the DB at the same time. So I added an `asyncio.Lock` to prevent multiple connection attempts at the same time. This works fine with `uvicorn` in development mode but not with `gunicorn`; I suspect it's a limitation of the `UvicornWorker` for `gunicorn`. Following this I added a `timeout` to the `wsgi` configuration.

With these changes the Zora target page works for me in all cases (`uvicorn` with and without `db_reset`, and the same with `gunicorn` in production).
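The locking idea from the second point can be sketched roughly as follows. This is a stdlib-only illustration, not the code in this PR: `SharedDatabase` and its methods are hypothetical, and the sleep stands in for the slow connection and reflection.

```python
import asyncio


class SharedDatabase:
    """Hypothetical wrapper that initialises an expensive connection only once."""

    def __init__(self) -> None:
        self.connection = None
        self.init_count = 0
        self._lock = asyncio.Lock()

    async def _connect(self) -> str:
        self.init_count += 1
        await asyncio.sleep(0.01)  # simulate a slow connection and reflection
        return "connection"

    async def get(self) -> str:
        # Only one task at a time may check and initialise the connection.
        async with self._lock:
            if self.connection is None:
                self.connection = await self._connect()
        return self.connection


async def main() -> SharedDatabase:
    db = SharedDatabase()  # created inside the running event loop
    # Five concurrent "requests" arrive before the connection exists.
    await asyncio.gather(*(db.get() for _ in range(5)))
    return db


db = asyncio.run(main())
print(db.init_count)
```

Without the lock, all five tasks would find `connection` unset and each would start its own initialisation. Note that an `asyncio.Lock` only coordinates tasks inside one event loop, so it does nothing across separate worker processes.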
But this is looking like a serious hack at some point. In a different issue we should consider alternatives, but I'll write a couple of ideas here for now:

- Drop `peewee` if it doesn't play well with FastAPI and async and replace it with SQLAlchemy 2 (or maybe SQLModel if we want to stay in the ecosystem). This may be less of an issue now that `catalogdb` is quite stable and maybe we don't need reflection.
- If we keep `peewee` we should consider a persistent DB connection (like `db_reset=False` does). I think that would be ok since `peewee` reconnects if the connection has been lost. We could also have the DB connect to the server during the app initialisation, so that by the time a request is processed there are no more issues with multiple connections because there is a single instance of the connection.

The tests are failing right now but I think that will be fixed by merging #36 first.
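As a rough sketch of the "connect during app initialisation" idea, the connection could be opened in a lifespan-style async context manager, which FastAPI accepts through `FastAPI(lifespan=...)`. Everything else here is an assumption for illustration: `AppState` is a made-up container and `sqlite3` stands in for the real database.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager
from typing import AsyncIterator


class AppState:
    """Hypothetical container for objects shared by all requests."""

    db = None


@asynccontextmanager
async def lifespan(app: object) -> AsyncIterator[None]:
    # Connect once, before the first request is served.
    AppState.db = sqlite3.connect(":memory:", check_same_thread=False)
    try:
        yield
    finally:
        AppState.db.close()
        AppState.db = None


async def handle_request() -> int:
    # Every request reuses the same connection; nothing is initialised here.
    return AppState.db.execute("SELECT 41 + 1").fetchone()[0]


async def main() -> list:
    # FastAPI would enter this context manager itself; here we do it by hand.
    async with lifespan(app=None):
        return await asyncio.gather(handle_request(), handle_request())


values = asyncio.run(main())
print(values)
```

Because the connection already exists when the first request arrives, concurrent requests have nothing left to race on.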
Sounds good, let's merge this for now and look at long-term solutions.
Attempt at fixing #32. The problem seems to be related to the `ContextVar` for `_state` and the fact that the previous change removed the fastapi dependency.
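One possible mechanism, which this thread does not confirm as the actual cause: FastAPI runs plain `def` dependencies in a worker thread, and a value set on a `ContextVar` inside that thread lands in a copy of the context rather than in the request's own. The stdlib sketch below imitates that with `asyncio.to_thread`; all names are hypothetical.

```python
import asyncio
from contextvars import ContextVar

state: ContextVar[str] = ContextVar("state", default="unset")


def sync_reset() -> None:
    # Runs in a worker thread, inside a copy of the caller's context.
    state.set("reset-in-thread")


async def async_reset() -> None:
    # Runs in the caller's own context.
    state.set("reset-in-task")


async def handle_with_sync_dependency() -> str:
    await asyncio.to_thread(sync_reset)
    return state.get()


async def handle_with_async_dependency() -> str:
    await async_reset()
    return state.get()


async def main() -> tuple:
    sync_result = await asyncio.create_task(handle_with_sync_dependency())
    async_result = await asyncio.create_task(handle_with_async_dependency())
    return sync_result, async_result


sync_result, async_result = asyncio.run(main())
print(sync_result, async_result)
```

In this sketch the value set by the synchronous function never reaches the handler, while the one set by the coroutine does.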