open-contracting / credere-backend

A tool that facilitates the participation of Micro, Small, and Medium businesses (MSMEs) in the Colombian public procurement market.
https://credere.readthedocs.io
BSD 3-Clause "New" or "Revised" License
5 stars 0 forks source link

Performance: Run update_statistics periodically, instead of as background task #322

Closed jpmckinney closed 1 month ago

jpmckinney commented 2 months ago

If we start getting many requests to endpoints that run update_statistics in the background, we could end up running it a lot unnecessarily (since I think it is just for internal use?). It might be simpler to run the cron job every 5 or 15 mins.

We did witness some "slow query" Sentry errors from this background task. #308

yolile commented 2 months ago

It is called by many endpoints indeed:

FIs:

Borrowers:

I will remove it from everywhere and update the cron job frequency instead

yolile commented 2 months ago

Cron updated https://github.com/open-contracting/deploy/commit/a5d0dbcdfc4aa359f73f15b71069c5212d2f4bec

yolile commented 2 months ago

Actually, I think I will instead implement a "refresh" button in the admin (OCP and FI) dashboard instead, and remove the cron job completely

jpmckinney commented 2 months ago

Sure, we can start with that. We can also look into having a TTL cache, so that admins don’t need to refresh every time. https://cachetools.readthedocs.io/en/latest/#cachetools.TTLCache

edit: Actually, not sure if a Python cache would share memory between all requests. We can put memcache in docker compose and use this: https://github.com/long2ice/fastapi-cache

yolile commented 2 months ago

Or, should we call update_statistics every time the admin page is loaded?

jpmckinney commented 2 months ago

We can. The idea with memcache is that we can either cache the response for a fixed time and/or we can invalidate the cache when an operation occurs. Compared to background tasks, we would only recalculate statistics when requested (and nothing in cache)

yolile commented 2 months ago

I think we can start without the cache and monitor how it goes. I will also review the queries in that function and double-check we need all the statistics we are calculating.

jpmckinney commented 2 months ago

Sounds good!

yolile commented 2 months ago

https://github.com/open-contracting/credere-backend/pull/333#issuecomment-2287021105

jpmckinney commented 2 months ago

Copying over that comment for easier reading:

I think we should also change https://github.com/open-contracting/credere-backend/blob/main/app/routers/statistics.py to not check the database and calculate and return the statistics instead

And then we could delete the update_statistics commands and Statistics table...

To which I noted:

The table does keep a history of statistics as of each date – not sure if this was a requirement.

yolile commented 2 months ago

The table does keep a history of statistics as of each date – not sure if this was a requirement.

I couldn't find that one as a requirement. I think with all the date fields we have at application we could get historical statistics without having to store them in a table.

jpmckinney commented 2 months ago

Fine with me!