raphael0202 opened 3 months ago
We could potentially start storing the full Product JSON in Postgres. I did a POC on this a while ago (https://github.com/openfoodfacts/openfoodfacts-server/issues/8620). The main issue is the additional database space, but if that is acceptable, having the data in a relational database would make it much easier to use languages other than Perl.
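As an illustration only, here is a minimal sketch of what that could look like with a JSONB column. The table layout, the names, and the choice of the psycopg 3 driver are assumptions made for the example, not the schema from the #8620 POC:

```python
import psycopg
from psycopg.types.json import Jsonb

def upsert_product(conn: psycopg.Connection, code: str, product: dict) -> None:
    """Insert or update the full JSON document for one product."""
    conn.execute(
        """
        INSERT INTO products (code, data)
        VALUES (%s, %s)
        ON CONFLICT (code) DO UPDATE
            SET data = EXCLUDED.data, updated_at = now()
        """,
        (code, Jsonb(product)),
    )

# The connection context manager commits on clean exit.
with psycopg.connect("dbname=off") as conn:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS products (
            code        TEXT PRIMARY KEY,  -- barcode
            data        JSONB NOT NULL,    -- full product JSON
            updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
        )
        """
    )
    # A GIN index allows structured queries inside the JSON documents.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS products_data_idx ON products USING GIN (data)"
    )
    upsert_product(conn, "3017620422003", {"product_name": "Nutella"})
```

The GIN index on the JSONB column is what would let services in any language query inside the product documents without going through Perl code.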
Good idea! Could we try to compare the possible solutions? Note that I don't have a clear opinion on what the best solution is; I have tried to be objective about both, but maybe I don't have sufficient knowledge to do so.
Don't hesitate to edit this table.
| | Static JSON + nginx | Async server | Comments |
|---|---|---|---|
| RAM | Winner? (but by what margin?) | - | nginx is known to be very efficient on that front; but FastAPI + PostgreSQL seems to consume little RAM for the Folksonomy Engine (with very low traffic, that said) |
| Disk usage | 300k products × 100 KB ≈ 30 GB | Winner | The difference is not that big; does it really matter? All this data could sit in the nginx cache |
| Performance | Clear winner (×100?) | - | Isn't this the main issue we're facing? |
| Product perimeter | 300k products | All products | The 300k most-requested products account for 75% of all requests; probably more than 1 million products are never requested through the API |
| Functional perimeter | What about translations? | Clear winner | This needs to be evaluated; I don't understand the impacts. This might be the clear, or even mandatory, bonus for the async server. How big would a JSON file with all the translations be? |
| Implementation | A few days? | ? | |
| Complexity | Winner: no new services | Need to code, deploy (and maintain) a new server | |
| Maintenance | ? | ? | Any idea? Not sure, but intuitively, maintaining a new server is more costly |
| Scalability | Better/easier scalability thanks to nginx | Scalability needs more code | I would say JSON + nginx is a clear winner, but this needs to be confirmed. E.g., couldn't the JSON files be stored on another server, as we do for images? (see the sketch after this table) |
| Resilience | Better/easier fallbacks thanks to nginx | Resilience needs more code | Idem |
| Sustainability | A bit more technical debt in Perl | More technical debt, but in a more widespread language | |
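To make the left column of the table more concrete, here is a minimal sketch of how the static files could be generated, assuming products are read from MongoDB and written under a directory that nginx serves directly. All paths and database names, and the 3/3/3/rest barcode split (borrowed from the image folder layout), are assumptions:

```python
import json
from pathlib import Path

from pymongo import MongoClient

STATIC_ROOT = Path("/srv/off/static/products")  # assumed nginx document root

def product_path(code: str) -> Path:
    """Split long barcodes into 3/3/3/rest, like the image directories."""
    parts = [code[0:3], code[3:6], code[6:9], code[9:]] if len(code) > 9 else [code]
    return STATIC_ROOT.joinpath(*parts, "product.json")

def dump_all_products(mongo_uri: str = "mongodb://localhost:27017") -> None:
    products = MongoClient(mongo_uri)["off"]["products"]
    for product in products.find({}):
        product.pop("_id", None)  # ObjectId is not JSON-serializable
        path = product_path(product["code"])
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file, then rename, so nginx never serves a
        # half-written file.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(product))
        tmp.replace(path)

if __name__ == "__main__":
    dump_all_products()
```

Regenerating a single file whenever a product changes (rather than running a full dump) would keep the files fresh, and nginx would then serve them with no application code in the read path.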
The performance issues we're currently experiencing led us to analyze which requests take the most processing time on the Apache server: https://docs.google.com/document/d/13rYXR0TxR2hUc0XEKzKcBT6ndcd5_L3yeP_L6UjZwzs/edit. The analysis revealed that facet-related queries were the most costly.
We only have 50 Apache workers, so when most of them are busy waiting for MongoDB or tied up by costly queries, we can't respond to basic `GET /api/v*/products/{code}` requests, which only require a disk access (to fetch the .sto file) and a bit of RAM to fetch the translations. These requests account for 15% of all requests handled by Product Opener, and this route is the API endpoint most used by our own mobile app and by reusers.

My proposal would be to add a new asynchronous service (written with FastAPI in Python, for example) to handle read-only `GET /api/v*/products/{code}` requests.

Having a distinct service that takes care of read-only API queries would make sure that our own app (and third-party apps) keep working even if Product Opener goes down. Asynchronicity means that a worker is not blocked while waiting for I/O, so a single process can serve many concurrent requests.
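As a rough illustration of the proposal (a sketch, not a finished design), the read path could look like this with FastAPI and the async motor MongoDB driver. The database and collection names, the URI and the response shape are placeholders, and v2 stands in for `v*`:

```python
from fastapi import FastAPI, HTTPException
from motor.motor_asyncio import AsyncIOMotorClient

app = FastAPI(title="Read-only product API")
products = AsyncIOMotorClient("mongodb://localhost:27017")["off"]["products"]

@app.get("/api/v2/products/{code}")
async def get_product(code: str) -> dict:
    # `await` hands control back to the event loop, so the process keeps
    # serving other requests while this MongoDB query is in flight,
    # unlike an Apache worker, which stays blocked for the whole wait.
    product = await products.find_one({"code": code}, {"_id": 0})
    if product is None:
        raise HTTPException(status_code=404, detail="product not found")
    return {"code": code, "product": product, "status": 1}
```

Run with e.g. `uvicorn main:app`: because every handler yields at `await`, a handful of processes can hold many in-flight requests, instead of needing one blocked worker per request as with the current 50 Apache workers.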
The addition of knowledge panels to product responses could also be migrated to this new service later.
I think it's a better alternative than #8934: while that approach is faster (files are served directly by nginx), it is more disk-hungry, wouldn't be available for all products, and doesn't play nicely with the translation of taxonomized fields.
This could also be a first step toward tackling #5170. Write queries are not very common (0.25% of the queries handled by Product Opener), and most of the complexity of the codebase comes from the data processing and score computation associated with write queries.
That's why I think it's better to keep POST queries out of the scope of this proposal for now.
## Limits
This service wouldn't cover the 53% of queries that request product HTML pages. Serving these pages through the async service would be much more difficult, as it would mean migrating all the HTML rendering logic there.