Open CharlesNepote opened 1 year ago
Static JSON files would be 100 times faster, but for products, we are already "fast enough".
e.g. on my local laptop:
(not a perfect test: only one product, albeit a big one (Nutella), so the file will probably be cached by the system, and the test runs on the same machine)
~/openfoodfacts-server$ ab -n 1000 -c 1000 http://world.openfoodfacts.localhost/api/v3/product/3017620422003
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking world.openfoodfacts.localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software: nginx/1.24.0
Server Hostname: world.openfoodfacts.localhost
Server Port: 80
Document Path: /api/v3/product/3017620422003
Document Length: 113989 bytes
Concurrency Level: 1000
Time taken for tests: 17.242 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 114585000 bytes
HTML transferred: 113989000 bytes
Requests per second: 58.00 [#/sec] (mean)
Time per request: 17242.084 [ms] (mean)
Time per request: 17.242 [ms] (mean, across all concurrent requests)
Transfer rate: 6489.90 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 21 3.1 20 26
Processing: 139 9030 4917.7 9130 17025
Waiting: 102 8860 4855.1 8967 17005
Total: 165 9050 4916.3 9153 17041
Percentage of the requests served within a certain time (ms)
50% 9153
66% 11839
75% 13411
80% 14241
90% 15843
95% 16662
98% 16931
99% 16985
100% 17041 (longest request)
This load test shows that my laptop can sustain fewer than 60 queries per second for this product.
Compare with nginx serving the same JSON as a static file...
root@prox2:/etc/nginx/conf.d# ab -n 1000 -c 1000 http://localhost/3017620422003
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software: nginx
Server Hostname: localhost
Server Port: 80
Document Path: /3017620422003
Document Length: 178 bytes
Concurrency Level: 1000
Time taken for tests: 0.124 seconds
Complete requests: 1000
Failed requests: 0
Non-2xx responses: 1000
Total transferred: 336000 bytes
HTML transferred: 178000 bytes
Requests per second: 8061.46 [#/sec] (mean)
Time per request: 124.047 [ms] (mean)
Time per request: 0.124 [ms] (mean, across all concurrent requests)
Transfer rate: 2645.17 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 39 5.1 40 49
Processing: 32 46 8.1 47 63
Waiting: 0 38 8.4 35 59
Total: 50 86 8.5 85 108
Percentage of the requests served within a certain time (ms)
50% 85
66% 89
75% 91
80% 93
90% 97
95% 102
98% 106
99% 107
100% 108 (longest request)
8061 / 58 ≈ 139 times faster... and CPU and I/O costs are presumably much lower as well. (Caveat: the static-file test above reports "Non-2xx responses: 1000" with a document length of 178 bytes, so nginx was serving an error page rather than the 114 KB JSON; the real ratio for the full payload would be smaller, but still large.)
Test run in a 4-core container.
For the record, I proposed this very idea as well and I can't remember the reason not to move ahead, so +1 from me. That said, this means:
Personally I'm far from convinced.
For product data
Thanks to this query, we can see that 300,000 products represent 75%+ of the 2,000,000+ scans in 2022 (unique_scans_n). So generating static JSON for these products would lead to a huge performance improvement. Product Opener could generate the static files either in real time for the most scanned products, or in the background when more resources are available. The JSON files could be generated only for world.openfoodfacts.org (and maybe a few languages), as the JSON served from the different domains (fr, it, de...) differs only in a few fields (image links). We could configure nginx to try the static JSON file first and fall back to dynamic generation if it does not exist.
For facets
The combination of facets leads to millions (or even billions) of possibilities, but the Pareto principle probably applies to facets too: a few facets likely drive 80% of the queries. These queries could be identified, and static JSON could be generated for them (once a day?) and served.
Further thoughts
Some web frameworks can be 100+ times faster than Apache+Perl for simple operations. When people only want a few fields from the API (
/3378678687676.json&fields=name,url
), it could also be possible to extract them directly from the static JSON file. In this famous web framework benchmark, some frameworks such as just-js or php-nginx seem well suited and easy to deploy for this use case (I only selected languages we can manage for such a simple task: JS, Python, Perl, PHP).