openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Generate and serve static JSON files to improve performance? #8934

Open CharlesNepote opened 1 year ago

CharlesNepote commented 1 year ago

For products data

This query shows that 300,000 products account for more than 75% of the 2,000,000+ scans recorded in 2022 (unique_scans_n). Generating static JSON for these products could therefore improve performance considerably.
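To illustrate the selection, here is a minimal sketch that sorts products by scan count and keeps the smallest set whose cumulative scans reach a target share. The function name, the sample codes and the counts are made up for the example; only the idea of ranking by unique_scans_n comes from the data above.

```python
def most_scanned(scans_by_code, target_share=0.75):
    """Return the most-scanned codes whose cumulative scans reach target_share."""
    total = sum(scans_by_code.values())
    selected, covered = [], 0
    # Walk products from most scanned to least scanned.
    ranked = sorted(scans_by_code.items(), key=lambda kv: kv[1], reverse=True)
    for code, n in ranked:
        if covered / total >= target_share:
            break  # the target share is already covered
        selected.append(code)
        covered += n
    return selected


# Hypothetical unique_scans_n values, not real data.
sample = {"A": 500, "B": 300, "C": 100, "D": 60, "E": 40}
print(most_scanned(sample))
```

The returned list would be the set of products worth pre-generating; everything else stays dynamic.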

Product Opener could generate static files either in real time for the most scanned products, or in the background when more resources are available. The JSON files could be generated only for world.openfoodfacts.org (and maybe a few languages), since the JSON served from different domains (fr, it, de...) differs only in a few fields (image links). We could configure nginx to try the static JSON file first and fall back to dynamic data if it does not exist.
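The lookup order described here (static file first, dynamic response otherwise) can be sketched in a few lines. In nginx itself this would normally be expressed with the `try_files` directive; the Python below only illustrates the logic. The directory layout (`<code>.json`), the function names and the fake backend are assumptions, not the actual Product Opener setup.

```python
import json
import tempfile
from pathlib import Path


def serve_product(code, static_dir, render_dynamic):
    """Return (body, source): the pre-generated file if present, else the dynamic response."""
    # Only accept numeric barcodes so the code cannot escape static_dir.
    if code.isdigit():
        static_file = Path(static_dir) / f"{code}.json"
        if static_file.is_file():
            return static_file.read_text(encoding="utf-8"), "static"
    return render_dynamic(code), "dynamic"


def fake_backend(code):
    # Hypothetical stand-in for the dynamic API response.
    return json.dumps({"code": code, "status": 1})


with tempfile.TemporaryDirectory() as d:
    Path(d, "3017620422003.json").write_text('{"code": "3017620422003"}', encoding="utf-8")
    hit = serve_product("3017620422003", d, fake_backend)
    miss = serve_product("0000000000000", d, fake_backend)

print(hit[1], miss[1])
```

A real deployment would also need to decide when a static file is stale, for example by regenerating or deleting it whenever the product is edited.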

For facets

Combining facets leads to millions (or even billions) of possibilities, but the Pareto principle probably applies to facets too: a few facets are likely driving 80% of the queries. These queries could be identified, and static JSON could be generated for them (once a day?) and served.

Further thoughts

Some web frameworks can be 100+ times faster than Apache+Perl for simple operations. When people only want a few fields from the API (/3378678687676.json?fields=name,url), it would also be possible to extract them directly from the JSON file. In this well-known web framework benchmark, frameworks such as just-js or php-nginx seem well suited and easy to deploy for this case (I only selected languages we can manage for such a simple case: JS, Python, Perl, PHP).
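Extracting a few fields from an already generated JSON document is a small operation. A minimal sketch, assuming the `fields` parameter is a comma-separated list of top-level keys; the function name and the sample document are invented for the example:

```python
import json


def project_fields(product_json, fields_param):
    """Keep only the comma-separated top-level fields that exist in the document."""
    product = json.loads(product_json)
    wanted = [f.strip() for f in fields_param.split(",") if f.strip()]
    return {f: product[f] for f in wanted if f in product}


# Hypothetical static document, not a real product.
doc = json.dumps({
    "name": "Example",
    "url": "https://example.org/p/1",
    "nutriments": {"sugars": 56},
})
print(project_fields(doc, "name,url"))
```

Unknown field names are silently skipped here; a real endpoint might prefer to report them.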

stephanegigandet commented 1 year ago

Static JSON files would be 100 times faster, but for products, we are already "fast enough".

e.g. on my local laptop:

(Not a perfect test: it requests a single product (a big one, Nutella), so the system file cache probably helps, and the client runs on the same machine as the server.)

~/openfoodfacts-server$ ab -n 1000 -c 1000 http://world.openfoodfacts.localhost/api/v3/product/3017620422003
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking world.openfoodfacts.localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        nginx/1.24.0
Server Hostname:        world.openfoodfacts.localhost
Server Port:            80

Document Path:          /api/v3/product/3017620422003
Document Length:        113989 bytes

Concurrency Level:      1000
Time taken for tests:   17.242 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      114585000 bytes
HTML transferred:       113989000 bytes
Requests per second:    58.00 [#/sec] (mean)
Time per request:       17242.084 [ms] (mean)
Time per request:       17.242 [ms] (mean, across all concurrent requests)
Transfer rate:          6489.90 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   21   3.1     20      26
Processing:   139 9030 4917.7   9130   17025
Waiting:      102 8860 4855.1   8967   17005
Total:        165 9050 4916.3   9153   17041

Percentage of the requests served within a certain time (ms)
  50%   9153
  66%  11839
  75%  13411
  80%  14241
  90%  15843
  95%  16662
  98%  16931
  99%  16985
 100%  17041 (longest request)

cquest commented 1 year ago

This load test shows that your laptop can sustain fewer than 60 queries per second on this product.

Compare with nginx serving the same JSON from a static file:

root@prox2:/etc/nginx/conf.d# ab -n 1000 -c 1000 http://localhost/3017620422003
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        nginx
Server Hostname:        localhost
Server Port:            80

Document Path:          /3017620422003
Document Length:        178 bytes

Concurrency Level:      1000
Time taken for tests:   0.124 seconds
Complete requests:      1000
Failed requests:        0
Non-2xx responses:      1000
Total transferred:      336000 bytes
HTML transferred:       178000 bytes
Requests per second:    8061.46 [#/sec] (mean)
Time per request:       124.047 [ms] (mean)
Time per request:       0.124 [ms] (mean, across all concurrent requests)
Transfer rate:          2645.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   39   5.1     40      49
Processing:    32   46   8.1     47      63
Waiting:        0   38   8.4     35      59
Total:         50   86   8.5     85     108

Percentage of the requests served within a certain time (ms)
  50%     85
  66%     89
  75%     91
  80%     93
  90%     97
  95%    102
  98%    106
  99%    107
 100%    108 (longest request)

8061 / 58 = 139 times faster... and I guess CPU and I/O costs are also much much lower.

Test done in a 4 cores container.
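For anyone repeating the comparison, here is a small sketch that pulls the headline figures out of two ab reports and computes the ratio. The excerpts are copied from the two runs above; the function name is made up. Note that the second run reports "Non-2xx responses: 1000" and a 178-byte document (against 113989 bytes in the first run), which suggests it may not have been serving the product JSON, so the ratio is best read with that in mind.

```python
import re


def ab_summary(text):
    """Extract a few headline figures from ApacheBench output."""
    def grab(label, cast=float):
        m = re.search(rf"^{re.escape(label)}:\s+([\d.]+)", text, re.MULTILINE)
        return cast(m.group(1)) if m else None

    return {
        "rps": grab("Requests per second"),
        "doc_bytes": grab("Document Length", int),
        # ab omits this line entirely when every response is 2xx.
        "non_2xx": grab("Non-2xx responses", int) or 0,
    }


dynamic = """Document Length:        113989 bytes
Requests per second:    58.00 [#/sec] (mean)
"""
static = """Document Length:        178 bytes
Non-2xx responses:      1000
Requests per second:    8061.46 [#/sec] (mean)
"""

a, b = ab_summary(dynamic), ab_summary(static)
ratio = b["rps"] / a["rps"]
print(round(ratio), b["non_2xx"], a["doc_bytes"], b["doc_bytes"])
```

Checking `non_2xx` and `doc_bytes` before comparing throughput helps make sure both runs returned the same payload.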

teolemon commented 1 year ago

For the record, I proposed this very idea as well and I can't remember the reason not to move ahead, so +1 from me. That said, this means:

alexgarel commented 1 year ago

Personally I'm far from convinced.

github-actions[bot] commented 11 months ago

This issue has been open 90 days with no activity. Can you give it a little love by linking it to a parent issue, adding relevant labels and projects, creating a mockup if applicable, adding code pointers from https://github.com/openfoodfacts/openfoodfacts-server/blob/main/.github/labeler.yml, giving it a priority, editing the original issue to have a more comprehensive description… Thank you very much for your contribution to 🍊 Open Food Facts