hubmapconsortium / entity-api

A set of web service calls to return information about HuBMAP entities
https://entity.api.hubmapconsortium.org
MIT License

Analysis: What is causing 502 errors in entity-api during search indexing? #462

Open shirey opened 1 year ago

yuanzhou commented 1 year ago

Observations from load testing:

When running the load test with a single process, Locust can simulate a reasonably high throughput:

docker run --network gateway_hubmap -v $PWD:/mnt/locust hubmap/api-load-test-image:1.0.0 -f /mnt/locust/locustfile.py --headless -u 5000 -r 50

Note: the result below was obtained with POOL_MAX_SIZE = 100.

Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
GET      /                                                                                 36     0(0.00%) |   2703      37   12865    380 |    0.03        0.00
GET      /ancestors/<id>?property=uuid                                                   5048     0(0.00%) |    996      84    2134   1100 |    4.82        0.00
GET      /children/<id>/provenance                                                       5024     0(0.00%) |   1010     133    2412   1100 |    4.80        0.00
GET      /children/<id>?property=uuid                                                    5025     0(0.00%) |    926      59    1986    990 |    4.80        0.00
GET      /collections/<id>                                                                368     0(0.00%) |    804     288    5526    720 |    0.35        0.00
GET      /descendants/<id>?property=uuid                                                 5042     0(0.00%) |    997     183    1988   1100 |    4.81        0.00
GET      /entities/<id>                                                                 40838     1(0.00%) |   1004      71   23530   1100 |   39.00        0.00
GET      /entities/<id>/globus-url                                                      38904     0(0.00%) |    994      59   14473   1000 |   37.15        0.00
GET      /parents/<id>?property=uuid                                                     5026     0(0.00%) |    984      97    2322   1100 |    4.80        0.00
GET      /status                                                                           36     0(0.00%) |    490     167     865    470 |    0.03        0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
         Aggregated                                                                    105347     1(0.00%) |    995      37   23530   1100 |  100.59        0.00
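
As a rough cross-check of the single-process numbers, Little's law (average requests in flight = throughput × average response time) can be applied to the Aggregated row. This is only a sketch: it assumes the run was close to steady state and that Locust's Avg column is in milliseconds, and the function name is made up for illustration.

```python
def in_flight_requests(req_per_sec: float, avg_response_ms: float) -> float:
    """Little's law: L = lambda * W, with W converted from ms to seconds."""
    return req_per_sec * (avg_response_ms / 1000.0)


# Values copied from the Aggregated row of the single-process run above.
single_process = in_flight_requests(100.59, 995)
print(f"~{single_process:.0f} requests in flight on average")
```

The estimate comes out at roughly 100 concurrent requests, which is close to the POOL_MAX_SIZE = 100 setting. That may be a coincidence; a run with a different pool size would be needed to tell.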

When running distributed load generation with 16 Locust workers:

docker-compose up --scale locust-worker=16

Running with -u 100 -r 20 worked fine, but we started getting errors at -u 150 -r 20:

entity-api-locust-master-1   | Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s
entity-api-locust-master-1   | --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
entity-api-locust-master-1   | GET      /                                                                                 49     0(0.00%) |    228       4     477    240 |    0.00        0.00
entity-api-locust-master-1   | GET      /ancestors/<id>?property=uuid                                                    904    50(5.53%) |    435       1   32430    390 |   15.30        0.00
entity-api-locust-master-1   | GET      /children/<id>/provenance                                                        876    51(5.82%) |    396       0    3537    400 |   14.30        0.00
entity-api-locust-master-1   | GET      /children/<id>?property=uuid                                                     886    51(5.76%) |    365       0   15589    330 |   14.60        0.00
entity-api-locust-master-1   | GET      /collections/<id>                                                                742     0(0.00%) |    478      22   54105    330 |    0.30        0.00
entity-api-locust-master-1   | GET      /descendants/<id>?property=uuid                                                  902    51(5.65%) |    445       1   32107    390 |   15.40        0.00
entity-api-locust-master-1   | GET      /entities/<id>                                                                  9135 1200(13.14%) |    321       0   60003    310 |  135.20        0.00
entity-api-locust-master-1   | GET      /entities/<id>/globus-url                                                       7845   300(3.82%) |    347       1   35350    310 |  130.90        0.00
entity-api-locust-master-1   | GET      /parents/<id>?property=uuid                                                      892    51(5.72%) |    435       0   39612    380 |   14.80        0.00
entity-api-locust-master-1   | GET      /status                                                                           49     0(0.00%) |    277       8    1416    290 |    0.00        0.00
entity-api-locust-master-1   | --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
entity-api-locust-master-1   |          Aggregated                                                                     22280  1754(7.87%) |    354       0   60003    320 |  340.80        0.00
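
To see where the failures concentrate, the stats rows can be parsed and tallied. The snippet below is a minimal sketch, not part of the load-test repository: the regular expression and the `parse_rows` helper are my own, and the sample rows are copied from the table above with the column padding trimmed and the timing columns omitted.

```python
import re

# One Locust stats row: optional "container | " prefix from docker-compose,
# then HTTP method, endpoint name, request count, failure count.
ROW = re.compile(
    r"^(?:\S+\s+\|\s+)?(?P<method>[A-Z]+)\s+(?P<name>/\S*)\s+"
    r"(?P<reqs>\d+)\s+(?P<fails>\d+)\("
)


def parse_rows(text):
    """Return (name, requests, failures) for every endpoint row found."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append((m["name"], int(m["reqs"]), int(m["fails"])))
    return rows


SAMPLE = """\
entity-api-locust-master-1 | GET / 49 0(0.00%) |
entity-api-locust-master-1 | GET /ancestors/<id>?property=uuid 904 50(5.53%) |
entity-api-locust-master-1 | GET /children/<id>/provenance 876 51(5.82%) |
entity-api-locust-master-1 | GET /children/<id>?property=uuid 886 51(5.76%) |
entity-api-locust-master-1 | GET /collections/<id> 742 0(0.00%) |
entity-api-locust-master-1 | GET /descendants/<id>?property=uuid 902 51(5.65%) |
entity-api-locust-master-1 | GET /entities/<id> 9135 1200(13.14%) |
entity-api-locust-master-1 | GET /entities/<id>/globus-url 7845 300(3.82%) |
entity-api-locust-master-1 | GET /parents/<id>?property=uuid 892 51(5.72%) |
entity-api-locust-master-1 | GET /status 49 0(0.00%) |
"""

rows = parse_rows(SAMPLE)
total_reqs = sum(reqs for _, reqs, _ in rows)
total_fails = sum(fails for _, _, fails in rows)

# List failing endpoints, worst first, with their share of all failures.
for name, reqs, fails in sorted(rows, key=lambda row: row[2], reverse=True):
    if fails:
        print(f"{name:36s} {fails:5d} failures ({fails / total_fails:.1%} of all)")
print(total_reqs, total_fails)
```

The totals (22280 requests, 1754 failures) match the Aggregated row. /entities/<id> and /entities/<id>/globus-url together account for 1500 of the 1754 failures, although they also receive most of the traffic (16980 of 22280 requests).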

The Nginx error log shows "no live upstreams":

2023/03/28 18:48:51 [error] 22#22: *4677385 no live upstreams while connecting to upstream, client: 172.18.0.23, server: localhost, request: "GET /children/d731ed756d01838e591f522b46bf160e?property=uuid HTTP/1.1", upstream: "uwsgi://localhost", host: "entity-api:8080"
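
Nginx writes "no live upstreams" when it currently considers every server in the upstream group unavailable. To count how often this happens and which requests are affected, error log lines can be split into fields. The pattern below is a sketch written against the single line shown above; the field names and the `parse_error_line` helper are assumptions for illustration, and other error messages may carry different trailing fields.

```python
import re

# Fields of an Nginx error log line of the shape shown above.
ERROR_LINE = re.compile(
    r'^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>\w+)\] (?P<pid>\d+)#(?P<tid>\d+): '
    r'(?:\*(?P<conn>\d+) )?(?P<message>.*?)'
    r', client: (?P<client>[^,]+), server: (?P<server>[^,]+)'
    r', request: "(?P<request>[^"]*)"'
    r'(?:, upstream: "(?P<upstream>[^"]*)")?'
    r'(?:, host: "(?P<host>[^"]*)")?$'
)


def parse_error_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    m = ERROR_LINE.match(line.strip())
    return m.groupdict() if m else None


line = (
    '2023/03/28 18:48:51 [error] 22#22: *4677385 no live upstreams while '
    'connecting to upstream, client: 172.18.0.23, server: localhost, '
    'request: "GET /children/d731ed756d01838e591f522b46bf160e?property=uuid '
    'HTTP/1.1", upstream: "uwsgi://localhost", host: "entity-api:8080"'
)
fields = parse_error_line(line)
print(fields["message"], "|", fields["upstream"])
```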

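The Nginx access log shows requests logged with status 499, the code Nginx uses when the client closes the connection before a response is sent: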

172.18.0.10 - - [28/Mar/2023:18:48:52 +0000] "GET /entities/b2db3414cedf8805d20df3cf753842ca/globus-url HTTP/1.1" 499 0 "-" "python-urllib3/1.26.15"
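
To quantify how many requests end this way, the access log can be tallied by status code. This is a minimal sketch that assumes the log uses Nginx's default "combined" format, which the line above appears to follow; `status_counts` is a hypothetical helper, not something from the entity-api codebase.

```python
import re
from collections import Counter

# Nginx "combined" format: client, ident, user, [time], "request",
# status, bytes sent, "referer", "user agent".
ACCESS_LINE = re.compile(
    r'^(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def status_counts(lines):
    """Count access log lines per HTTP status code, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        m = ACCESS_LINE.match(line.strip())
        if m:
            counts[int(m["status"])] += 1
    return counts


sample = [
    '172.18.0.10 - - [28/Mar/2023:18:48:52 +0000] '
    '"GET /entities/b2db3414cedf8805d20df3cf753842ca/globus-url HTTP/1.1" '
    '499 0 "-" "python-urllib3/1.26.15"',
]
counts = status_counts(sample)
print(counts)
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.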