protocol / prodeng

Issues, discussions and documentation from the production engineering team

Improve Gateway TTFB #15

Open iand opened 2 years ago

iand commented 2 years ago

TTFB is defined here as the 7-day mean of the 95th-percentile time to first byte, as reported by all Gateway NGINX servers.
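A minimal sketch of how such a metric could be computed. It assumes the P95 is first taken per day over raw TTFB samples using the nearest-rank method and then averaged across the last seven days. The actual Grafana/NGINX aggregation (bucket size, percentile interpolation) is not described here, and `p95` and `seven_day_ttfb` are hypothetical names.

```python
from statistics import mean

def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of TTFB samples (ms)."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]

def seven_day_ttfb(daily_samples):
    """Mean of the daily P95 values over the most recent seven days (assumed daily buckets)."""
    return mean(p95(day) for day in daily_samples[-7:])

# Hypothetical data: day d has samples d*1 .. d*100 ms, so its P95 is d*95.
days = [[d * s for s in range(1, 101)] for d in range(1, 8)]
print(seven_day_ttfb(days))  # → 380
```

Averaging daily P95s is not the same as taking a P95 over the whole week's samples; the two can differ noticeably when traffic or latency varies from day to day.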

TTFB captures the time go-ipfs takes to resolve and read the requested block. That block may be the root of a file DAG that must be retrieved in full to fulfill the request, but the additional retrieval time may not be reflected in TTFB if go-ipfs can begin streaming to the client immediately.
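To illustrate why streaming keeps the rest of the DAG retrieval out of TTFB, here is a toy sketch using only the Python standard library. The local server stands in for a gateway: it sends the "root block" at once and the remainder after a hypothetical `DAG_DELAY_S` pause. The client records the time to the first body byte separately from the time to the full body. This does not reproduce how go-ipfs or NGINX measure TTFB.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DAG_DELAY_S = 0.3  # hypothetical time to fetch the rest of the file DAG

class StreamingHandler(BaseHTTPRequestHandler):
    """Toy 'gateway': sends the root block at once, the rest after a delay."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "8")
        self.end_headers()
        self.wfile.write(b"root")      # first block: counts toward TTFB
        self.wfile.flush()
        time.sleep(DAG_DELAY_S)        # remaining blocks still being retrieved
        self.wfile.write(b"tail")      # arrives after the first byte

    def log_message(self, *args):      # keep the example quiet
        pass

def measure(port):
    """Return (ttfb_s, total_s) for one GET against 127.0.0.1:port."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    start = time.perf_counter()
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read(1)                       # first byte of the body
    ttfb = time.perf_counter() - start
    resp.read()                        # drain the rest of the body
    total = time.perf_counter() - start
    conn.close()
    return ttfb, total

server = HTTPServer(("127.0.0.1", 0), StreamingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
ttfb, total = measure(server.server_address[1])
server.shutdown()
server.server_close()
print(f"TTFB {ttfb * 1000:.0f} ms, full response {total * 1000:.0f} ms")
```

The TTFB stays in the low milliseconds, while the full response takes at least `DAG_DELAY_S`.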

As a baseline, the values of the TTFB metric since the start of 2022 are as follows (Grafana dashboard):

Timings are 7-day averages (in ms); request counts are 7-day totals.

Week ending     Overall TTFB P95 (ms)  /ipfs TTFB P95 (ms)  /ipns TTFB P95 (ms)  /ipfs requests  /ipns requests
2022-07-17      14465                  14654                1606                 1160789601      8839516
2022-07-10      15117                  15398                1876                 758965477       7992380
2022-07-03      14624                  14847                -                    758534722       6503190
2022-06-26      12137                  12321                2801                 681296119       6196580
2022-06-19      11783                  11951                3021                 676777682       5475402
2022-06-12      10202                  10379                1421                 663156465       6108780
2022-06-05      7695                   7813                 1056                 679216764       6145194
2022-05-29      9569                   9696                 1110                 750595367       5885012
2022-05-22 [1]  11321                  11460                1172                 684531043       6207989
2022-05-15      3930                   4018                 767                  645623016       6210211
2022-05-08      5636                   5750                 805                  645383820       5729412
2022-05-01      7467                   7605                 827                  625505173       5691557
2022-04-24      6137                   6273                 868                  674131403       6576324
2022-04-17      5828                   6282                 1207                 579844301       6069346
2022-04-10      11320                  11525                1268                 611577089       6065253
2022-04-03      6383                   6411                 32186                490258661       6068431
2022-03-27 [2]  9117                   8286                 327060               530622766       4928387
2022-03-20      7842                   7237                 509940               564945245       4751772
2022-03-13      4602                   4015                 -                    653778624       4720244
2022-03-06      7155                   6585                 492720               824065423       4292238
2022-02-27      5855                   5779                 410400               944414646       2349145
2022-02-20      7819                   7728                 432060               684044673       2578595
2022-02-13      9888                   9637                 452280               514643736       2573098
2022-02-06      10176                  9934                 464280               483058798       2568412
2022-01-30      11684                  9459                 508200               483914472       6096080
2022-01-23      28373                  20599                541320               432975290       8736287

[1] More gateway instances were added on May 24.
[2] Gateway scaled out on Apr 1.
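When comparing weeks against this baseline, the arithmetic is simple but easy to get backwards. The sketch below uses a few values copied from the table. `pct_change` is a hypothetical helper. The per-second request rate is a flat average over the week and hides traffic peaks.

```python
# Values copied from the baseline table (week ending -> overall TTFB P95, ms).
overall_p95_ms = {
    "2022-01-23": 28373,
    "2022-05-15": 3930,
    "2022-07-17": 14465,
}
ipfs_requests_week = 1160789601  # /ipfs requests, week ending 2022-07-17

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

since_jan = pct_change(overall_p95_ms["2022-01-23"], overall_p95_ms["2022-07-17"])
since_may = pct_change(overall_p95_ms["2022-05-15"], overall_p95_ms["2022-07-17"])
ipfs_rps = ipfs_requests_week / SECONDS_PER_WEEK

print(f"{since_jan:+.1f}% since 2022-01-23")  # → -49.0% since 2022-01-23
print(f"{since_may:+.1f}% since 2022-05-15")  # → +268.1% since 2022-05-15
print(f"{ipfs_rps:.0f} /ipfs requests/s")     # → 1919 /ipfs requests/s
```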

Background

The factors that affect TTFB include the following (in descending order of impact):

  1. whether any required block is present in the local blockstore
  2. the response time of the local blockstore to locate and read blocks
  3. the number of intermediate nodes that must be read to locate the CID of the requested path
  4. the time spent locating the provider of a block using the DHT
  5. the time spent connecting to and reading from a block provider (involves Bitswap discovery)
  6. for some request types (directory listings), the time spent enumerating the contents of the directory being listed
  7. the time taken to validate the existence of the requested block (cache validation/ETag check)
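Factors 1 to 5 compound along the path being resolved. Each block between the root and the target is either a cheap local read or a network round trip. The toy cost model below is a sketch under loudly hypothetical assumptions:

* The latency figures are placeholders, not gateway measurements.
* `estimate_ttfb` and `Latencies` are illustrative names.
* Every miss is charged a DHT lookup plus a Bitswap fetch. In practice Bitswap may get the block from an already-connected peer without a DHT query.
* Factors 6 and 7 are not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Latencies:
    """Hypothetical per-block costs in ms; placeholders, not measurements."""
    local_read: float = 1.0       # factor 2: blockstore read on a hit
    dht_lookup: float = 500.0     # factor 4: provider search via the DHT
    bitswap_fetch: float = 200.0  # factor 5: connect to provider and fetch

def estimate_ttfb(path_blocks, local_blockstore, lat=Latencies()):
    """Sum per-block costs along a resolved path.

    path_blocks: block IDs read in order (root, intermediate nodes, target);
                 a deeper path means more reads (factor 3).
    local_blockstore: IDs already held locally (factor 1).
    """
    total = 0.0
    for block_id in path_blocks:
        if block_id in local_blockstore:
            total += lat.local_read
        else:
            # Simplification: every miss pays a DHT lookup plus a Bitswap fetch.
            total += lat.dht_lookup + lat.bitswap_fetch
    return total

path = ["root", "dir-a", "dir-b", "file"]  # hypothetical /ipfs/<root>/dir-a/dir-b/file
print(estimate_ttfb(path, set(path)))                   # → 4.0
print(estimate_ttfb(path, {"root", "dir-a", "dir-b"}))  # → 703.0
```

Even under these made-up numbers, a single missing block dominates the estimate. This matches the ordering above, where local blockstore presence ranks first.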