valyala / fasthttp

Fast HTTP package for Go. Tuned for high performance. Zero memory allocations in hot paths. Up to 10x faster than net/http

How to Optimize the Performance of My fasthttp Client in a Production Environment #1793

Open luoyucumt opened 5 months ago

luoyucumt commented 5 months ago

In my production environment, I use the fasthttp client to call third-party services. During peak traffic, the client sees elevated latency, with some requests delayed by several seconds. To investigate, I ran a stress test and found that latency grows as the number of connections increases.

Fasthttp version: v1.55.0

Stress test environment

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,3
      Model Number: MKGP3CH/A
      Chip: Apple M1 Pro
      Total Number of Cores: 8 (6 performance and 2 efficiency)
      Memory: 16 GB

Simulating a Third-Party Service with Code:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof on the default mux
    "time"

    "github.com/valyala/fasthttp"
)

var (
    strContentType = []byte("Content-Type")
    strApplication = []byte("application/json")
    body           = []byte(`{"message": "Hello, world!"}`)
)

func main() {
    // pprof debug listener (assumed intent: mirrors the client
    // program's :7002 listener; the nil fasthttp handler originally
    // passed here would panic on the first request).
    go func() {
        if err := http.ListenAndServe("localhost:7001", nil); err != nil {
            log.Fatalf("Error in ListenAndServe: %v", err)
        }
    }()

    if err := fasthttp.ListenAndServe("localhost:8001", handler); err != nil {
        log.Fatalf("Error in ListenAndServe: %v", err)
    }
}

func handler(ctx *fasthttp.RequestCtx) {
    begin := time.Now()

    // handle request
    {
        ctx.Response.Header.SetCanonical(strContentType, strApplication)
        ctx.Response.SetStatusCode(fasthttp.StatusOK)
        ctx.Response.SetBody(body)
    }

    log.Printf("%v | %s %s %v %v",
        ctx.RemoteAddr(),
        ctx.Method(),
        ctx.RequestURI(),
        ctx.Response.Header.StatusCode(),
        time.Since(begin),
    )
}
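For a quick sanity check of the mock service (assuming it is running on :8001 as above), a single curl should return the configured JSON body:

➜  ~ curl -s http://localhost:8001
{"message": "Hello, world!"}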

Code snippet for the client service that calls the simulated third-party service:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
    "time"

    "github.com/valyala/fasthttp"
)

var (
    client *fasthttp.HostClient
)

const (
    readTimeout  = 3 * time.Second
    writeTimeout = 3 * time.Second

    maxConnsPerHost     = 2048
    maxIdleConnDuration = 3 * time.Minute
)

func main() {
    client = &fasthttp.HostClient{
        Addr:                          "localhost:8001",
        MaxConns:                      maxConnsPerHost,
        ReadTimeout:                   readTimeout,
        WriteTimeout:                  writeTimeout,
        MaxIdleConnDuration:           maxIdleConnDuration,
        NoDefaultUserAgentHeader:      true,
        DisableHeaderNamesNormalizing: true,
        DisablePathNormalizing:        true,
        MaxIdemponentCallAttempts:     1,
    }

    go func() {
        if err := http.ListenAndServe("localhost:7002", nil); err != nil {
            log.Fatalf("Error in ListenAndServe: %v", err)
        }
    }()

    if err := fasthttp.ListenAndServe("localhost:8002", handler); err != nil {
        log.Fatalf("Error in ListenAndServe: %v", err)
    }
}

func api(ctx *fasthttp.RequestCtx) error {
    begin := time.Now()
    defer func() {
        log.Printf("%v | %s %s %v %d",
            ctx.RemoteAddr(),
            ctx.Method(),
            ctx.RequestURI(),
            time.Since(begin),
            client.ConnsCount(),
        )
    }()

    req := fasthttp.AcquireRequest()
    defer fasthttp.ReleaseRequest(req)

    req.SetRequestURI("http://localhost:8001")
    req.Header.SetMethod(fasthttp.MethodGet)

    resp := fasthttp.AcquireResponse()
    defer fasthttp.ReleaseResponse(resp)

    return client.Do(req, resp)
}

func handler(ctx *fasthttp.RequestCtx) {
    if err := api(ctx); err != nil {
        ctx.SetStatusCode(fasthttp.StatusInternalServerError)
    } else {
        ctx.SetStatusCode(fasthttp.StatusOK)
    }
}
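One detail of this setup worth flagging: once all MaxConns connections are busy, HostClient.Do fails fast with fasthttp.ErrNoFreeConns rather than queueing, and a plain Do has no overall deadline. A minimal sketch of the two related knobs, MaxConnWaitTimeout and DoTimeout (the 100ms wait and 500ms budget are assumed values, not recommendations):

package main

import (
    "log"
    "time"

    "github.com/valyala/fasthttp"
)

func main() {
    c := &fasthttp.HostClient{
        Addr:     "localhost:8001",
        MaxConns: 2048,
        // With all MaxConns connections busy, Do returns
        // fasthttp.ErrNoFreeConns immediately; a non-zero
        // MaxConnWaitTimeout lets callers wait for a free
        // connection instead (100ms is an assumed value).
        MaxConnWaitTimeout: 100 * time.Millisecond,
    }

    req := fasthttp.AcquireRequest()
    defer fasthttp.ReleaseRequest(req)
    req.SetRequestURI("http://localhost:8001")
    req.Header.SetMethod(fasthttp.MethodGet)

    resp := fasthttp.AcquireResponse()
    defer fasthttp.ReleaseResponse(resp)

    // DoTimeout bounds the whole call, so a saturated pool or a
    // slow upstream surfaces as an error instead of an unbounded
    // delay (500ms is an assumed budget).
    if err := c.DoTimeout(req, resp, 500*time.Millisecond); err != nil {
        log.Printf("request failed: %v", err)
    }
}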

Results obtained with the load-testing tool (wrk):

1 connection

➜  ~ wrk -t1 -c1 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   160.03us  802.36us  14.51ms   97.87%
    Req/Sec    16.41k     2.30k   18.29k    90.10%
  Latency Distribution
     50%   52.00us
     75%   65.00us
     90%   90.00us
     99%    4.04ms
  164890 requests in 10.10s, 14.62MB read
Requests/sec:  16326.54
Transfer/sec:      1.45MB

10 connections

➜  ~ wrk -t1 -c10 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   622.15us    2.21ms  43.48ms   97.30%
    Req/Sec    30.30k     4.38k   39.26k    74.00%
  Latency Distribution
     50%  279.00us
     75%  427.00us
     90%  611.00us
     99%   10.97ms
  301272 requests in 10.00s, 26.72MB read
Requests/sec:  30121.96
Transfer/sec:      2.67MB

50 connections

➜  ~ wrk -t1 -c50 -d10s http://localhost:8002 --latency 
Running 10s test @ http://localhost:8002
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.69ms    1.95ms  36.35ms   96.91%
    Req/Sec    32.71k     4.71k   42.19k    73.00%
  Latency Distribution
     50%    1.46ms
     75%    1.83ms
     90%    2.28ms
     99%   11.05ms
  325559 requests in 10.01s, 28.87MB read
Requests/sec:  32526.90
Transfer/sec:      2.88MB

100 connections

➜  ~ wrk -t1 -c100 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.70ms    4.27ms  78.96ms   96.63%
    Req/Sec    30.67k     5.77k   43.51k    76.00%
  Latency Distribution
     50%    3.08ms
     75%    3.88ms
     90%    4.82ms
     99%   26.20ms
  305183 requests in 10.01s, 27.07MB read
Requests/sec:  30499.69
Transfer/sec:      2.71MB

500 connections

➜  ~ wrk -t1 -c500 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.75ms    5.69ms  78.57ms   85.19%
    Req/Sec    34.38k     5.13k   46.05k    72.00%
  Latency Distribution
     50%   14.21ms
     75%   17.06ms
     90%   19.83ms
     99%   39.96ms
  342024 requests in 10.02s, 30.33MB read
  Socket errors: connect 0, read 637, write 0, timeout 0
Requests/sec:  34131.79
Transfer/sec:      3.03MB

1000 connections

➜  ~ wrk -t1 -c1000 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    30.61ms   10.12ms 110.69ms   77.67%
    Req/Sec    32.04k     7.53k   47.23k    76.00%
  Latency Distribution
     50%   29.75ms
     75%   35.21ms
     90%   41.99ms
     99%   68.50ms
  318908 requests in 10.03s, 28.28MB read
  Socket errors: connect 0, read 3541, write 0, timeout 0
Requests/sec:  31807.34
Transfer/sec:      2.82MB

1500 connections

➜  ~ wrk -t1 -c1500 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 1500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    44.64ms   16.50ms 212.65ms   87.08%
    Req/Sec    33.34k     7.91k   48.98k    78.00%
  Latency Distribution
     50%   42.72ms
     75%   49.30ms
     90%   58.31ms
     99%  110.18ms
  332420 requests in 10.09s, 29.48MB read
  Socket errors: connect 0, read 3383, write 469, timeout 0
Requests/sec:  32950.19
Transfer/sec:      2.92MB

2000 connections

➜  ~ wrk -t1 -c2000 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    59.99ms   29.49ms 411.31ms   92.16%
    Req/Sec    29.86k    13.71k   46.66k    76.04%
  Latency Distribution
     50%   55.47ms
     75%   64.43ms
     90%   74.44ms
     99%  201.06ms
  285246 requests in 10.09s, 25.30MB read
  Socket errors: connect 0, read 16081, write 642, timeout 0
Requests/sec:  28261.07
Transfer/sec:      2.51MB

As the number of connections increases, latency rises, even though the third-party service itself still responds quickly; in this test its response time is measured in microseconds (µs).
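A rough sanity check with Little's law (mean latency ≈ in-flight requests ÷ throughput) suggests most of this growth is simply the throughput plateau of ~30-34k req/s being shared across more connections:

   500 conns:   500 / 34132 req/s ≈ 14.6 ms   (observed avg 14.75 ms)
  1000 conns:  1000 / 31807 req/s ≈ 31.4 ms   (observed avg 30.61 ms)
  2000 conns:  2000 / 28261 req/s ≈ 70.8 ms   (observed avg 59.99 ms)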

[flame graph screenshot]

I used flame graphs to help with the analysis. It appears that most of the time is spent on system calls. What can I do to reduce response latency in this situation?
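For reference, the profile behind such a flame graph can be reproduced from the pprof endpoint the client program already exposes on :7002 (the -http port for the pprof web UI is an arbitrary choice):

➜  ~ go tool pprof -http=:9090 'http://localhost:7002/debug/pprof/profile?seconds=10'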

erikdubbelboer commented 5 months ago

At 2000 connections I still see a 99th-percentile latency of 201.06ms. Is that not good? It makes sense that as the number of connections grows the latency increases, as both wrk and fasthttp start to take up more CPU. Did you expect anything else here?