wi1dcard / v2ray-exporter

🧭 Prometheus exporter for V2Ray and V2Fly metrics, with a simple Grafana dashboard.

[Bug] exporter causes too many TIME_WAIT with a scrape interval of 5s #12

Closed: imryao closed this issue 2 years ago

imryao commented 2 years ago

Hi @wi1dcard ! I really like this project, but I've found something unexpected recently. I'm using Prometheus with a scrape_interval of 5s, and when I checked v2ray's logs I found many entries like these:

2021/12/19 11:26:03 [Info] [109249963] proxy/dokodemo: received request for 172.31.0.4:49388
2021/12/19 11:26:03 [Info] [109249963] app/dispatcher: taking detour [api] for [tcp:127.0.0.1:0]
2021/12/19 11:26:03 [Info] [109249963] app/proxyman/inbound: connection ends > proxy/dokodemo: connection ends > proxy/dokodemo: failed to transport request > read tcp 172.31.0.3:8888->172.31.0.4:49388: read: connection reset by peer
2021/12/19 11:26:08 [Info] [1714658193] proxy/dokodemo: received request for 172.31.0.4:49390
2021/12/19 11:26:08 [Info] [1714658193] app/dispatcher: taking detour [api] for [tcp:127.0.0.1:0]
2021/12/19 11:26:08 [Info] [1714658193] app/proxyman/inbound: connection ends > proxy/dokodemo: connection ends > proxy/dokodemo: failed to transport request > read tcp 172.31.0.3:8888->172.31.0.4:49390: read: connection reset by peer
2021/12/19 11:26:13 [Info] [3724667099] proxy/dokodemo: received request for 172.31.0.4:49392
2021/12/19 11:26:13 [Info] [3724667099] app/dispatcher: taking detour [api] for [tcp:127.0.0.1:0]
2021/12/19 11:26:13 [Info] [3724667099] app/proxyman/inbound: connection ends > proxy/dokodemo: connection ends > proxy/dokodemo: failed to transport request > read tcp 172.31.0.3:8888->172.31.0.4:49392: read: connection reset by peer

Here, 172.31.0.3 is v2ray and 172.31.0.4 is the exporter.

Then I checked netstat on the exporter:

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 172.31.0.4:49562        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49582        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49584        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49570        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49566        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49578        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49574        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49564        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49580        172.31.0.3:8888         TIME_WAIT  
tcp6       0      0 172.31.0.4:9550         172.31.0.2:51808        ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path

Here, 172.31.0.2 is the reverse proxy.

As you can see, there are about 10 connections in TIME_WAIT.

Then I checked the source code and found that the exporter creates a new gRPC connection every time it receives a scrape request:

func (e *Exporter) scrapeV2Ray(ch chan<- prometheus.Metric) error {
    ctx, cancel := context.WithTimeout(context.Background(), e.scrapeTimeout)
    defer cancel()

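    // Note: a new gRPC connection is dialed here on every scrape and closed
    // again below, so each scrape leaves a short-lived TCP connection behind.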
    conn, err := grpc.DialContext(ctx, e.endpoint, grpc.WithInsecure(), grpc.WithBlock())
    if err != nil {
        return fmt.Errorf("failed to dial: %w, timeout: %v", err, e.scrapeTimeout)
    }
    defer conn.Close()

    client := command.NewStatsServiceClient(conn)

    if err := e.scrapeV2RaySysMetrics(ctx, ch, client); err != nil {
        return err
    }

    if err := e.scrapeV2RayMetrics(ctx, ch, client); err != nil {
        return err
    }

    return nil
}

I think this is what causes the TIME_WAIT sockets shown above. I wonder if we could reuse the gRPC connection instead of creating a new client for every scrape.
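
Something along these lines might work. This is just a rough sketch: the NewExporter constructor and the conn field are illustrative assumptions, not the project's current API.

// Rough sketch: dial once up front and keep the connection for the lifetime
// of the exporter. Assumes the Exporter struct gains a conn *grpc.ClientConn
// field in addition to its existing endpoint and scrapeTimeout fields.
func NewExporter(endpoint string, scrapeTimeout time.Duration) (*Exporter, error) {
    ctx, cancel := context.WithTimeout(context.Background(), scrapeTimeout)
    defer cancel()

    conn, err := grpc.DialContext(ctx, endpoint, grpc.WithInsecure(), grpc.WithBlock())
    if err != nil {
        return nil, fmt.Errorf("failed to dial %s: %w", endpoint, err)
    }

    return &Exporter{
        endpoint:      endpoint,
        scrapeTimeout: scrapeTimeout,
        conn:          conn,
    }, nil
}

func (e *Exporter) scrapeV2Ray(ch chan<- prometheus.Metric) error {
    ctx, cancel := context.WithTimeout(context.Background(), e.scrapeTimeout)
    defer cancel()

    // Reuse the long-lived connection instead of dialing on every scrape.
    client := command.NewStatsServiceClient(e.conn)

    if err := e.scrapeV2RaySysMetrics(ctx, ch, client); err != nil {
        return err
    }

    return e.scrapeV2RayMetrics(ctx, ch, client)
}

Since a *grpc.ClientConn is safe for concurrent use and reconnects on its own if v2ray restarts, the exporter would keep a single ESTABLISHED connection instead of leaving a new TIME_WAIT socket behind after every scrape.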

Looking forward to your reply!

imryao commented 2 years ago

Hi @wi1dcard ! I've put together a bugfix here:

#13

Looking forward to your comments!