kubeedge / edgemesh

Simplified network and services for edge applications
https://edgemesh.netlify.app/
Apache License 2.0

EdgeMesh should support retries when calling microservices #237

Open ZBoIsHere opened 2 years ago

ZBoIsHere commented 2 years ago

What would you like to be added/modified: EdgeMesh should support configuring the maximum number of retries when calling a microservice.

Why is this needed: Requests to microservices often fail for various reasons, so they need to be retried. In microservice governance, retry logic should be handled by the governance framework, in this case EdgeMesh, so EdgeMesh needs a configurable maximum number of retry attempts for microservice calls.

ZBoIsHere commented 2 years ago

@Poorunga Is there a plan to support retrying failed microservice calls?

Poorunga commented 2 years ago

@ZBoIsHere The retry mechanism for failed connection establishment on the source side of a microservice call: https://github.com/kubeedge/edgemesh/blob/main/third_party/forked/kubernetes/pkg/proxy/userspace/proxysocket.go#L91-L118

func TryConnectEndpoints(service proxy.ServicePortName, srcAddr net.Addr, tcpConn *net.TCPConn, protocol string, loadBalancer LoadBalancer) (out io.ReadWriteCloser, err error) {
    sessionAffinityReset := false
    for _, dialTimeout := range EndpointDialTimeouts {
        endpoint, req, err := loadBalancer.NextEndpoint(service, srcAddr, tcpConn, sessionAffinityReset)
        if err != nil {
            klog.ErrorS(err, "Couldn't find an endpoint for service", "service", service)
            return nil, err
        }
        klog.V(3).InfoS("Mapped service to endpoint", "service", service, "endpoint", endpoint)
        outConn, err := TryDialStream(protocol, endpoint, dialTimeout)
        if err != nil {
            if util.IsTooManyFDsError(err) {
                panic("Dial failed: " + err.Error())
            }
            klog.ErrorS(err, "Dial failed")
            sessionAffinityReset = true
            continue
        }
        if req != nil {
            reqBytes, err := util.HttpRequestToBytes(req)
            if err == nil {
                outConn.Write(reqBytes)
            }
        }
        return outConn, nil
    }
    return nil, fmt.Errorf("failed to connect to an endpoint")
}

The retry mechanism for failed proxy connection establishment on the destination side: https://github.com/kubeedge/edgemesh/blob/main/agent/pkg/tunnel/proxy/proxy.go#L94-L125

func (ps *ProxyService) TryConnectEndpoint(msg *pb.Proxy) (net.Conn, error) {
    var err error
    switch msg.GetProtocol() {
    case "tcp":
        for i := 0; i < MaxRetryTime; i++ {
            tcpConn, err := net.DialTCP("tcp", nil, &net.TCPAddr{
                IP:   net.ParseIP(msg.GetIp()),
                Port: int(msg.GetPort()),
            })
            if err == nil {
                return tcpConn, nil
            }
            time.Sleep(time.Second)
        }
        klog.Errorf("max retries for dial")
        return nil, err
    case "udp":
        for i := 0; i < MaxRetryTime; i++ {
            udpConn, err := net.DialUDP("udp", nil, &net.UDPAddr{
                IP:   net.ParseIP(msg.GetIp()),
                Port: int(msg.GetPort()),
            })
            if err == nil {
                return udpConn, nil
            }
        }
        klog.Errorf("max retries for dial")
        return nil, err
    default:
        return nil, fmt.Errorf("unsupported protocol: %s", msg.GetProtocol())
    }
}

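After each failed dial, the code sleeps and then dials again (one second between attempts in the TCP path), and it fails once the maximum number of retries is exceeded. Is this the retry-on-failure capability you are looking for?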