temporalio / temporal

Temporal service
https://docs.temporal.io
MIT License
11.94k stars 843 forks source link

Unable to use dns:// prefix for clusterInformation.rpcAddress configuration #5979

Open hferentschik opened 5 months ago

hferentschik commented 5 months ago

Expected Behavior

We are trying to setup multi-cluster replication and for that purpose need to configure the clusterMetadata of the Temporal configuration. This is described here. Specific to the rpcAddress the docs says:

rpcAddress - indicate the remote service address (host:port). Host can be DNS name. Use dns:/// prefix to enable round-robin between IP address for DNS name.

We are using this prefix everywhere where we create a gRPC client. In particular our workers connect via dns:///<service-address>:7233. The reason to explicitly specify the dns resolver is that grpc-go deviates from the gRPC spec and does not use the dns resolver as default. See https://github.com/grpc/grpc-go/issues/3544#issuecomment-618689886. It uses a so called "passthrough" as default. If one wants to use dns, one needs to explicitly specify the resolver using dns:///.

It seems in most cases Temporal allows the use of the prefix, except for clusterInformation.rpcAddress.

Actual Behavior

When configuring the clusterInformation.rpcAddress with something like dns:///<host>:7233 an error is returned when one tried to use cluster replication commands, for example temporal operator cluster upsert in order to establish a connection between two clusters. The returned error is:

address dns:///myhost:7233: too many colons in address

See also https://go.dev/play/p/9Rwow_gtcCF

The problem is that RPCFactory#CreateRemoteFrontendGRPCConnection uses net.SplitHostPort to extract the hostname from the configured rpcAddress in order to lookup the client config for the host. net.SplitHostPort cannot handle the dns:/// prefix.

func (d *RPCFactory) CreateRemoteFrontendGRPCConnection(rpcAddress string) *grpc.ClientConn {
    var tlsClientConfig *tls.Config
    var err error
    if d.tlsFactory != nil {
        hostname, _, err2 := net.SplitHostPort(rpcAddress)
        if err2 != nil {
            d.logger.Fatal("Invalid rpcAddress for remote cluster", tag.Error(err2))
        }
        tlsClientConfig, err = d.tlsFactory.GetRemoteClusterClientConfig(hostname)

        if err != nil {
            d.logger.Fatal("Failed to create tls config for gRPC connection", tag.Error(err))
            return nil
        }
    }

    return d.dial(rpcAddress, tlsClientConfig)
}

Steps to Reproduce the Problem

  1. Use the dns:/// prefix in the clusterInformation.rpcAddress of the Temporal server in order to enforce the dns resolver in grpc-go
  2. Use an operation which calls CreateRemoteFrontendGRPCConnection, eg try enabling a cluster connection between two clusters.

Specifications

hferentschik commented 5 months ago

Something like this https://go.dev/play/p/403gj5Zovy7 could work (assuming the port is mandatory):

package main

import (
    "fmt"
    "net"
    "net/url"
    "strings"
)

func main() {
    host, err := hostname("myhost:7233")
    fmt.Printf("host: %s, error: %v\n", host, err)

    host, err = hostname("dns:///myhost:7233")
    fmt.Printf("host: %s, error: %v\n", host, err)
}

func hostname(rpcAddress string) (string, error) {
    hostname, _, err := net.SplitHostPort(rpcAddress)
    if err != nil {
        grpcUrl, parseErr := url.Parse(rpcAddress)
        if parseErr != nil {
            return "", parseErr
        }

        // Check the host field, if set rpcAddress is in plain host:port format
        // If the host is not set the address specified a resolver, eg dns:///host:port.
        // In this case the address is in the path of the URL.
        // See https://github.com/grpc/grpc-go/blob/c8083227ee3181db60cf37005940e11865a572b4/resolver/resolver.go#L272-L288
        address := grpcUrl.Host
        if address == "" {
            address = grpcUrl.Path
        }
        if address == "" {
            address = grpcUrl.Opaque
        }

        address = strings.TrimPrefix(address, "/")
        hostname, _, _ = net.SplitHostPort(address)
        err = nil

    }
    return hostname, err
}

Alternatively one could just use strings.CutPrefix to remove dns:///, but then that would only cover the single dns resolver.

Happy to create/work on a PR if we can agree on the approach. Or maybe I am still missing something here?

hferentschik commented 5 months ago

@yux0 wdyt? Did you have a chance to look at this?

yux0 commented 1 month ago

This is merged. https://github.com/temporalio/temporal/pull/6430. The newClient should use dns by default.