jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License

Connecting via SSH fails to resolve host #1724

Open madisonchamberlain opened 1 year ago

madisonchamberlain commented 1 year ago

Describe the bug

When connecting to Redshift with the pgx driver through an SSH tunnel, connection attempts fail if the hostname of the connection is a private DNS name (resolvable on the SSH tunnel host, but not on the wider internet). The error is: lookup ****.*****.private (host) on 10.47.240.10:53: (ssh host + port) no such host. This does not happen if we use the standard dbSql library to connect.

To Reproduce

Steps to reproduce the behavior:

Create a Route 53 hosted zone (e.g. name.private) for some VPC (vpc-123abc), and add a host record pointing to a Redshift IP, e.g. redshift.name.private → 54.151.2.2.

Spin up an EC2 host in the above VPC and validate that it can resolve redshift.name.private.

Use these credentials to connect to Redshift over pgx via a connection config like this:

type tunnel struct {
    config *pgsshConfig
    client *ssh.Client
}

func connectionFunction() (*pgxpool.Pool, error) {
    sshPort := uint16(22)
    if config.SSHTunnel.Port != 0 {
        sshPort = config.SSHTunnel.Port
    }
    tunnelHost := fmt.Sprintf("%s:%d", config.SSHTunnel.Host, sshPort)
    tunnelConfig := ssh.ClientConfig{
        User:            utils.SSHTunnelUser(),
        Auth:            getAuthMethods(),
        HostKeyCallback: knownhosts.CertChecker.HostKeyFallback,
        Timeout:         time.Duration(config.MaxTimeoutSecs * int64(time.Second)),
    }
    tunnel := tunnel{config: &pgsshConfig{tunnelHost, tunnelConfig, mkPgxConnStr(config, tz)}}

    pgxCfg, err := pgxpool.ParseConfig(mkPgxConnStr(config, tz))
    if err != nil {
        return nil, err
    }

    sshcon, err := ssh.Dial("tcp", tunnel.config.tunnelHost, &tunnel.config.tunnelConfig)
    if err != nil {
        return nil, err
    }

    pgxCfg.ConnConfig.DialFunc = func(ctx context.Context, network, addr string) (net.Conn, error) {
        conn, err := sshcon.Dial(network, addr)
        return conn, err
    }

    pool, err := pgxpool.ConnectConfig(ctx, pgxCfg)
    if err != nil {
        // This is where we see the error
    }
    ...
}

Expected behavior

The connection should succeed without any errors.

Actual behavior

The connection fails with a no such host error.

Version

Additional context

From looking at the lookupHost code in lookup_unix.go, it looks like the issue is that pgx tries to resolve the hostname of the server well before it dials over SSH, which won't work when the hostname can only be resolved on the SSH host.

Related: https://github.com/jackc/pgx/issues/1661

Thank you!

jackc commented 1 year ago

> From looking at the lookupHost code in lookup_unix.go, it looks like the issue is that pgx tries to resolve the hostname of the server well before it dials over SSH, which won't work when the hostname can only be resolved on the SSH host.

That would do it. pgx doesn't actually know anything about the SSH connection. It is simply using a custom DialFunc. I believe the solution would be to use a custom LookupFunc in addition to the custom DialFunc. LookupFunc allows custom DNS resolution like DialFunc allows custom dialing. That LookupFunc could do the DNS resolution through the existing SSH connection (not exactly sure how, but it should be doable).

wolfgang42 commented 8 months ago

That custom LookupFunc should look like this:

pgxCfg.ConnConfig.LookupFunc = func(ctx context.Context, host string) (addrs []string, err error) {
    return []string{host}, nil
}

That is, just pass through host unchanged. This bypasses the unwanted in-process DNS resolution and passes the original hostname on to the DialFunc; sshcon.Dial() will in turn send it over to the SSH server, which will finally do the resolution on its end, thus producing the desired result.

(SSH doesn’t have a straightforward way to directly request the results of a DNS lookup. This approach does lose pgconn’s native fallback behavior when the hostname has multiple A records associated with it. If that is a feature you need, the least convoluted fix is probably to run pgbouncer on the other end of the tunnel and connect to that.)