gravitational / teleport


Redshift JDBC driver unable to communicate via Teleport. #29417

Open programmerq opened 1 year ago

programmerq commented 1 year ago

I originally set out to find out whether there was a JDBC driver option that could be adjusted to make the Redshift driver work. I have seen folks mention that their DBAs would much rather use the Redshift driver when working with Redshift via Teleport.

In the course of this, I ran into a fairly low-level bug. The Teleport proxy sends some PostgreSQL protocol messages split across separate TCP packets, unlike a direct connection to Redshift, and this triggers an error in the Redshift JDBC driver during connection initialization.

Steps to Reproduce

  1. Launch Teleport proxy listening on port 53189 with tsh proxy db --db-user teleport-redshift-serverless-access --port 53189 --tunnel work2

    Confirm that the tunnel is configured correctly by using a non-Redshift client such as the PostgreSQL JDBC driver or the psql command-line client.

  2. Connect to proxy on 53189 from JDBC using:

    • URL: jdbc:redshift://127.0.0.1:53189/dev
    • User: teleport-redshift-serverless-access
    • Password: empty; SSL disabled. I had to manually disable SSL in the "Advanced" tab in DataGrip, in addition to unchecking the box.
    • clusterid=work2,dbuser=teleport-redshift-serverless-access
    • Optional: To get debug logs from the driver, set loglevel=TRACE and logpath=/path/to/some/output/directory/

    I had similar results with both DataGrip and DBeaver. I used a Redshift Serverless instance, and I followed this guide to set it up.

  3. Test

    • "Test Connection" fails with the error "Connection was established but closed as invalid." in DataGrip with these settings.
    • "Test Connection ..." in DataGrip results in "This connection has been closed."

Root Cause Analysis

JDBC Driver logs

The JDBC TRACE-level logs showed a large amount of data, including stack traces. The driver source is also available on GitHub: https://github.com/aws/amazon-redshift-jdbc-driver

I was able to enable logging in the drivers. When connecting from DataGrip, the log showed the driver raising an I/O error while running the data through its UTF-8 decoder:

jdbc-utf8.txt
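That decoder error is easy to reproduce in isolation: 0x85 is a UTF-8 continuation byte and can never begin a character, so a parser that hands raw protocol bytes (here, part of a length field) to a UTF-8 decoder at the wrong offset fails in exactly this way. A minimal illustration (my own sketch, not driver code):

```python
# 0x85 is a UTF-8 continuation byte; it is invalid as the first
# byte of a character, so decoding it in isolation always fails.
try:
    b"\x85".decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # "invalid start byte"
```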

Packet Capture

I ran Wireshark to capture the packets and look for the illegal byte 133 (0x85).

The hex-decoded TCP payload contained the following line:

000002BA 44 00 00 00 85 00 01 00 00 00 7b 50 6f 73 74 67 D....... ..{Postg

This was part of the startup exchange, and the 0x85 was part of the four-byte big-endian integer giving the message length (0x00000085 = 133). It seemed like something about Teleport's PostgreSQL server implementation didn't sit well with the Redshift driver.
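Decoded as a protocol message header, those first five bytes are unambiguous. A quick sketch (my own illustration, using the standard PostgreSQL message framing of a one-byte type followed by an int32 length that includes itself):

```python
import struct

# First five bytes of the captured line: a DataRow ('D') header.
header = bytes.fromhex("4400000085")
msg_type = chr(header[0])                     # 'D' = DataRow
(length,) = struct.unpack("!I", header[1:5])  # big-endian, includes itself

print(msg_type, length)  # D 133
# 133 - 4 = 129 payload bytes follow the length field; 0x85 is a
# length byte, not text, which is why a UTF-8 decoder chokes on it.
assert msg_type == "D" and length == 0x85 == 133
```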

jdbc-utf8.pcapng.txt (remove the .txt extension to open in Wireshark). tcp.stream eq 0 is the relevant connection. Use -d tcp.port==53189,pgsql to decode the traffic as pgsql.

I compared packet captures against the same Teleport database and saw that the PostgreSQL JDBC and psql connections to this authenticated proxy were nearly identical to the Redshift JDBC connection. The only difference is that immediately after the startup message exchange, the Redshift JDBC client disconnects and displays the error to the user.

Connecting to Redshift Directly

I decided to compare this to the startup message when the Redshift driver connects directly to Redshift itself. I used https://github.com/neykov/extract-tls-secrets to write the TLS key log file so I could decrypt the packets in Wireshark/tshark, and it worked.

It is essentially identical, but with one subtle difference: looking at the entire startup message, the messages that Teleport sends are split across multiple packets. Here is the startup message when connecting directly to Redshift:

000002AA  31 00 00 00 04 32 00 00  00 04 54 00 00 00 26 00   1....2.. ..T...&.
000002BA  01 76 65 72 73 69 6f 6e  00 00 00 00 00 00 00 00   .version ........
000002CA  00 04 13 ff ff 00 01 00  03 00 01 00 00 00 00 10   ........ ........
000002DA  00 44 00 00 00 85 00 01  00 00 00 7b 50 6f 73 74   .D...... ...{Post
000002EA  67 72 65 53 51 4c 20 38  2e 30 2e 32 20 6f 6e 20   greSQL 8 .0.2 on 
000002FA  69 36 38 36 2d 70 63 2d  6c 69 6e 75 78 2d 67 6e   i686-pc- linux-gn
0000030A  75 2c 20 63 6f 6d 70 69  6c 65 64 20 62 79 20 47   u, compi led by G
0000031A  43 43 20 67 63 63 20 28  47 43 43 29 20 33 2e 34   CC gcc ( GCC) 3.4
0000032A  2e 32 20 32 30 30 34 31  30 31 37 20 28 52 65 64   .2 20041 017 (Red
0000033A  20 48 61 74 20 33 2e 34  2e 32 2d 36 2e 66 63 33    Hat 3.4 .2-6.fc3
0000034A  29 2c 20 52 65 64 73 68  69 66 74 20 31 2e 30 2e   ), Redsh ift 1.0.
0000035A  35 32 39 34 33 00 00 43  00 00 00 0b 53 45 4c 45   52943..C ....SELE
0000036A  43 54 00                                           CT.

redshift-direct-decrypted.pcapng.txt. Use -d tcp.port==5439,pgsql to get the pgsql dissector to populate. The TLS key log has been incorporated into the capture already.

Here is the same startup message sent via Teleport:

# Packet 1
0000028F  31 00 00 00 04                                     1....

# Packet 2
00000294  32 00 00 00 04                                     2....

# Packet 3
00000299  54 00 00 00 20 00 01 76  65 72 73 69 6f 6e 00 00   T... ..v ersion..
000002A9  00 00 00 00 00 00 00 04  13 ff ff 00 01 00 03 00   ........ ........

# Packet 4
000002B9  01                                                 .

# Packet 5
000002BA  44 00 00 00 85 00 01 00  00 00 7b 50 6f 73 74 67   D....... ..{Postg
000002CA  72 65 53 51 4c 20 38 2e  30 2e 32 20 6f 6e 20 69   reSQL 8. 0.2 on i
000002DA  36 38 36 2d 70 63 2d 6c  69 6e 75 78 2d 67 6e 75   686-pc-l inux-gnu
000002EA  2c 20 63 6f 6d 70 69 6c  65 64 20 62 79 20 47 43   , compil ed by GC
000002FA  43 20 67 63 63 20 28 47  43 43 29 20 33 2e 34 2e   C gcc (G CC) 3.4.
0000030A  32 20 32 30 30 34 31 30  31 37 20 28 52 65 64 20   2 200410 17 (Red 
0000031A  48 61 74 20 33 2e 34 2e  32 2d 36 2e 66 63 33 29   Hat 3.4. 2-6.fc3)
0000032A  2c 20 52 65 64 73 68 69  66 74 20 31 2e 30 2e 35   , Redshi ft 1.0.5
0000033A  32 39 34 33 00 00                                  2943..

# Packet 6
00000340  43 00 00 00 0b 53 45 4c  45 43 54 00               C....SEL ECT.

This difference in packet segmentation seems to trigger a bug in the Redshift JDBC driver's parsing of the startup message.
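Both captures carry the same bytes; only the segmentation differs. Since TCP is a byte stream, a reader that buffers until each message's declared length is available sees identical messages either way. A rough sketch of such a buffering reader (my own illustration, not the driver's or Teleport's code; message payloads are abbreviated stand-ins for the captured exchange):

```python
import struct

class PgReader:
    """Buffer a PostgreSQL backend byte stream and emit complete
    messages, regardless of how the bytes were segmented on the wire."""

    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk: bytes):
        self.buf += chunk
        msgs = []
        while len(self.buf) >= 5:
            (length,) = struct.unpack("!I", self.buf[1:5])
            if len(self.buf) < 1 + length:
                break  # message spans segments; wait for more bytes
            msgs.append((chr(self.buf[0]), bytes(self.buf[5:1 + length])))
            del self.buf[:1 + length]
        return msgs

def msg(t: str, payload: bytes = b"") -> bytes:
    # type byte + int32 length (includes itself) + payload
    return t.encode() + struct.pack("!I", 4 + len(payload)) + payload

# Stand-ins for the captured exchange: ParseComplete, BindComplete,
# RowDescription, DataRow, CommandComplete (payloads shortened).
stream = (
    msg("1") + msg("2")
    + msg("T", b"\x00\x01version\x00")
    + msg("D", b"\x00\x01\x00\x00\x00\x10PostgreSQL 8.0.2")
    + msg("C", b"SELECT\x00")
)

# Redshift-style delivery: everything arrives in one segment.
from_one = PgReader().feed(stream)

# Teleport-style delivery: several segments, one of them a single
# byte (mirroring "Packet 4" in the capture above).
reader, from_many = PgReader(), []
for seg in (stream[:5], stream[5:10], stream[10:24], stream[24:25], stream[25:]):
    from_many += reader.feed(seg)

assert from_one == from_many and len(from_one) == 5
print("identical messages from both segmentations")
```

A reader that instead treats each recv() as one complete message would fail on the split stream, which matches where both drivers give up.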

Slight variant of the error

I tried to confirm whether the same error was happening with both DBeaver and DataGrip. They have slightly different configuration dialogs, and each shipped a different patch version of the driver. The driver in DBeaver didn't throw the UTF-8 error, but it failed at the same point in the packet exchange.

Caused by: java.io.IOException: Unexpected packet type: 0 is the error message in the log.

Here are the accompanying log and packet capture for this test (via the authenticated proxy):

dbeaver.pcapng.txt dbeaver.jdbc.txt

Workarounds

The current recommendation in our documentation is to use the PostgreSQL JDBC driver, but that leaves a lot to be desired for end users. Things like external schemas do not populate, and it is not clear to end users why.

Request

If Teleport could offer a Redshift "dialect" that falls more in line with the Redshift implementation that the Redshift JDBC driver is tested against, that would be helpful. I don't know whether the PostgreSQL protocol itself actually requires that messages not be split across packets, or whether it's simply a convention that has always held.
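For what it's worth, the PostgreSQL protocol is defined over a byte stream, so message boundaries are carried by the length fields rather than by TCP segmentation, and a strictly conforming client has to buffer. Even so, a proxy can avoid tripping fragile clients by coalescing a logical response into a single write, roughly along these lines (a sketch of the idea only, not Teleport code; message payloads are elided placeholders):

```python
import struct

def msg(t: str, payload: bytes = b"") -> bytes:
    # PostgreSQL message: type byte + int32 length (includes itself) + payload
    return t.encode() + struct.pack("!I", 4 + len(payload)) + payload

# Batch one logical response into a single buffer and write it once,
# instead of one write per message (which produced the per-message
# segments seen in the Teleport capture above).
response = b"".join([
    msg("1"),                 # ParseComplete
    msg("2"),                 # BindComplete
    msg("T", b"\x00\x00"),    # RowDescription (field list elided)
    msg("C", b"SELECT\x00"),  # CommandComplete
])
# sock.sendall(response)  # a single write usually yields a single TCP
#                         # segment, though TCP never guarantees framing
print(len(response))  # 29
```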

Versions

Other debug logs are scattered throughout the ticket, but here are the Teleport logs for tsh and the agent:

When clicking "test connection"

greedy52 commented 1 year ago

I just tried dbeaver + Redshift (non-serverless) and I can reproduce the Caused by: java.io.IOException: Unexpected packet type: 0 error.

There is a discrepancy between the JDBC driver and the database agent in parsing RowDescription.

I am able to "bypass" this error by setting the driver property client_protocol_version to 0, and things start working:

Screenshot 2023-07-24 at 11 58 12 AM

(Note that this is a "hidden" property, so it has to be added under "User Properties"; reference.)

I will try DataGrip next to see if it has the same problem (and will look deeper into client_protocol_version).

Update on DataGrip: setting client_protocol_version to 0 also bypasses the connection errors.