Closed wangqinghuan closed 3 years ago
@wangqinghuan unfortunately, kerberos is one of the very few aspects of the client that was implemented by comparing to other code and reading how things "should" work, and I was not able to test the kerberos flow locally.
I'd be happy to help debug this interactively in the discord channel, if you have time. Alternatively, if you have a really easy way I can set up a kerberos environment (like, a dockerfile that I can run and then connect to), I'd be happy to debug with that as well. The last time I tried kerberos I spent two nights learning and trying to set things up and got essentially nowhere.
As a last-ditch alternative, I could just push a branch with a ton of debug printlns, and you could try that out (this is what I'd plan to do interactively in discord, so doing it non-interactively will just have a lot more delay).
Let me know if any of these debugging proposals would work for you, or if you have a better idea. Sorry for the problems!
Lmk if you have an update / thoughts, otherwise I'll be closing this in a few days.
Hi twmb, you cannot debug the code interactively because it runs in a private government cloud. For now, I have switched to sarama and am collecting Kafka metrics successfully. However, I would appreciate it if you could push a branch that prints more debug logs, so we can find out why SASL init failed.
Is it possible for you to try this commit https://github.com/twmb/franz-go/commit/88e131d52265781774028c9de6fabad8bc2fc960 ? I think you'd need to edit kminion's go.mod directly to point to that commit.
It doesn't work; I get the same error. Do you have any other ideas about this error? I will try to set up a kerberos docker env soon if I have time.
That's a big bummer. I spent a good few hours walking through this confluent guide, but it doesn't actually show in the end how to integrate Kafka with that, and it papers over a lot of instructions, especially if you're on linux, so I couldn't set up kerberos successfully.
I then walked through the Sarama code and modified it so that I had a predefined ticket & encryption key, as well as authenticator time, and hardcoded those same ticket / enc key / auth times into my client, and compared a write from my client to a write from the sarama client. That's where I noticed the one (1) byte was off, and I fixed that. Unfortunately, I can't compare the flow for the second challenge because that requires the setup to actually be working. So, this test should have shown that, with the same input client, sarama and this should behave the same.
Minus the user/pass, what is your sarama config? And what is your kminion config? I wonder if the clients are initialized exactly the same.
Also, one other idea, would it be possible to capture the plaintext communication between your Kafka server, and then compare how kminion communicates with how sarama communicates? This would be rather involved, but then we could compare the actual bytes (or, if this is sensitive, you could maybe tell me which bytes are off / how much they're off by).
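A minimal sketch of the byte comparison described here, assuming each client's outbound bytes have already been captured into a byte slice. The helper name `firstDiff` and the sample captures are hypothetical; neither sarama nor franz-go provides this.

```go
package main

import "fmt"

// firstDiff returns the index of the first byte at which a and b differ,
// or -1 if they are identical. If one slice is a strict prefix of the
// other, the index is the length of the shorter slice.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	// Hypothetical captures; real ones would come from logging each
	// client's writes to the broker.
	fromSarama := []byte{0x60, 0x82, 0x02, 0x1a, 0x06}
	fromKgo := []byte{0x60, 0x82, 0x02, 0x1b, 0x06}

	i := firstDiff(fromSarama, fromKgo)
	if i >= 0 && i < len(fromSarama) && i < len(fromKgo) {
		fmt.Printf("offset %d: %#x vs %#x\n", i, fromSarama[i], fromKgo[i])
	}
}
```

Reporting only the offset and the two differing values would also let sensitive payloads stay private.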
Alternatively, if you know how to get a docker-compose file for this & kafka, I'll do all the debugging for the two clients. I just have no idea how to set up a krb5 server & hook it up with kafka & the client.
Also, if you'd like, I monitor the discord server pretty heavily.
Also, just for confirmation -- you edited the kminion/backend/go.mod file and then rebuilt the backend?
> Also, just for confirmation -- you edited the kminion/backend/go.mod file and then rebuilt the backend?
Yes, this is my snapshot of Kminion
> Minus the user/pass, what is your sarama config? And what is your kminion config? I wonder if the clients are initialized exactly the same.
Kminion config:
kafka:
  brokers: [172.16.1.102:21007]
  clientId: "kminion"
  rackId: ""
  tls:
    enabled: false
    caFilepath: ""
    certFilepath: ""
    keyFilepath: ""
    passphrase: ""
    insecureSkipTlsVerify: false
  sasl:
    # Whether or not SASL authentication will be used for authentication
    enabled: true
    # Username to use for PLAIN or SCRAM mechanism
    username: ""
    # Password to use for PLAIN or SCRAM mechanism
    password: ""
    # Mechanism to use for SASL Authentication. Valid values are PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, GSSAPI
    mechanism: "GSSAPI"
    # GSSAPI / Kerberos config properties
    gssapi:
      authType: "KEYTAB_AUTH"
      keyTabPath: "C:\\user.keytab"
      kerberosConfigPath: "C:\\krb5.conf"
      serviceName: "kafka"
      username: "qdgakk"
      password: ""
      realm: "BOTECH.COM"
Sarama config
config := sarama.NewConfig()
config.Version = version
config.Net.SASL.Enable = true
config.Net.SASL.Mechanism = sarama.SASLTypeGSSAPI
config.Net.SASL.GSSAPI.ServiceName = "kafka"
config.Net.SASL.GSSAPI.KerberosConfigPath = "C:\\krb5.conf"
config.Net.SASL.GSSAPI.Realm = "BOTECH.COM"
config.Net.SASL.GSSAPI.Username = "qdgakk"
config.Net.SASL.GSSAPI.KeyTabPath = "C:\\user.keytab"
config.Net.SASL.GSSAPI.AuthType = sarama.KRB5_KEYTAB_AUTH
> I then walked through the Sarama code and modified it so that I had a predefined ticket & encryption key, as well as authenticator time, and hardcoded those same ticket / enc key / auth times into my client, and compared a write from my client to a write from the sarama client. That's where I noticed the one (1) byte was off, and I fixed that. Unfortunately, I can't compare the flow for the second challenge because that requires the setup to actually be working. So, this test should have shown that, with the same input client, sarama and this should behave the same.
The krb5Kdc.log shows that the client got its Kafka tickets successfully.
Jun 07 10:22:23 botech2 krb5kdc[13131](info): AS_REQ (3 etypes {18 17 23}) 192.168.202.198: ISSUE: authtime 1623032543, etypes {rep=18 tkt=18 ses=18}, qdgakk@BOTECH.COM for krbtgt/BOTECH.COM@BOTECH.COM
Jun 07 10:22:23 botech2 krb5kdc[13133](info): TGS_REQ (3 etypes {18 17 23}) 192.168.202.198: ISSUE: authtime 1623032543, etypes {rep=18 tkt=18 ses=18}, qdgakk@BOTECH.COM for kafka/hadoop.botech.com@BOTECH.COM
Jun 07 10:23:14 botech2 krb5kdc[13135](info): AS_REQ (3 etypes {18 17 23}) 192.168.202.198: NEEDED_PREAUTH: qdgakk@BOTECH.COM for krbtgt/BOTECH.COM@BOTECH.COM, Additional pre-authentication required
Jun 07 10:23:14 botech2 krb5kdc[13092](info): AS_REQ (3 etypes {18 17 23}) 192.168.202.198: ISSUE: authtime 1623032594, etypes {rep=18 tkt=18 ses=18}, qdgakk@BOTECH.COM for krbtgt/BOTECH.COM@BOTECH.COM
Jun 07 10:23:14 botech2 krb5kdc[13148](info): TGS_REQ (3 etypes {18 17 23}) 192.168.202.198: ISSUE: authtime 1623032594, etypes {rep=18 tkt=18 ses=18}, qdgakk@BOTECH.COM for kafka/hadoop.botech.com@BOTECH.COM
I am not an expert in Kerberos auth; however, according to the wiki description, it seems that Client Authentication and Client Service Authorization have both completed. The error occurs in the 'Client Service Request' step. Is the 'Client Service Request' step the 'second challenge' you mentioned?
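To make the KDC log excerpts above easier to scan, here is a rough sketch that pulls the request type, status, client principal, and service principal out of each line. It assumes lines laid out exactly like the ones quoted; the names `parseKDCLine` and `kdcEvent` are hypothetical. The KDC only logs the AS and TGS exchanges, so the later request that the client sends directly to the Kafka broker would not be expected to show up in this log.

```go
package main

import (
	"fmt"
	"regexp"
)

// kdcLine matches AS_REQ / TGS_REQ lines in the layout quoted above.
// The "authtime ..., etypes {...}, " part is optional because
// NEEDED_PREAUTH lines do not carry it.
var kdcLine = regexp.MustCompile(
	`(AS_REQ|TGS_REQ) \(.*?\) (\S+): (\w+): (?:authtime \d+, etypes \{[^}]*\}, )?(\S+) for ([^\s,]+)`)

// kdcEvent holds the fields extracted from one log line.
type kdcEvent struct {
	Request    string // AS_REQ or TGS_REQ
	ClientAddr string // address the request came from
	Status     string // e.g. ISSUE or NEEDED_PREAUTH
	Client     string // client principal
	Service    string // requested service principal
}

// parseKDCLine returns the parsed event, or false if the line does not
// match the assumed layout.
func parseKDCLine(line string) (kdcEvent, bool) {
	m := kdcLine.FindStringSubmatch(line)
	if m == nil {
		return kdcEvent{}, false
	}
	return kdcEvent{m[1], m[2], m[3], m[4], m[5]}, true
}

func main() {
	line := "Jun 07 10:22:23 botech2 krb5kdc[13133](info): TGS_REQ (3 etypes {18 17 23}) 192.168.202.198: ISSUE: authtime 1623032543, etypes {rep=18 tkt=18 ses=18}, qdgakk@BOTECH.COM for kafka/hadoop.botech.com@BOTECH.COM"
	if ev, ok := parseKDCLine(line); ok {
		fmt.Printf("%s %s: %s -> %s\n", ev.Request, ev.Status, ev.Client, ev.Service)
	}
}
```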
> Also, one other idea, would it be possible to capture the plaintext communication between your Kafka server, and then compare how kminion communicates with how sarama communicates? This would be rather involved, but then we could compare the actual bytes (or, if this is sensitive, you could maybe tell me which bytes are off / how much they're off by).
I will capture the communication and compare the differences.
I haven't used tcpdump before. This is the tcpdump command I ran on the Kafka broker host: 'tcpdump tcp port 21007 and src host 192.168.202.198'. Port 21007 is the Kafka broker port, and 192.168.202.198 is the IP of the host running kminion or Sarama. The tcpdump output when Sarama is requesting:
13:16:35.821786 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 4096867901:4096867902, ack 1115128171, win 513, length 1
13:16:50.826753 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:17:05.841297 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:17:20.845950 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:17:35.852095 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:17:50.866888 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:18:05.868874 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:18:20.878228 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:18:35.892508 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:18:50.903415 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
13:19:05.905621 IP 192.168.202.198.49600 > botech1.21007: Flags [.], seq 0:1, ack 1, win 513, length 1
The tcpdump output when kminion is requesting:
13:19:10.852253 IP 192.168.202.198.49600 > botech1.21007: Flags [R.], seq 1, ack 1, win 0, length 0
13:27:44.585790 IP 192.168.202.198.50951 > botech1.21007: Flags [S], seq 3223309109, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
13:27:44.589181 IP 192.168.202.198.50951 > botech1.21007: Flags [.], ack 3476561371, win 513, length 0
13:27:44.589815 IP 192.168.202.198.50951 > botech1.21007: Flags [P.], seq 0:21, ack 1, win 513, length 21
13:27:44.643804 IP 192.168.202.198.50951 > botech1.21007: Flags [.], ack 223, win 512, length 0
13:27:44.703432 IP 192.168.202.198.50951 > botech1.21007: Flags [P.], seq 21:564, ack 223, win 512, length 543
13:27:44.710006 IP 192.168.202.198.50951 > botech1.21007: Flags [.], ack 224, win 512, length 0
13:27:44.710048 IP 192.168.202.198.50951 > botech1.21007: Flags [F.], seq 564, ack 224, win 512, length 0
13:27:44.817853 IP 192.168.202.198.50952 > botech1.21007: Flags [S], seq 140603499, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
13:27:44.822980 IP 192.168.202.198.50952 > botech1.21007: Flags [.], ack 2214273098, win 513, length 0
13:27:44.823031 IP 192.168.202.198.50952 > botech1.21007: Flags [P.], seq 0:21, ack 1, win 513, length 21
13:27:44.828094 IP 192.168.202.198.50952 > botech1.21007: Flags [P.], seq 21:563, ack 223, win 512, length 542
13:27:44.832575 IP 192.168.202.198.50952 > botech1.21007: Flags [.], ack 224, win 512, length 0
13:27:44.832601 IP 192.168.202.198.50952 > botech1.21007: Flags [F.], seq 563, ack 224, win 512, length 0
13:27:45.036337 IP 192.168.202.198.50953 > botech1.21007: Flags [S], seq 901179016, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
13:27:45.039846 IP 192.168.202.198.50953 > botech1.21007: Flags [.], ack 3598312153, win 513, length 0
13:27:45.039934 IP 192.168.202.198.50953 > botech1.21007: Flags [P.], seq 0:21, ack 1, win 513, length 21
13:27:45.044763 IP 192.168.202.198.50953 > botech1.21007: Flags [P.], seq 21:563, ack 223, win 512, length 542
13:27:45.047742 IP 192.168.202.198.50953 > botech1.21007: Flags [.], ack 224, win 512, length 0
13:27:45.047785 IP 192.168.202.198.50953 > botech1.21007: Flags [F.], seq 563, ack 224, win 512, length 0
Is my command correct, or should we monitor the Kerberos port instead of the Kafka port?
I think we can circumvent needing tcpdump (which I use about once a year) by using sarama's ProxyDialer and kgo's Dialer option.
In both clients:
import (
	"context"
	"fmt"
	"net"
)

// conn wraps a net.Conn and prints every byte slice written and read.
type conn struct {
	net.Conn
}

func (c *conn) Write(p []byte) (int, error) {
	fmt.Println(p)
	return c.Conn.Write(p)
}

func (c *conn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	fmt.Println(p[:n])
	return n, err
}

// dialer wraps net.Dialer so that every connection it opens is a logging conn.
type dialer struct {
	net.Dialer
}

func (d *dialer) Dial(network, addr string) (net.Conn, error) {
	c, err := d.Dialer.Dial(network, addr)
	if err != nil {
		return nil, err
	}
	return &conn{c}, nil
}

func (d *dialer) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
	c, err := d.Dialer.DialContext(ctx, network, addr)
	if err != nil {
		return nil, err
	}
	return &conn{c}, nil
}
in sarama,
cfg.Net.Proxy.Enable = true
cfg.Net.Proxy.Dialer = &dialer{}
in kgo,
cl, err := kgo.NewClient(
	// ...your opts,
	kgo.Dialer((&dialer{}).DialContext),
)
That should log bytes written and read to the wire, but I haven't tested this. If you don't make progress on this (also thank you very much for debugging so far), I may be able to help more in an hour or at worst, tomorrow.
Also, I'm not sure how much of this communication is private, so it may be best to use temporary credentials if you go this route.
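As a variation on the wrapper idea above, the following sketch logs each payload as hex with a direction label, which may be easier to diff between two clients than printed decimal slices. It is an untested-against-Kafka illustration: `loggingConn` and `roundTrip` are hypothetical names, and an in-memory `net.Pipe` stands in for the broker connection.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
)

// loggingConn wraps a net.Conn and records every payload in hex.
// The type and its log format are illustrative only.
type loggingConn struct {
	net.Conn
	log io.Writer
}

func (c *loggingConn) Write(p []byte) (int, error) {
	fmt.Fprintf(c.log, "write %d bytes: %x\n", len(p), p)
	return c.Conn.Write(p)
}

func (c *loggingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	fmt.Fprintf(c.log, "read %d bytes: %x\n", n, p[:n])
	return n, err
}

// roundTrip sends msg through a wrapped in-memory connection, has the
// other end answer with reply, and returns everything that was logged.
func roundTrip(msg, reply []byte) string {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	// Stand-in for the broker: read the request, then send the reply.
	go func() {
		buf := make([]byte, len(msg))
		io.ReadFull(server, buf)
		server.Write(reply)
	}()

	var log bytes.Buffer
	c := &loggingConn{Conn: client, log: &log}
	c.Write(msg)
	buf := make([]byte, len(reply))
	io.ReadFull(c, buf)
	return log.String()
}

func main() {
	fmt.Print(roundTrip([]byte("hi"), []byte("ok")))
}
```

Writing to an `io.Writer` rather than stdout would let each client's traffic go to its own file, so the two logs can be compared afterwards.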
Thinking about this further, even comparing bytes on the wire will not help with outbound problems, because the bytes include timestamps, so they will always look different.
I'll try to think of some good logging statements to add. If you're able to come up with a dockerfile that I can use to plug in and test both sarama and franz-go, that I think would be the best thing from a debuggability standpoint, but I think a dockerfile or docker-compose.yml would be really difficult (otherwise I'd have done that already), and I don't expect you to do this.
I looked into how librdkafka does this but it punts the entire sasl flow to an external library. I again looked into comparing sarama and franz-go code, and I'm not seeing anything different at the moment.
Can you re-run kminion with the debug logs again? I'm wondering if it fails after step 0 or after step 1 now.
Please reopen if you'd like to continue investigating this with me, otherwise it seems that we're at an impasse!
Hi, I am using kminion, which embeds the franz-go client, to connect to a kerberized Kafka cluster. After setting up config.yaml, I got an 'unable to initialize sasl' exception. The config.yaml and the exception are as follows. config.yaml:
Debug Info:
Do you know how I can avoid this exception? Any help is appreciated.