kafka-ops / julie

A solution to help you build automation and gitops in your Apache Kafka deployments. The Kafka gitops!
MIT License
418 stars 113 forks source link

Julie MDSClient Timeout #497

Open yremmet opened 2 years ago

yremmet commented 2 years ago

Describe the bug I use Julie in several environments. All were deployed in the same way with similar configuration. For some reason, on the newer environments, the MDSClient no longer works and times out during authentication. However, sending the request manually via curl works. I have already tried different versions of java and julie. The broker and Julie run on the same VM

Runtime (please complete the following information):

Additional context

[DEBUG] 2022-05-04 09:54:48.134 [main] AccessControlProviderFactory - Function get
[DEBUG] 2022-05-04 09:54:48.134 [main] AccessControlProviderFactory - Function get
[DEBUG] 2022-05-04 09:54:48.134 [main] AccessControlProviderFactory - Function apiClientLogIn create mdsclient
[DEBUG] 2022-05-04 09:54:48.134 [main] AccessControlProviderFactory - Function apiClientLogIn create mdsclient
[DEBUG] 2022-05-04 09:54:48.157 [main] MDSApiClientBuilder - Connecting to an MDS server at https://xyz:8090
[DEBUG] 2022-05-04 09:54:48.157 [main] MDSApiClientBuilder - Connecting to an MDS server at https://xyz:8090
[DEBUG] 2022-05-04 09:54:48.157 [main] MDSApiClientBuilder - Connecting to an MDS server at https://xyz:8090
[DEBUG] 2022-05-04 09:54:48.158 [main] MDSApiClient - Authenticate client
[DEBUG] 2022-05-04 09:54:48.158 [main] MDSApiClient - Authenticate client
[DEBUG] 2022-05-04 09:54:48.158 [main] MDSApiClient - Authenticate client
[DEBUG] 2022-05-04 09:54:48.165 [main] JulieHttpClient - method: GET request.uri: https://xyz:8090/security/1.0/authenticate
[DEBUG] 2022-05-04 09:54:48.165 [main] JulieHttpClient - method: GET request.uri: https://xyz:8090/security/1.0/authenticate
[ERROR] 2022-05-04 09:55:48.176 [main] MDSApiClient - java.io.IOException: java.net.http.HttpTimeoutException: request timed out
[ERROR] 2022-05-04 09:55:48.176 [main] MDSApiClient - java.io.IOException: java.net.http.HttpTimeoutException: request timed out
[ERROR] 2022-05-04 09:55:48.176 [main] MDSApiClient - java.io.IOException: java.net.http.HttpTimeoutException: request timed out
Exception in thread "main" java.io.IOException: java.io.IOException: java.io.IOException: java.net.http.HttpTimeoutException: request timed out
        at com.purbon.kafka.topology.AccessControlProviderFactory.get(AccessControlProviderFactory.java:64)
        at com.purbon.kafka.topology.JulieOps.build(JulieOps.java:90)
        at com.purbon.kafka.topology.CommandLineInterface.processTopology(CommandLineInterface.java:206)
        at com.purbon.kafka.topology.CommandLineInterface.run(CommandLineInterface.java:156)
        at com.purbon.kafka.topology.CommandLineInterface.main(CommandLineInterface.java:146)
Caused by: java.io.IOException: java.io.IOException: java.net.http.HttpTimeoutException: request timed out
        at com.purbon.kafka.topology.api.mds.MDSApiClient.authenticate(MDSApiClient.java:51)
        at com.purbon.kafka.topology.AccessControlProviderFactory.get(AccessControlProviderFactory.java:57)
        ... 4 more
Caused by: java.io.IOException: java.net.http.HttpTimeoutException: request timed out
        at com.purbon.kafka.topology.clients.JulieHttpClient.doGet(JulieHttpClient.java:132)
        at com.purbon.kafka.topology.clients.JulieHttpClient.doGet(JulieHttpClient.java:117)
        at com.purbon.kafka.topology.api.mds.MDSApiClient.authenticate(MDSApiClient.java:40)
        ... 5 more
Caused by: java.net.http.HttpTimeoutException: request timed out
        at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:559)
        at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119)
        at com.purbon.kafka.topology.clients.JulieHttpClient.doGet(JulieHttpClient.java:128)
        ... 7 more
[root@xyz julie]# time curl  https://xyz:8090/security/1.0/authenticate -k -u 'user:password' --fail
{"auth_token":"aGVsbG8gZnJpZW5k","token_type":"Bearer","expires_in":3600}
real    0m0.213s
user    0m0.009s
sys     0m0.004s
purbon commented 2 years ago

Some questions:

can you add julie.debug.mode=true in your config using the last 4.2.5 and share the results?

yremmet commented 2 years ago

Hi sorry it took me so long. The systems are airgapped and it takes some time to get data in and out.

which version of the confluent platform you using? 6.2.0 can you share more information about your logs and reponse codes within your MDS? As I said I can recreate the auth request with curl and it works just fine and the control center works as well. I added the logs of two run one with the correct credentials and one time it tried to run it with wrong credentials. It seems to me the answer of the MDS doesn't make it to julie.


[2022-05-06 07:02:21,646] DEBUG Login failed for julie_wrong (io.confluent.rbacapi.login.MdsLoginService)
[2022-05-06 07:02:21,651] INFO 127.0.0.1 - - [06/May/2022:07:02:21 +0000] "GET /security/1.0/authenticate HTTP/1.1" 401 101  25 (io.confluent.rest-utils.requests)
[2022-05-06 07:02:21,651] INFO 127.0.0.1 - - [06/May/2022:07:02:21 +0000] "GET /security/1.0/authenticate HTTP/1.1" 401 101  25 (io.confluent.rest-utils.requests)
[2022-05-06 07:02:21,651] INFO 127.0.0.1 - - [06/May/2022:07:02:21 +0000] "GET /security/1.0/authenticate HTTP/1.1" 401 101  25 (io.confluent.rest-utils.requests)
[2022-05-06 07:02:24,435] INFO 10.175.17.150 - - [06/May/2022:07:02:24 +0000] "GET /v1/metadata/id HTTP/1.1" 200 105  1 (io.confluent.rest-utils.requests)
[2022-05-06 07:02:24,435] INFO 10.175.17.150 - - [06/May/2022:07:02:24 +0000] "GET /v1/metadata/id HTTP/1.1" 200 105  1 (io.confluent.rest-utils.requests)
[2022-05-06 07:02:24,436] INFO 10.175.17.150 - - [06/May/2022:07:02:24 +0000] "GET /v1/metadata/id HTTP/1.1" 200 105  2 (io.confluent.rest-utils.requests)

[2022-05-06 07:08:32,485] DEBUG Login succeeded for julie (io.confluent.rbacapi.login.MdsLoginService) [2022-05-06 07:08:32,486] INFO 10.175.17.150 - srv_de-odskaf-l-t-01 [06/May/2022:07:08:32 +0000] "GET /security/1.0/metadataClusterId HTTP/1.1" 200 22 194 (io.confluent.rest-utils.requests) [2022-05-06 07:08:32,486] INFO 10.175.17.150 - srv_de-odskaf-l-t-01 [06/May/2022:07:08:32 +0000] "GET /security/1.0/metadataClusterId HTTP/1.1" 200 22 194 (io.confluent.rest-utils.requests) [2022-05-06 07:08:32,486] INFO 10.175.17.150 - srv_de-odskaf-l-t-01 [06/May/2022:07:08:32 +0000] "GET /security/1.0/metadataClusterId HTTP/1.1" 200 22 194 (io.confluent.rest-utils.requests) [2022-05-06 07:08:36,052] INFO 10.175.17.150 - - [06/May/2022:07:08:36 +0000] "GET /v1/metadata/id HTTP/1.1" 200 105 2 (io.confluent.rest-utils.requests) [2022-05-06 07:08:36,052] INFO 10.175.17.150 - - [06/May/2022:07:08:36 +0000] "GET /v1/metadata/id HTTP/1.1" 200 105 2 (io.confluent.rest-utils.requests) [2022-05-06 07:08:36,052] INFO 10.175.17.150 - - [06/May/2022:07:08:36 +0000] "GET /v1/metadata/id HTTP/1.1" 200 105 2 (io.confluent.rest-utils.requests)


I did run the latest release with the debug.mode but the logs mean me nothing to me.

log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.admin.AdminClientConfig). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. shouldRetry: count=1 shouldRetry: count=2 shouldRetry: count=3 shouldRetry: count=4 shouldRetry: count=5 shouldRetry: count=6 shouldRetry: count=7 shouldRetry: count=8 shouldRetry: count=9 shouldRetry: count=10 shouldRetry: count=11 shouldRetry: count=12



I appreciate your help 
thank you