ivangfr / keycloak-clustered

Keycloak-Clustered extends quay.io/keycloak/keycloak official Keycloak Docker image by adding JDBC_PING discovery protocol.

Support for keycloak-x #9

Closed zdykstra closed 2 years ago

zdykstra commented 2 years ago

keycloak-containers has moved to 'legacy' status, with https://github.com/keycloak/keycloak/tree/main/quarkus/container being the new base Dockerized Keycloak setup.

Does this new code base support adding in alternate discovery mechanisms, like JDBC_PING?

ivangfr commented 2 years ago

Hi @zdykstra yes, the new code base supports JDBC_PING, but you need to do it manually, i.e., write an Infinispan XML configuration file, add it to a Docker image, and set KC_CACHE_CONFIG_FILE to read it.
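
Roughly, the jgroups part of such a file looks like this (just a sketch; the driver/URL values and the stack.combine/MPING replacement are illustrative, not necessarily the exact file):

<infinispan xmlns="urn:infinispan:config:11.0">
    <jgroups>
        <!-- extend the built-in tcp stack and swap its discovery protocol for JDBC_PING -->
        <stack name="mysql-jdbc-ping-tcp" extends="tcp">
            <JDBC_PING connection_driver="com.mysql.jdbc.Driver"
                       connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                       connection_url="jdbc:mysql://${env.KC_DB_URL_HOST}/${env.KC_DB_URL_DATABASE}"
                       info_writer_sleep_time="500"
                       remove_all_data_on_view_change="true"
                       stack.combine="REPLACE" stack.position="MPING"/>
        </stack>
    </jgroups>
    <cache-container name="keycloak">
        <!-- the transport must reference the stack defined above -->
        <transport stack="mysql-jdbc-ping-tcp" lock-timeout="60000"/>
        <!-- ... the distributed/replicated cache definitions from the default cache-ispn.xml ... -->
    </cache-container>
</infinispan>

You then COPY that file into the image (e.g. into /opt/keycloak/conf/) and point KC_CACHE_CONFIG_FILE at its file name.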

I've added to this project (keycloak-clustered) a new branch called keycloak-quarkus where I implement JDBC_PING using the newest Keycloak version 17.

Let me know if you have more questions.

Best regards

zdykstra commented 2 years ago

That's fantastic news, thank you! I'll follow up early next week if I hit any snags, but the example Dockerfile looks quite digestible.

zdykstra commented 2 years ago

@ivangfr In the example cache-ispn-jdbc-ping.xml, the insert uses ${jgroups.tcp.address:127.0.0.1} for the node's IP address. In my testing though, this seems to be undefined - resulting in 127.0.0.1 being inserted as a value into the table. Searching around for that jgroups property hasn't revealed anything.

Should I be defining one of the system properties that then becomes this value? Or can I set a value in the environment and pull it from there? Under the now legacy branch, I was setting JGROUPS_DISCOVERY_EXTERNAL_IP in my container environment, and that took care of the correct IP being populated into the table for each node.

ivangfr commented 2 years ago

Hi @zdykstra yes, you are completely right. We need to find out how to set it. For legacy Keycloak, we were using this in JDBC_PING.cli:

...
try
    :resolve-expression(expression=${env.JGROUPS_DISCOVERY_EXTERNAL_IP})
    /subsystem=jgroups/stack=tcp/transport=TCP/property=external_addr/:add(value=${env.JGROUPS_DISCOVERY_EXTERNAL_IP})
catch
    echo "JGROUPS_DISCOVERY_EXTERNAL_IP maybe not set."
end-try
...
zdykstra commented 2 years ago

I wasn't able to find a jgroups property that was actually defined. I ended up using a thin wrapper script around the Keycloak entrypoint that sets an environment variable with the instance IP address. That seems to work well enough.

When running in production mode, I found that I had to set a few things at container build time. Below is my current Dockerfile:

FROM quay.io/keycloak/keycloak:17.0.0

COPY quarkus/cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml

ENV KC_METRICS_ENABLED=true
ENV KC_DB=postgres
ENV KC_CACHE_CONFIG_FILE=cache-ispn-jdbc-ping.xml
ENV KC_HTTP_ENABLED=true
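# run the Keycloak build step so the options above are baked in at image build time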
RUN /opt/keycloak/bin/kc.sh build

ADD ./custom-env.sh /opt/keycloak/bin/

ENTRYPOINT [ "/opt/keycloak/bin/custom-env.sh", "start" ]

Additionally, Postgres does work for me out of the box when I'm connecting to a database created/used by Keycloak 15.1.1.

ivangfr commented 2 years ago

@zdykstra could you please share your cache-ispn-jdbc-ping.xml?

Have you replaced ${jgroups.tcp.address:127.0.0.1} with something like ${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1} and then provided the value for JGROUPS_DISCOVERY_EXTERNAL_IP in your custom-env.sh script?

About connecting to a Postgres DB already initialized by a legacy Keycloak: it might work because the "migration_model" is already present. When using a fresh Postgres DB with Keycloak 17, the latter complains, as mentioned in this issue: https://github.com/keycloak/keycloak/issues/10235

zdykstra commented 2 years ago

Yup, that's exactly what I did. Below is my custom-env.sh script copied in (mostly) verbatim from my legacy deployments on GCP. I don't believe BIND is used any more, but I haven't removed it.

#!/bin/bash
echo "Setting run-time environment variables"
HOSTIP="$( curl -H "Metadata-Flavor: Google" http://metadata/computeMetadata/v1/instance/network-interfaces/0/ip )"

JGROUPS_DISCOVERY_EXTERNAL_IP="${HOSTIP}"
export JGROUPS_DISCOVERY_EXTERNAL_IP
echo "Setting external discovery IP to ${JGROUPS_DISCOVERY_EXTERNAL_IP}"

BIND="${HOSTIP}"
export BIND
echo "Setting BIND IP to ${BIND}"

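# hand off to kc.sh, passing through the arguments from the Dockerfile ENTRYPOINT (e.g. "start")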
exec /opt/keycloak/bin/kc.sh "$@"

My XML changes diff is as follows:

bash-5.1$ diff -u <( curl --silent https://raw.githubusercontent.com/ivangfr/keycloak-clustered/keycloak-quarkus/17.0.0/cache-ispn-jdbc-ping.xml ) quarkus/cache-ispn-jdbc-ping.xml
--- /dev/fd/63  2022-03-08 09:58:39.636234662 -0600
+++ quarkus/cache-ispn-jdbc-ping.xml    2022-03-07 17:12:05.944840015 -0600
@@ -26,7 +26,7 @@
                        connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                        connection_url="jdbc:mysql://${env.KC_DB_URL_HOST}/${env.KC_DB_URL_DATABASE}"
                        initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, bind_addr varchar(200) NOT NULL, updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, ping_data varbinary(5000) DEFAULT NULL, PRIMARY KEY (own_addr, cluster_name)) ENGINE=InnoDB DEFAULT CHARSET=utf8;"
-                       insert_single_sql="INSERT INTO JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${jgroups.tcp.address:127.0.0.1}', NOW(), ?);"
+                       insert_single_sql="INSERT INTO JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1}', NOW(), ?);"
                        delete_single_sql="DELETE FROM JGROUPSPING WHERE own_addr=? AND cluster_name=?;"
                        select_all_pingdata_sql="SELECT ping_data, own_addr, cluster_name FROM JGROUPSPING WHERE cluster_name=?;"
                        info_writer_sleep_time="500"
@@ -39,7 +39,7 @@
                        connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                        connection_url="jdbc:mysql://${env.KC_DB_URL_HOST}/${env.KC_DB_URL_DATABASE}"
                        initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, bind_addr varchar(200) NOT NULL, updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, ping_data varbinary(5000) DEFAULT NULL, PRIMARY KEY (own_addr, cluster_name)) ENGINE=InnoDB DEFAULT CHARSET=utf8;"
-                       insert_single_sql="INSERT INTO JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${jgroups.tcp.address:127.0.0.1}', NOW(), ?);"
+                       insert_single_sql="INSERT INTO JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1}', NOW(), ?);"
                        delete_single_sql="DELETE FROM JGROUPSPING WHERE own_addr=? AND cluster_name=?;"
                        select_all_pingdata_sql="SELECT ping_data, own_addr, cluster_name FROM JGROUPSPING WHERE cluster_name=?;"
                        info_writer_sleep_time="500"
@@ -52,7 +52,7 @@
                        connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                        connection_url="jdbc:postgresql://${env.KC_DB_URL_HOST}/${env.KC_DB_URL_DATABASE}"
                        initialize_sql="CREATE SCHEMA IF NOT EXISTS ${env.KC_DB_SCHEMA:public}; CREATE TABLE IF NOT EXISTS ${env.KC_DB_SCHEMA:public}.JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, bind_addr varchar(200) NOT NULL, updated timestamp default current_timestamp, ping_data BYTEA, constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));"
-                       insert_single_sql="INSERT INTO ${env.KC_DB_SCHEMA:public}.JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${jgroups.tcp.address:127.0.0.1}', NOW(), ?);"
+                       insert_single_sql="INSERT INTO ${env.KC_DB_SCHEMA:public}.JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1}', NOW(), ?);"
                        delete_single_sql="DELETE FROM ${env.KC_DB_SCHEMA:public}.JGROUPSPING WHERE own_addr=? AND cluster_name=?;"
                        select_all_pingdata_sql="SELECT ping_data, own_addr, cluster_name FROM ${env.KC_DB_SCHEMA:public}.JGROUPSPING WHERE cluster_name=?"
                        info_writer_sleep_time="500"
@@ -65,7 +65,7 @@
                        connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                        connection_url="jdbc:sqlserver://${env.KC_DB_URL_HOST}/${env.KC_DB_URL_DATABASE}"
                        initialize_sql="IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = '${env.KC_DB_SCHEMA:dbo}') BEGIN EXEC ('CREATE SCHEMA [${env.KC_DB_SCHEMA:dbo}] AUTHORIZATION [dbo]') END; IF NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE' AND TABLE_NAME='JGROUPSPING') CREATE TABLE ${env.DB_SCHEMA:dbo}.JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, bind_addr varchar(200) NOT NULL, updated datetime2 default getdate(), ping_data varbinary(5000), constraint PK_JGROUPSPING PRIMARY KEY(own_addr, cluster_name));"
-                       insert_single_sql="INSERT INTO ${env.KC_DB_SCHEMA:dbo}.JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${jgroups.tcp.address:127.0.0.1}', GETDATE(), ?);"
+                       insert_single_sql="INSERT INTO ${env.KC_DB_SCHEMA:dbo}.JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1}', GETDATE(), ?);"
                        delete_single_sql="DELETE FROM ${env.KC_DB_SCHEMA:dbo}.JGROUPSPING WHERE own_addr=? AND cluster_name=?;"
                        select_all_pingdata_sql="SELECT ping_data, own_addr, cluster_name FROM ${env.KC_DB_SCHEMA:dbo}.JGROUPSPING WHERE cluster_name=?;"
                        info_writer_sleep_time="500"
@@ -136,4 +136,4 @@
             <memory max-count="-1"/>
         </distributed-cache>
     </cache-container>
-</infinispan>
\ No newline at end of file
+</infinispan>

The environment variable could probably use a better name, assuming there's no jgroups runtime value we can access. I'm not a huge fan of wrapper scripts, but for the purposes of quickly testing the quarkus branch of Keycloak, it works well enough.

ivangfr commented 2 years ago

Perfect!

It's funny: here I am doing the same, but the cluster is not forming. I am using Vagrant and VMs to test.

[Screenshot 2022-03-08 at 17 40 21]

zdykstra commented 2 years ago

That's really odd. I assume the same test process works correctly with 17.0.0-legacy? The node names appear to be different, so one instance shouldn't be overwriting the other in the table.

zdykstra commented 2 years ago

I just switched my stack to use MySQL as the database, and it looks like there is a bug here: My JGROUPSPING table has 8 entries all with the same bind_addr value.

MySQL [prod-keycloak]> SELECT own_addr,bind_addr,updated from JGROUPSPING;
+--------------------------------------+-------------+---------------------+
| own_addr                             | bind_addr   | updated             |
+--------------------------------------+-------------+---------------------+
| 11b64a07-8114-4171-bec0-fcecb13035ee | 10.x.x.2 | 2022-03-08 19:04:58 |
| 11e7eadd-46a6-434f-94f6-ebda8adcd0bf | 10.x.x.2 | 2022-03-08 19:04:59 |
| 2f4d9d19-adbc-49ba-b96a-fb5435897077 | 10.x.x.2 | 2022-03-08 19:04:58 |
| 36b327e6-3adf-4304-9e86-852a4d6366b1 | 10.x.x.2 | 2022-03-08 19:04:58 |
| 5a8b79db-8314-4386-ae6f-c90d74e98bbc | 10.x.x.2 | 2022-03-08 19:04:58 |
| 792517d2-c258-4ced-b926-1a5c8eae1181 | 10.x.x.2 | 2022-03-08 19:04:58 |
| eb622c38-18fb-4134-a28f-199e69eb0045 | 10.x.x.2 | 2022-03-08 19:04:59 |
| f2be8b75-dc71-4722-99ce-f9e871c527f4 | 10.x.x.2 | 2022-03-08 19:04:59 |
+--------------------------------------+-------------+---------------------+

One of my running instances does have a 10.x.x.2 IP address bound to it, so that wasn't pulled out of thin air.

Looking at the debug logs of a running instance, I see this:

:JDBC_PING(async_discovery_use_separate_thread_per_request=false;update_store_on_view_change=true;ergonomics=true;insert_single_sql=INSERT INTO JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '10.x.x.9', NOW(), ?);;connection_driver=com.mysql.jdbc.Driver

After the Infinispan cluster has finished building, I see this log entry (running on the 10.x.x.9 instance):

actualMembers=[keycloak-http-7nrw-62503, keycloak-http-h348-20537, keycloak-http-8ngn-46383, keycloak-http-2qfs-12741, keycloak-http-8m29-44501, keycloak-http-0404-6038, keycloak-http-4d9w-17253, keycloak-http-zgl7-18869]

The instance with the 10.x.x.2 IP address has a hostname of keycloak-http-7nrw. So it looks like possibly that instance has been designated as the cluster leader, and has rewritten all of the entries?

I didn't check my postgres instance before I destroyed it, but it's not unreasonable to assume that the table in there has the same bind_addr issue present.

zdykstra commented 2 years ago

After looking through the logs for keycloak-http-7nrw, it seems that the first instance that adds itself into the JGROUPSPING table is controlling/rewriting entries.

 162 2022-03-08 19:04:44,112 DEBUG [org.jgroups.protocols.pbcast.GMS] (jgroups-6,keycloak-http-7nrw-62503) keycloak-http-7nrw-62503: installing view [keycloak-http-7nrw-62503|1] (2) [keycloak-http-7nrw-62503, keycloak-http-h348-20537] (keycloak-http-h348-20537 joined)
 163 2022-03-08 19:04:44,125 DEBUG [org.infinispan.persistence.manager.PreloadManager] (keycloak-cache-init) Preloaded 0 keys in 0 milliseconds
 164 2022-03-08 19:04:44,125 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (non-blocking-thread--p2-t1) Node keycloak-http-7nrw-62503 joining cache sessions
 165 2022-03-08 19:04:44,127 DEBUG [org.infinispan.topology.ClusterCacheStatus] (non-blocking-thread--p2-t1) Queueing rebalance for cache sessions with members [keycloak-http-7nrw-62503]
 166 2022-03-08 19:04:44,128 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (non-blocking-thread--p2-t1) Updating local topology for cache sessions: CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=DefaultConsistentHash{ns=256, owners = (1)[keycloak-http-7nrw-62503: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[keycloak-http-7nrw-62503], persistentUUIDs=[33914e50-1e9a-4222-b460-dcb685dacac3]}
 167 2022-03-08 19:04:44,130 DEBUG [org.infinispan.statetransfer.StateConsumerImpl] (non-blocking-thread--p2-t1) Removing no longer owned entries for cache sessions
 168 2022-03-08 19:04:44,132 DEBUG [org.infinispan.commons.util.ServiceFinder] (keycloak-cache-init) Loading service impl: org.infinispan.persistence.remote.upgrade.HotRodTargetMigrator
 169 2022-03-08 19:04:44,133 DEBUG [org.infinispan.cache.impl.CacheImpl] (keycloak-cache-init) Started cache sessions on keycloak-http-7nrw-62503
 170 2022-03-08 19:04:44,133 DEBUG [org.infinispan.manager.DefaultCacheManager] (keycloak-cache-init) Creating cache clientSessions on keycloak-http-7nrw-62503
 171 2022-03-08 19:04:44,137 DEBUG [org.jgroups.protocols.JDBC_PING] (jgroups-6,keycloak-http-7nrw-62503) keycloak-http-7nrw-62503: cleared table for cluster ISPN
 172 2022-03-08 19:04:44,135 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger-10,keycloak-http-7nrw-62503) keycloak-http-7nrw-62503: pingable_mbrs=[keycloak-http-7nrw-62503, keycloak-http-h348-20537], ping_dest=keycloak-http-h348-20537
 173 2022-03-08 19:04:44,143 DEBUG [org.jgroups.protocols.JDBC_PING] (jgroups-6,keycloak-http-7nrw-62503) Removed eb622c38-18fb-4134-a28f-199e69eb0045 for cluster ISPN from database
 174 2022-03-08 19:04:44,146 DEBUG [org.infinispan.commons.util.ServiceFinder] (keycloak-cache-init) No service impls found: FilterIndexingServiceProvider
 175 2022-03-08 19:04:44,147 DEBUG [org.infinispan.interceptors.impl.AsyncInterceptorChainImpl] (keycloak-cache-init) Interceptor chain size: 10
 176 2022-03-08 19:04:44,149 DEBUG [org.jgroups.protocols.JDBC_PING] (jgroups-6,keycloak-http-7nrw-62503) Inserted eb622c38-18fb-4134-a28f-199e69eb0045 for cluster ISPN into database

I wonder if this is due to remove_all_data_on_view_change="true" being set, along with the insert_single_sql value only understanding how to insert data as if it were inserting its own local data?

ivangfr commented 2 years ago

Btw, I removed the property remove_all_data_on_view_change="true" and now I have 2 records in JGROUPSPING.

mysql> SELECT * FROM JGROUPSPING;
+--------------------------------------+--------------+-----------+---------------------+---------------------------------------------------+
| own_addr                             | cluster_name | bind_addr | updated             | ping_data                                         |
+--------------------------------------+--------------+-----------+---------------------+---------------------------------------------------+
| 021bab9d-4946-49e8-bad5-0f1575637349 | ISPN         | 10.0.0.12 | 2022-03-08 20:47:11 | ��ucsI�IFI� aa9d5f2220e2-17038� x�� |
| b7082d44-3b6e-4892-a956-03f7fd7a7daa | ISPN         | 10.0.0.11 | 2022-03-08 20:46:22 | �V��z}�-D;nH� 5c23a8c18925-44604� x�� |
+--------------------------------------+--------------+-----------+---------------------+---------------------------------------------------+
2 rows in set (0.00 sec)

However, the login sessions, for instance, are not shared between the instances.

Note: after restarting keycloak2, the number of records in JGROUPSPING stayed at 2, which is good.

zdykstra commented 2 years ago

I changed the table creation to this (dropping bind_addr):

CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, ping_data varbinary(5000) DEFAULT NULL, PRIMARY KEY (own_addr, cluster_name)) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and then in the Infinispan clustering config, I dropped insert_single_sql, delete_single_sql and select_all_pingdata_sql from the XML document. All of the nodes seem to be able to find each other, so they must be encoding IP/port data into own_addr or possibly ping_data?
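
Roughly, the JDBC_PING element now boils down to just the connection settings plus that modified initialize_sql (a sketch, not my exact file):

<stack name="mysql-jdbc-ping-tcp" extends="tcp">
    <JDBC_PING connection_driver="com.mysql.jdbc.Driver"
               connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
               connection_url="jdbc:mysql://${env.KC_DB_URL_HOST}/${env.KC_DB_URL_DATABASE}"
               initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, ping_data varbinary(5000) DEFAULT NULL, PRIMARY KEY (own_addr, cluster_name)) ENGINE=InnoDB DEFAULT CHARSET=utf8;"
               info_writer_sleep_time="500"/>
</stack>

With insert_single_sql, delete_single_sql and select_all_pingdata_sql omitted, JDBC_PING uses its built-in statements, which (as far as I can tell) only touch own_addr, cluster_name and ping_data.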

Based on http://www.jgroups.org/manual4/index.html#_jdbc_ping, whether clear_table_on_view_change is on or off really depends on your installation.

zdykstra commented 2 years ago

I can't seem to get it to just insert an un-encoded IP address into the own_addr column, though. I've tried various permutations of setting use_ip_addrs:

-Djgroups.use_ip_addrs=true -Djgroups.tcp.use_ip_addrs=true -Djgroups.tp.use_ip_addrs=true

I'm fairly certain, however, that my cluster is replicating data correctly. I see this in my logs:

2022-03-08 23:05:40,779 INFO [org.infinispan.CLUSTER] (jgroups-55,keycloak-http-dsj7-44367) [Context=loginFailures] ISPN100010: Finished rebalance with members [keycloak-http-dsj7-44367, keycloak-http-cscx-59602, keycloak-http-cw49-11520, keycloak-http-pjz3-17816, keycloak-http-rjxm-53343, keycloak-http-d323-44259, keycloak-http-cvzn-58992, keycloak-http-8d6p-22589, keycloak-http-076w-10245, keycloak-http-t3cf-14455], topology id 261

Edit:

diff --git a/quarkus/cache-ispn-jdbc-ping.xml b/quarkus/cache-ispn-jdbc-ping.xml
index 802b976..9dbcd8a 100644
--- a/quarkus/cache-ispn-jdbc-ping.xml
+++ b/quarkus/cache-ispn-jdbc-ping.xml
@@ -22,6 +22,7 @@
         xmlns="urn:infinispan:config:11.0">
     <jgroups>
         <stack name="mysql-jdbc-ping-tcp" extends="tcp">
+            <TCP use_ip_addrs="true"/>
             <JDBC_PING connection_driver="com.mysql.jdbc.Driver"
                        connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                        connection_url="jdbc:mysql://${env.KC_DB_URL_HOST}/${env.KC_DB_URL_DATABASE}"

But, they're still not raw addresses. Oh well.

ivangfr commented 2 years ago

Nice, congrats @zdykstra!

I am still trying to figure out the problem I have. It looks like the TCP communication between the Keycloak instances is not working in my private cluster of VMs (VirtualBox).

For instance

Keycloak1, id c8623458f8e5-36697

2022-03-09 07:54:16,030 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000078: Starting JGroups channel `ISPN` with stack `mysql-jdbc-ping-tcp`
2022-03-09 07:54:16,612 INFO  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) c8623458f8e5-36697: no members discovered after 32 ms: creating cluster as coordinator
2022-03-09 07:54:16,667 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000094: Received new cluster view for channel ISPN: [c8623458f8e5-36697|0] (1) [c8623458f8e5-36697]
2022-03-09 07:54:16,684 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000079: Channel `ISPN` local address is `c8623458f8e5-36697`, physical addresses are `[172.17.0.2:7800]`

Keycloak2, id de07bee46cb2-1784

2022-03-09 07:54:41,954 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000078: Starting JGroups channel `ISPN` with stack `mysql-jdbc-ping-tcp`
2022-03-09 07:54:44,432 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 0
2022-03-09 07:54:46,448 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 1
2022-03-09 07:54:48,455 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 2
2022-03-09 07:54:50,463 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 3
2022-03-09 07:54:52,472 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 4
2022-03-09 07:54:54,482 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 5
2022-03-09 07:54:56,490 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 6
2022-03-09 07:54:58,497 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 7
2022-03-09 07:55:00,508 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 8
2022-03-09 07:55:02,517 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 9
2022-03-09 07:55:02,518 WARN  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) de07bee46cb2-1784: too many JOIN attempts (10): becoming singleton
2022-03-09 07:55:02,550 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000094: Received new cluster view for channel ISPN: [de07bee46cb2-1784|0] (1) [de07bee46cb2-1784]
2022-03-09 07:55:02,563 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000079: Channel `ISPN` local address is `de07bee46cb2-1784`, physical addresses are `[172.17.0.2:7800]`

As we can see, Keycloak1 (c8623458f8e5-36697) started the cluster. Then, Keycloak2 (de07bee46cb2-1784) tried 10 times to join Keycloak1:

JOIN(de07bee46cb2-1784) sent to c8623458f8e5-36697 timed out (after 2000 ms), on try 0

After those failed attempts, Keycloak2, due to the property remove_all_data_on_view_change="true", cleaned JGROUPSPING and started its own new cluster:

+--------------------------------------+--------------+-----------+---------------------+--------------------------------------------------+
| own_addr                             | cluster_name | bind_addr | updated             | ping_data                                        |
+--------------------------------------+--------------+-----------+---------------------+--------------------------------------------------+
| a8419bac-88b8-4937-8315-283e754a2161 | ISPN         | 10.0.0.12 | 2022-03-09 07:55:04 | �(>uJ!a�A����I7 de07bee46cb2-1784� x�� |
+--------------------------------------+--------------+-----------+---------------------+--------------------------------------------------+

I've already tried to add <TCP use_ip_addrs="true" /> and, as you did, drop the bind_addr, insert_single_sql, delete_single_sql and select_all_pingdata_sql, but I have the same problem.

zdykstra commented 2 years ago

That's really odd that your two Keycloak VMs have Keycloak <> MySQL communication working, but not Keycloak <> Keycloak. For what it's worth, when I tested this on a local Linux Docker daemon, both Keycloak instances were able to talk with each other on the dedicated network I set up for it. I assume that when you test any of the -legacy releases (as upstream now calls them), everything works there?

ivangfr commented 2 years ago

Yes, the keycloak-clustered 16.1.1 works perfectly in this Vagrant & VMs environment I have.

Locally, on the host machine, running 2 keycloak-clustered 17.0.0 containers and one MySQL container (all on the same network), the cluster works.

However, when one Keycloak instance is in one VM, another Keycloak instance is in another VM, and MySQL is in a third VM, it looks like Infinispan and/or JGroups are not able to get the Keycloak instances to join each other. The communication between Keycloak and MySQL is fine.

zdykstra commented 2 years ago

Do you have any VM firewall rules configured? It's possible that they've changed JGroups/Infinispan/whatever ports between versions, and that is impacting it. I've deployed quite a few times today on GCP's Container OS, and my clusters are joining correctly.

<TCP use_ip_addrs="true" /> did actually set a different own_addr value on GCP. It set a modified version of the hostname, making it quite easy to tell which running instances are in the cluster.

So far, the new Quarkus-based images are working very very well for me compared to the legacy code. They start up very quickly compared to WildFly. Of course, I haven't actually sent any production traffic to them yet ...

ivangfr commented 2 years ago

No, no firewalls. I've committed to keycloak-quarkus branch the Vagrantfile and how-to instructions.

Whenever I have some free time, I will try again to add <TCP use_ip_addrs="true" /> and drop the bind_addr, insert_single_sql, delete_single_sql and select_all_pingdata_sql for MySQL and MariaDB.

For Postgres and MSSQL, I intend to keep initialize_sql, insert_single_sql, delete_single_sql and select_all_pingdata_sql because, for these databases, we can use schemas and, consequently, we need to configure those commands manually.

Let's see... Thanks for the support and it was a pleasure to collaborate with you on this! Best!

zdykstra commented 2 years ago

I'm not sure if you've seen this yet - https://www.keycloak.org/2022/02/dbs.html .

ivangfr commented 2 years ago

Interesting @zdykstra Thanks for sharing!

ivangfr commented 2 years ago

Hi @zdykstra @whitepiratebaku

Btw, I've fixed the problem I was having when running keycloak-clustered with JDBC_PING in virtual machines. To start the VMs (VirtualBox), I use Vagrant.

The solution I found was to add:

<TCP external_addr="${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1}" />

When running the Docker container, I pass JGROUPS_DISCOVERY_EXTERNAL_IP as an environment variable.

For now, I've kept the bind_addr as a column in the JGROUPSPING table.
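
In context, the stack now starts like this (just a sketch):

<stack name="mysql-jdbc-ping-tcp" extends="tcp">
    <!-- advertise the VM's routable IP instead of the container-internal Docker address -->
    <TCP external_addr="${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1}"/>
    <!-- JDBC_PING element as before -->
</stack>

On the VMs from my test above, that means passing e.g. -e JGROUPS_DISCOVERY_EXTERNAL_IP=10.0.0.11 and -e JGROUPS_DISCOVERY_EXTERNAL_IP=10.0.0.12 to docker run.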

You can see the final jgroups configuration here