keycloak / keycloak

Open Source Identity and Access Management For Modern Applications and Services
https://www.keycloak.org
Apache License 2.0
23.42k stars 6.77k forks source link

Unable to start JGroups Channel - Invalid argument: no further information #23330

Closed mscanlon-iowastudentloan closed 9 months ago

mscanlon-iowastudentloan commented 1 year ago

Before reporting an issue

Area

infinispan

Describe the bug

I'm trying to upgrade our Keycloak instance from 21.1.2 to 22.0.3 but it fails to start with an infinispan JGroups error. I am using Windows Server 2022 and Corretto JDK 17.0.8. I've been able to reproduce the error with a base Keycloak distribution Zip and 2 properties.

Version

22.0.3

Expected behavior

Keycloak should stat up and create a new JGroups channel using the built-in TCP stack.

Actual behavior

INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000078: Starting JGroups channel 'ISPN' with stack 'tcp'
INFO  [org.jgroups.JChannel] (keycloak-cache-init) local_addr: 67b8c56a-bbd5-44be-840d-0c256ff01dd6, name: COMPUTERNAME-53544
ERROR [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000660: DefaultCacheManager start failed, stopping any running components: org.infinispan.commons.CacheException: Unable to start JGroups Channel
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:622)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:489)
        at org.infinispan.remoting.transport.jgroups.CorePackageImpl$1.start(CorePackageImpl.java:42)
        at org.infinispan.remoting.transport.jgroups.CorePackageImpl$1.start(CorePackageImpl.java:27)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:616)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:607)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
        at org.infinispan.metrics.impl.MetricsCollector.start(MetricsCollector.java:84)
        at org.infinispan.metrics.impl.CorePackageImpl$1.start(CorePackageImpl.java:41)
        at org.infinispan.metrics.impl.CorePackageImpl$1.start(CorePackageImpl.java:34)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:616)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:607)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.startDependencies(BasicComponentRegistryImpl.java:634)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:598)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
        at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:379)
        at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:252)
        at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:779)
        at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:747)
        at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:411)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.startCacheManager(CacheManagerFactory.java:96)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.SocketException: Invalid argument: no further information
        at java.base/sun.nio.ch.Net.setInterface6(Native Method)
        at java.base/sun.nio.ch.DatagramChannelImpl.setOption(DatagramChannelImpl.java:364)
        at java.base/sun.nio.ch.DatagramSocketAdaptor.setOption(DatagramSocketAdaptor.java:420)
        at java.base/sun.nio.ch.DatagramSocketAdaptor.setNetworkInterface(DatagramSocketAdaptor.java:605)
        at java.base/java.net.MulticastSocket.setNetworkInterface(MulticastSocket.java:445)
        at org.jgroups.protocols.MPING.start(MPING.java:154)
        at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:905)
        at org.jgroups.JChannel.startStack(JChannel.java:921)
        at org.jgroups.JChannel._preConnect(JChannel.java:799)
        at org.jgroups.JChannel.connect(JChannel.java:322)
        at org.jgroups.JChannel.connect(JChannel.java:316)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:620)
        ... 28 more

ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start server in (development) mode
ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start caches
ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: org.infinispan.manager.EmbeddedCacheManagerStartupException: Unable to start JGroups Channel
ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Unable to start JGroups Channel
ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Unable to start JGroups Channel
ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Invalid argument: no further information

How to Reproduce?

On Windows running Corretto 17.0.8.

kc.bat start-dev --cache=ispn --cache-stack=tcp

Anything else?

No response

ahus1 commented 1 year ago

@mscanlon-iowastudentloan - thank you for reporting this. I started it on a Linux and Windows machine, and didn't see the error message. Looking at the stack trace, it might be due to a specific network configuration on your Windows Server.

I ask you to enable debug logging for org.jgroups and org.infinispan and share the logs - see https://www.keycloak.org/server/logging

bin\kc.bat start-dev --cache=ispn --cache-stack=tcp --log-level="INFO,org.jgroups:debug,org.infinispan:debug"

It would also helpful to see the output of the command ipconfig.

Something to try out as a remedy: Recent versions of Keycloak enabled IPv6 and IPv4. See https://www.keycloak.org/server/configuration-production#_support_for_keycloak_in_a_cluster on how to configure it.

I suggest to try -Djava.net.preferIPv4Stack=true as it might restore the original behavior.

Depending on the details those reports show, the problem might be fixed by setting jgroups.bind.address to choose a specific interface - see https://github.com/infinispan/infinispan/blob/14.0.17.Final/core/src/main/resources/default-configs/default-jgroups-tcp.xml for the configuration file.

mscanlon-iowastudentloan commented 1 year ago

-Djava.net.preferIPv4Stack=true fixed it for me. Thank you!

ahus1 commented 1 year ago

@mscanlon-iowastudentloan happy to hear that it works now.

Could you please provide the log file anyway? Also the list of interfaces would help us. Looking at the Java code, this occurs when it tries to setup multicast in DatagramSockerAdpater (setOption(StandardSocketOptions.IP_MULTICAST_IF, netIf);), so the interface it picks might not have multicast enabled?

It would help us to improve Keycloak so the error message doesn't appear in the first place.

ahus1 commented 1 year ago

@pruivo - I've found some descriptions for that on SO and also in a JVM bug. Do you think a change in JGroups could prevent that?

The JVM bug description mentions:

For DatagramChannel, the issue can be worked around by specifying the INET protocol family when creating the DatagramChannel.

So if there is chance to determine that IPv6 is disabled for the interface, could that be an option?

mscanlon-iowastudentloan commented 1 year ago

Here is the log from running (version 22.0.3): bin\kc.bat start-dev --cache=ispn --cache-stack=tcp --log-level="INFO,org.jgroups:debug,org.infinispan:debug" kc.log And ipconfig output: ipconfig.log

pruivo commented 1 year ago

@pruivo - I've found some descriptions for that on SO and also in a JVM bug. Do you think a change in JGroups could prevent that?

The JVM bug description mentions:

For DatagramChannel, the issue can be worked around by specifying the INET protocol family when creating the DatagramChannel.

So if there is chance to determine that IPv6 is disabled for the interface, could that be an option?

I'm going to ping @belaban on this one.

belaban commented 1 year ago

If both IPv4 and IPv6 are enabled, JGroups will by default pick IPv4 addresses, unless bind_addr is an IPv6 address [1]. So, in your case, 10.106.11.105 should be picked as bind address and MPING should also bind to the IPv4 mcast address.

However, this is not the case: java.base/sun.nio.ch.Net.setInterface6 shows that the JVM picked the address family to be IPv6. This can only happen when

What's your actual config? Is there a way to repoduce this outside of Windows (don't have access to Windows), or do you have a docker image?

[1] http://www.jgroups.org/manual5/index.html#IPAddresses

mscanlon-iowastudentloan commented 1 year ago

I've been able to reproduce this with only 2 config items set. I just unzip the Windows Quarkus distribution zip file and run it with the command above, only setting these values: start-dev --cache=ispn --cache-stack=tcp

I did try setting -Djgroups.bind_addr=10.106.11.105 with the same failed result.

After messing around with it some more, I think I may have something useful for you. My NIC is IPv6 compatible, but I have IPv6 disabled on the NIC (in Windows, open the adapter properties and uncheck TCP/IPv6). If I enable IPv6, Keycloak starts up just fine with the tcp cache stack. So it seems that it's always trying to use IPv6 even if IPv6 is disabled on the NIC and there's no IPv6 address. If I specify -Djava.net.preferIPv4Stack=true, then it works with IPv6 still disabled.

belaban commented 1 year ago

If you run the program below with 10.106.11.105 as argument, what's the output? If it shows no IPv6 addresses, then I can change the algorithm in JGroups fro selecting addresses slightly.

I would simply switch to IPv4 if bind_addr is IPv6 but the interface associated with bind_addr has no IPv6 address associated.

package org.jgroups.tests;

import org.jgroups.util.StackType;
import org.jgroups.util.Util;
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.List;
import java.util.stream.Collectors;

public class bla2 {

    protected void start(String bind_addr) throws Exception {
        StackType stack_type=Util.getIpStackType();
        System.out.println("stack_type = " + stack_type);

        InetAddress bind=InetAddress.getByName(bind_addr);
        NetworkInterface intf=NetworkInterface.getByInetAddress(bind);
        List<InetAddress> list=intf.inetAddresses().collect(Collectors.toList());
        System.out.printf("%d addresses on interface %s: %s", list.size(), intf.getName(), list);
        int ipv4=(int)intf.inetAddresses().filter(a -> a instanceof Inet4Address).count();
        int ipv6=(int)intf.inetAddresses().filter(a -> a instanceof Inet6Address).count();
        System.out.printf("\n(%d IPv4 addresses, %d IPv6 addresses)\n", ipv4, ipv6);
    }

    public static void main(String[] args) throws Exception {
        new bla2().start(args[0]);
    }
}
mscanlon-iowastudentloan commented 1 year ago

Here is my output:

java -cp org.jgroups.jgroups-5.2.18.Final.jar bla2.java 10.106.11.105 stack_type = Dual 1 addresses on interface eth0: [/10.106.11.105] (1 IPv4 addresses, 0 IPv6 addresses)

belaban commented 1 year ago

Hmm, the question is how would an IPv6 bind_addr be picked by JGroups if the interface has IPv6 disabled. Do you force an IPv6 address for bind_addr? @mscanlon-iowastudentloan: interested in a joint debugging session? I don't have a Windows box available, so I can't rep[roduce this issue on my side, plus I don't see how this can happen...

mscanlon-iowastudentloan commented 1 year ago

Sorry, I've been out of town for a week. I can setup a Zoom session or something to screen share if you want. Preferably between 7am and 4pm on a weekday.