microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

In mirrored network mode the Kafka client is unable to connect to the server properly. #11450

Open JustLookAtNow opened 2 months ago

JustLookAtNow commented 2 months ago

Windows Version

10.0.22635.3430

WSL Version

2.2.1.0

Are you using WSL 1 or WSL 2?

Kernel Version

5.15.150.1

Distro Version

Ubuntu 22.04

Other Software

org.springframework.kafka:spring-kafka:2.9.13 jdk 1.8

Repro Steps

Run a Java program I wrote that connects to the Kafka server.

Expected Behavior

Successfully connected.

Actual Behavior

It threw an error and raised an exception.

 java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.connect0(Native Method) ~[?:1.8.0_402]
    at sun.nio.ch.Net.connect(Net.java:482) ~[?:1.8.0_402]
    at sun.nio.ch.Net.connect(Net.java:474) ~[?:1.8.0_402]
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647) ~[?:1.8.0_402]
    at org.apache.kafka.common.network.Selector.doConnect(Selector.java:277) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.common.network.Selector.connect(Selector.java:255) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:990) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:301) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.tryConnect(ConsumerNetworkClient.java:575) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:854) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:830) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:246) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.coordinatorUnknownAndUnready(ConsumerCoordinator.java:460) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:488) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1262) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231) ~[kafka-clients-3.1.2.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211) ~[kafka-clients-3.1.2.jar:?]
    at sun.reflect.GeneratedMethodAccessor385.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_402]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_402]
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) ~[spring-aop-5.3.27.jar:5.3.27]
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) ~[spring-aop-5.3.27.jar:5.3.27]
    at com.sun.proxy.$Proxy511.poll(Unknown Source) ~[?:?]
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollConsumer(KafkaMessageListenerContainer.java:1601) ~[spring-kafka-2.9.13.jar:2.9.13]
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doPoll(KafkaMessageListenerContainer.java:1576) ~[spring-kafka-2.9.13.jar:2.9.13]
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1377) ~[spring-kafka-2.9.13.jar:2.9.13]
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1291) ~[spring-kafka-2.9.13.jar:2.9.13]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_402]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_402]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]

From my comparison, the connect0 method calls an isIPv6Available method. That method always returns false in NAT or bridged network mode but returns true in mirrored mode. This could be the cause of the error, especially since my machine and the Kafka server are connected only over an IPv4 network.

Diagnostic Logs

WslLogs-2024-04-11_15-55-32.zip

github-actions[bot] commented 2 months ago

View similar issues

Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!

Open similar issues:

Note: You can give me feedback by thumbs upping or thumbs downing this comment.

Diagnostic information
.wslconfig found
Detected appx version: 2.2.1.0
Unexpected format in optional-component.txt: State       : DisabledWithPayloadRemoved

JustLookAtNow commented 2 months ago

update: At the beginning, the connection to the Kafka server was normal. However, once the connection count reached 299, the error 'Cannot assign requested address' started to occur.

JustLookAtNow commented 2 months ago

update 2: I found the cause of the problem. The local port range in /proc/sys/net/ipv4/ip_local_port_range is only 60500 to 60800, so only about 300 client connections are allowed! Naturally, Kafka throws an error when it reaches this limit. However, when I tried to modify net.ipv4.ip_local_port_range, I found that after any change no TCP connections could be created at all. So, how can I increase the client connection limit?

@OneBlue Is there any WSL configuration that can change this parameter?

dickens7 commented 2 months ago

I had the same problem. With the firewall set to false everything works normally, so I guess the firewall rules are causing it.

[experimental]
firewall=false

JustLookAtNow commented 2 months ago

I had the same problem. With the firewall set to false everything works normally, so I guess the firewall rules are causing it.

[experimental]
firewall=false

That does nothing for me.

chanpreetdhanjal commented 2 months ago

Hi. Can you please collect networking logs by following the instructions below? https://github.com/microsoft/WSL/blob/master/CONTRIBUTING.md#collect-wsl-logs-for-networking-issues

JustLookAtNow commented 2 months ago

Hi. Can you please collect networking logs by following the instructions below? https://github.com/microsoft/WSL/blob/master/CONTRIBUTING.md#collect-wsl-logs-for-networking-issues

Please check my previous response. I have identified the cause of the issue. In /proc/sys/net/ipv4/ip_local_port_range, only about 300 ports are available, ranging from 60500 to 60800. This limit results in a "Cannot assign requested address" error when attempting to create more than 300 connections to the same IP and port. In bridge mode, by contrast, the range is 32768 to 60999, which provides roughly 28,000 ports. If you still require network logs, please let me know, and I will collect them for you.
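
To make the arithmetic behind the two ranges concrete, here is a minimal Go sketch that reads the kernel's local port range and counts the ports in it. The helper names `parsePortRange` and `portCount` are hypothetical, introduced only for illustration, and the sketch assumes the range is inclusive at both ends. If the file cannot be read (for example, outside Linux), it falls back to the mirrored-mode values quoted in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePortRange parses the two whitespace-separated integers stored in
// /proc/sys/net/ipv4/ip_local_port_range, e.g. "60500\t60800\n".
// (Hypothetical helper, not part of WSL or any library.)
func parsePortRange(s string) (low, high int, err error) {
	fields := strings.Fields(s)
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	if low, err = strconv.Atoi(fields[0]); err != nil {
		return 0, 0, err
	}
	if high, err = strconv.Atoi(fields[1]); err != nil {
		return 0, 0, err
	}
	if high < low {
		return 0, 0, fmt.Errorf("invalid range %d-%d", low, high)
	}
	return low, high, nil
}

// portCount returns the number of ports in the inclusive range [low, high].
func portCount(low, high int) int {
	return high - low + 1
}

func main() {
	const path = "/proc/sys/net/ipv4/ip_local_port_range"
	raw := "60500\t60800\n" // fallback: mirrored-mode values from this thread
	if data, err := os.ReadFile(path); err == nil {
		raw = string(data)
	} else {
		fmt.Println("could not read", path, "- using sample values")
	}

	low, high, err := parsePortRange(raw)
	if err != nil {
		fmt.Println("could not parse port range:", err)
		return
	}
	fmt.Printf("local port range %d-%d: %d ports\n", low, high, portCount(low, high))
}
```

Each outbound connection to the same destination IP and port needs a distinct local port, so this count is approximately the ceiling on simultaneous connections to a single Kafka broker endpoint.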

chanpreetdhanjal commented 2 months ago

Yes we will need logs to further assist you. thanks

JustLookAtNow commented 1 month ago

Yes we will need logs to further assist you. thanks

/emailed-logs The log file is too big to attach, so I have emailed it to you.

keith-horton commented 1 month ago

Hi there. I see 2 bind requests that showed up about 1 minute before the end of the trace: one for ::0 port 62698 and one for ::0 port 6400. Both were successful. Was this run natively on the root, or within a container that something created within Linux?

JustLookAtNow commented 1 month ago

Hi there. I see 2 bind requests that showed up about 1 minute before the end of the trace: one for ::0 port 62698 and one for ::0 port 6400. Both were successful. Was this run natively on the root, or within a container that something created within Linux?

Both of these ports are being listened on by a Java program running in the Linux environment. My current issue isn't with listening ports. Rather, as a network client, I'm running out of available client ports. Currently, /proc/sys/net/ipv4/ip_local_port_range allows only about 300 ports. This results in a 'Cannot assign requested address' error when I try to establish more than 300 socket connections to the server.

keith-horton commented 1 month ago

Oh, thank you for clarifying. We can definitely make the number of ephemeral ports reserved for the Linux container configurable. I'll work on that right now.

erSitzt commented 1 month ago

@keith-horton Is there any workaround while this is not configurable? Just setting the port range as on any other Linux system does not help. Or is communication outside the initial range blocked? When I increase the range, I get i/o timeouts talking to my DNS server:

read udp 192.168.1.xxx:60895->192.168.1.1:53: i/o timeout

with 60895 being outside the default range of

❯ cat /proc/sys/net/ipv4/ip_local_port_range
60500   60800

I'm in mirrored network mode, by the way.

One more note: most of the tools I have problems with are Go applications, namely kubectl, kapp (Carvel kapp), and medusa (a tool to import and export HashiCorp Vault secrets). I'm not sure whether those tend to open an excessive number of connections and so use up the local port range.

erSitzt commented 4 weeks ago

Go applications seem to be prone to this problem because many tools do not set Transport.MaxConnsPerHost when using net/http, and it defaults to unlimited.

keith-horton commented 4 weeks ago

There's no immediate workaround, but we have a fix ready. Is there a target # for the ephemeral range needed in your scenarios?

Thanks!

erSitzt commented 4 weeks ago

@keith-horton It's hard to guess, as I can't really verify the connection count and I'm not sure exactly how the next usable free port is allocated 🤷

If there is a fix coming, it would be nice to:

For me this problem occurs in two scenarios:

The first is not really a problem, but the second leads to a point where WSL will always hit the limit, forcing me, or WSL users in general, to switch to a Linux VM for some commands.