rancher / rke2


DCOM not working in windows #6314

Open jvnvenu opened 1 month ago

jvnvenu commented 1 month ago

Environmental Info: RKE2 Version: v1.30.1

Node(s) CPU architecture, OS, and Version:
Cluster 1: 3 Linux nodes (master, RHEL 8); 1 Windows node (agent, Windows Server 2022); CNI: flannel
Cluster 2: 3 Linux nodes (master, RHEL 8); 2 Windows nodes (agent, Windows Server 2019); CNI: flannel

Describe the bug: Our application tries to connect to a DCOM server from Windows containers, and we get an "RPC server is unavailable" error. We are using the image mcr.microsoft.com/dotnet/aspnet:8.0.6-windowsservercore-ltsc2022.

We raised a ticket with Microsoft about this, but it looks like the problem is related to RKE2 networking: if I change the network mode to host, it works, whereas in cluster mode it does not.

Steps To Reproduce: wmic /node:<any Windows server IP> os get caption

Expected behavior: Caption Microsoft Windows Server 2022 Standard

Actual behavior: ERROR: Description = The RPC server is unavailable.

brandond commented 1 month ago

Is DCOM (wmic) to the host expected to work in Windows containers, when not using host network? I'm not seeing how this issue is unique to RKE2. Can Microsoft confirm that this is even supposed to be supported?

jvnvenu commented 1 month ago

With Docker Desktop it works.

brandond commented 1 month ago

Docker Desktop isn't Kubernetes?

jvnvenu commented 1 month ago

It supports Kubernetes, but I tested by running the container directly with docker run.

jvnvenu commented 1 month ago

One more piece of information: six months ago this worked even on RKE2 with Windows Server 2019, but not with 2022. At that time we used a 1.24* version with the Calico CNI.

jvnvenu commented 1 month ago

@brandond any update on this?

brandond commented 1 month ago

No, I don't know enough about how DCOM connects to hosts over the container network to even begin suggesting where the problem lies. Did MS have any ideas, other than pointing their finger at the CNI?

jvnvenu commented 1 month ago

@brandond Apart from DCOM, I did some analysis and found the network issue below.

I have an application that hosts a WCF service on port 9500 and runs in a Windows container. I tried to access the service from another Windows container using APITest, a test application for accessing the service:
Success - Pod IP - Target Port - dotnet .\APITest.dll net.tcp://10.42.16.102:9500/
Success - Node IP - Node Port - dotnet .\APITest.dll net.tcp://10.36.254.81:30815/
Fail - Service IP - Service Port - dotnet .\APITest.dll net.tcp://10.43.217.208:9500/

To make it simpler, I used the Windows PowerShell command Test-NetConnection:
Success - Pod IP - Target Port - Test-NetConnection 10.42.16.102 -port 9500
Success - Node IP - Node Port - Test-NetConnection 10.36.254.81 -port 30815
Fail - Service IP - Service Port - Test-NetConnection 10.43.217.208 -port 9500

For RPC, port 135 must be accessible, so I tried the same tests against port 135. The results are the same: access through the service IP fails.

If I host the service in a Linux container instead and access it from a Windows container, all 3 combinations succeed.

So there must be some network issue when a service is running in a Windows container.
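
The three-path check described above (pod IP, node IP with node port, service IP) can be sketched as a small script for anyone who wants to repeat it from inside a pod. This is only a minimal sketch, not something taken from RKE2 or from the reporter's APITest tool: the names `tcp_reachable`, `probe_paths`, and `PATHS` are hypothetical, the addresses are the ones quoted in this comment, and it assumes a Python interpreter is available in the test container. It only verifies that a TCP connection can be opened, which is roughly what Test-NetConnection reports; it does not speak WCF or RPC.

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False

def probe_paths(paths: dict[str, tuple[str, int]], timeout: float = 3.0) -> dict[str, bool]:
    """Probe each named (host, port) pair and return a name -> reachable mapping."""
    return {
        name: tcp_reachable(host, port, timeout)
        for name, (host, port) in paths.items()
    }

# Addresses copied from the report above; replace them with your own
# pod IP, node IP/node port, and service (cluster) IP.
PATHS = {
    "Pod IP - Target Port": ("10.42.16.102", 9500),
    "Node IP - Node Port": ("10.36.254.81", 30815),
    "Service IP - Service Port": ("10.43.217.208", 9500),
}

def report(paths: dict[str, tuple[str, int]]) -> None:
    """Print one Success/Fail line per path, in the format used in this thread."""
    for name, ok in probe_paths(paths).items():
        print(f"{'Success' if ok else 'Fail'} - {name}")
```

Calling `report(PATHS)` from a Windows pod and then from a Linux pod would show whether the service IP path fails only for Windows-hosted backends. Changing the target port to 135 gives the equivalent check for the RPC endpoint mapper.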

jvnvenu commented 1 month ago

@brandond any update on the network issue when using the service IP and port?

github-actions[bot] commented 1 day ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.