microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

Port forwarding repeated failure on WSL 1.1.0 #9508

Closed rudyzeinoun closed 1 year ago

rudyzeinoun commented 1 year ago

Version

Microsoft Windows [Version 10.0.22623.1095]

WSL Version

Kernel Version

5.15.83.1

Distro Version

Ubuntu 20.04

Other Software

Apache/2.4.41, mysqld Ver 10.3.37-MariaDB-0ubuntu0.20.04.1, PHP 8.1 with php8.1-fpm; systemd enabled in /etc/wsl.conf

Repro Steps

With WSL 1.1.0 (recently pushed to the Store, although marked as pre-release), port forwarding fails repeatedly. Start Apache in the Ubuntu distro, then from Command Prompt on Windows run: telnet localhost 80. It works. A few seconds later, repeat the telnet command and it fails. Port forwarding no longer works for connecting to services running in WSL Ubuntu.
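The repro above can be automated. The following is a minimal Python sketch using only the standard library; `port_open` and `watch` are hypothetical helper names, not part of WSL or any tool mentioned in this thread. It repeats the telnet-style TCP check at a fixed interval and records whether each connection attempt succeeded, which makes the "works once, fails a few seconds later" pattern easy to spot.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves `host` and tries each resolved address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution errors are all OSError subclasses
        return False


def watch(host: str = "localhost", port: int = 80,
          interval: float = 5.0, attempts: int = 6) -> list:
    """Repeat the telnet-style check `attempts` times, `interval` seconds apart."""
    results = []
    for i in range(attempts):
        ok = port_open(host, port)
        results.append(ok)
        print(f"attempt {i + 1}: {host}:{port} {'reachable' if ok else 'NOT reachable'}")
        if i + 1 < attempts:
            time.sleep(interval)
    return results
```

Run it on Windows while Apache is up in the distro, for example watch("localhost", 80). Going by the report above, an affected 1.1.0 install would be expected to show the first attempt as reachable and later ones as NOT reachable, while a healthy install should stay reachable throughout. A plain TCP connect only shows that the forwarded port accepts connections, not that Apache answers correctly.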

Expected Behavior

telnet command should keep working on the port.

Actual Behavior

telnet command will timeout.

On the first try, netstat -an | findstr /c:"80" | findstr /c:"LISTENING" shows port 80 as listening.

After a few seconds, repeat the netstat command: the port is no longer listed. This applies to any service running in WSL, not just Apache; port forwarding fails a few seconds after the service comes up.

Restart Apache with "service apache2 restart". The port appears in netstat again. Wait 10 seconds and check again: it disappears.
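The findstr filter above matches any line that merely contains the characters 80, so a port such as 8080 or an address such as 192.168.0.80 would also match. The sketch below (Python, standard library only; the function names are hypothetical) parses the netstat -an table instead and compares the port number exactly. It assumes the English-language netstat column layout and the LISTENING state label, which may be localized on other Windows language versions.

```python
import subprocess


def listening_ports(netstat_output: str) -> set:
    """Return (local address, port) pairs that netstat reports in the LISTENING state."""
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have four columns: Proto, Local Address, Foreign Address, State
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "LISTENING":
            # rpartition splits on the last colon, so IPv6 entries like [::]:80 stay intact
            addr, _, port = fields[1].rpartition(":")
            found.add((addr, int(port)))
    return found


def is_listening(netstat_output: str, port: int) -> bool:
    """Match the port number exactly, so 8080 or 192.168.0.80 are not mistaken for port 80."""
    return any(p == port for _, p in listening_ports(netstat_output))


def windows_netstat() -> str:
    """Run `netstat -an` (Windows only) and return its text output."""
    return subprocess.run(["netstat", "-an"], capture_output=True,
                          text=True, check=True).stdout
```

On Windows, is_listening(windows_netstat(), 80) can be called every few seconds after restarting Apache to watch for the port disappearing.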

Diagnostic Logs

No response

realmrv commented 1 year ago

> I see a new release is available on the Microsoft Store (1.1.2) and it looks like the bug is fixed.

I confirm. I don't see this problem after the update.

Dnouv commented 1 year ago

It is resolved in the new release 💯. Just curious whether it is OK to tell the community what the issue was, @pmartincic. 🤔

Thank you for the fix.

stafyniaksacha commented 1 year ago

1.1.2 seems to solve the issue!

nidrissi commented 1 year ago

There is an issue in LaTeX-Workshop (LW) that was believed to come from this bug (https://github.com/James-Yu/LaTeX-Workshop/issues/3670). However, even though I'm on 1.1.2, the bug remains.

WSL version:

```
$ wsl --version
WSL version: 1.1.2.0
Kernel version: 5.15.83.1
WSLg version: 1.0.49
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22623.1245
```

LW launches a web server that listens on 127.0.0.1:PORT. Here's the output from the linux side:

$ ss -tlnp
State                Recv-Q                Send-Q                               Local Address:Port                                Peer Address:Port               Process
LISTEN               0                     511                                      127.0.0.1:39177                                    0.0.0.0:*                   users:(("node",pid=1677,fd=25))

And the webserver can be accessed from the linux side too:

$ curl --head 'http://127.0.0.1:39177/viewer.html?file=pdf..ZmlsZSUzQSUyRiUyRiUyRmhvbWUlMkZuYWppYiUyRnRtcCUyRm1haW4ucGRm'
HTTP/1.1 200 OK
[omitted for brevity]

However, the connection is refused from the Windows host:

> curl --head 'http://127.0.0.1:39177/viewer.html?file=pdf..ZmlsZSUzQSUyRiUyRiUyRmhvbWUlMkZuYWppYiUyRnRtcCUyRm1haW4ucGRm'
curl: (7) Failed to connect to 127.0.0.1 port 39177 after 2043 ms: Connection refused

And netstat -ano | findstr 39177 comes up empty.

I don't know if it's the same bug, but it is definitely a bug: something is listening on the Linux side, but the port isn't forwarded to the Windows host. Could it come from the same issue? In case it matters, I'm running openSUSE:

/etc/os-release:

```
NAME="openSUSE Tumbleweed"
# VERSION="20230131"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20230131"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20230131"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"
```

ghost commented 1 year ago

@nidrissi, can you give me logs? I'll try to reproduce this but might need some help from you.

So, I'm happy to hear that this seems to solve it for most people. That said, we're still working on one more issue that I'm aware of. 1.1.2 has preliminary fixes, but you may still notice odd behavior if you address the service via localhost:port on the host instead of 127.0.0.1:port or --1.ipv6-literal.net:port. That is still being worked on.

1.1.0 contained an attempt to make bind calls synchronous across the guest and host. The old relay did not bind synchronously across the guest and host.
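One reason localhost:port can behave differently from 127.0.0.1:port is that localhost usually resolves to both the IPv6 loopback ::1 and the IPv4 loopback 127.0.0.1, and clients often try ::1 first. The sketch below (Python, standard library only; `probe_each` is a hypothetical helper, and nothing here reflects how the WSL relay itself is implemented) resolves a host name and tries every returned address separately, so you can see which loopback address actually answers.

```python
import socket


def probe_each(host: str, port: int, timeout: float = 2.0) -> dict:
    """Resolve `host` and try a TCP connection to every returned address separately.

    Returns a dict mapping each resolved address (for example '::1' or '127.0.0.1')
    to True if the connection succeeded, False otherwise.
    """
    outcome = {}
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        # Use a socket of the matching family so IPv6 and IPv4 are tested independently
        with socket.socket(family, socktype, proto) as s:
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                outcome[sockaddr[0]] = True
            except OSError:
                # Refused or timed out on this particular address
                outcome[sockaddr[0]] = False
    return outcome
```

Running probe_each("localhost", 80) on the Windows host and comparing it with probe_each("127.0.0.1", 80) should show whether only one address family is being forwarded. Which addresses appear depends on how localhost resolves on that machine.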

bplasmeijer commented 1 year ago

> Can someone please confirm whether there's any potential for data loss when downgrading back to 1.0.3 after a previous wsl --update?
>
> $Package = Get-AppxPackage MicrosoftCorporationII.WindowsSubsystemforLinux -AllUsers
> Remove-AppxPackage $Package -AllUsers
> Add-AppxPackage .\Microsoft.WSL_1.0.3.0_x64_ARM64.msixbundle
>
> The Remove-AppxPackage command looks particularly scary since I'm not sure whether the WSL mount data of individual distribution packages is somehow coupled with the WSL package (it doesn't appear so, but better safe than sorry!).

No data loss on my revert.

zed76r commented 1 year ago


Kubernetes still doesn't work on 1.1.2; it works on 1.0.3.


ghost commented 1 year ago

@ZedG2, can you give me logs please? Logs starting from the time where you launch Kubernetes, to the time where you attempt to use it?

elsaco commented 1 year ago

@ZedG2 what is the output of kubectl cluster-info? Here's sample output on my test vm:

elsaco@RIPPER:~$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:16443
CoreDNS is running at https://127.0.0.1:16443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

and one nginx pod:

elsaco@RIPPER:~$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-748c667d99-cpmf9   1/1     Running   0          9m10s
nginx pod full info:

```json
{ "apiVersion": "v1", "items": [ { "apiVersion": "v1", "kind": "Pod", "metadata": { "annotations": { "cni.projectcalico.org/containerID": "2e926c65ce1480c0a696fcef22823a4a232d9a41538e89b8d0af1811f59b90b1", "cni.projectcalico.org/podIP": "10.1.120.199/32", "cni.projectcalico.org/podIPs": "10.1.120.199/32" }, "creationTimestamp": "2023-02-03T03:06:57Z", "generateName": "nginx-748c667d99-", "labels": { "app": "nginx", "pod-template-hash": "748c667d99" }, "name": "nginx-748c667d99-cpmf9", "namespace": "default", "ownerReferences": [ { "apiVersion": "apps/v1", "blockOwnerDeletion": true, "controller": true, "kind": "ReplicaSet", "name": "nginx-748c667d99", "uid": "4fa5cd3d-bed2-42fb-b619-df437541070f" } ], "resourceVersion": "4539", "uid": "ee268cd8-ea68-4d01-b9a1-9c338edaf8ff" }, "spec": { "containers": [ { "image": "nginx", "imagePullPolicy": "Always", "name": "nginx", "resources": {}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [ { "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "kube-api-access-z96qv", "readOnly": true } ] } ], "dnsPolicy": "ClusterFirst", "enableServiceLinks": true, "nodeName": "ripper", "preemptionPolicy": "PreemptLowerPriority", "priority": 0, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "serviceAccount": "default", "serviceAccountName": "default", "terminationGracePeriodSeconds": 30, "tolerations": [ { "effect": "NoExecute", "key": "node.kubernetes.io/not-ready", "operator": "Exists", "tolerationSeconds": 300 }, { "effect": "NoExecute", "key": "node.kubernetes.io/unreachable", "operator": "Exists", "tolerationSeconds": 300 } ], "volumes": [ { "name": "kube-api-access-z96qv", "projected": { "defaultMode": 420, "sources": [ { "serviceAccountToken": { "expirationSeconds": 3607, "path": "token" } }, { "configMap": { "items": [ { "key": "ca.crt", "path": "ca.crt" } ], "name": "kube-root-ca.crt" } }, { "downwardAPI": { "items": [ { "fieldRef": { "apiVersion": "v1", "fieldPath": "metadata.namespace" }, "path": "namespace" } ] } } ] } } ] }, "status": { "conditions": [ { "lastProbeTime": null, "lastTransitionTime": "2023-02-03T03:06:58Z", "status": "True", "type": "Initialized" }, { "lastProbeTime": null, "lastTransitionTime": "2023-02-03T03:07:04Z", "status": "True", "type": "Ready" }, { "lastProbeTime": null, "lastTransitionTime": "2023-02-03T03:07:04Z", "status": "True", "type": "ContainersReady" }, { "lastProbeTime": null, "lastTransitionTime": "2023-02-03T03:06:57Z", "status": "True", "type": "PodScheduled" } ], "containerStatuses": [ { "containerID": "containerd://e69718306e91cc7f6ab6c3738b2246239c94e8ce8d4ae4cb072f254da2997c96", "image": "docker.io/library/nginx:latest", "imageID": "docker.io/library/nginx@sha256:b8f2383a95879e1ae064940d9a200f67a6c79e710ed82ac42263397367e7cc4e", "lastState": {}, "name": "nginx", "ready": true, "restartCount": 0, "started": true, "state": { "running": { "startedAt": "2023-02-03T03:07:03Z" } } } ], "hostIP": "192.168.95.205", "phase": "Running", "podIP": "10.1.120.199", "podIPs": [ { "ip": "10.1.120.199" } ], "qosClass": "BestEffort", "startTime": "2023-02-03T03:06:58Z" } } ], "kind": "List", "metadata": { "resourceVersion": "" } }
```

Also, the dashboard is accessible from Windows side at https://127.0.0.1:10443/

WSL version: 1.1.2.0
Kernel version: 5.15.83.1
WSLg version: 1.0.49
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2546

nidrissi commented 1 year ago

> @nidrissi, can you give me logs? I'll try to reproduce this but might need some help from you.

@pmartincic Sure, there you go. Let me know if you need anything else. WslLogs-2023-02-03_07-55-47.zip

nidrissi commented 1 year ago

@pmartincic Here are the logs from another machine, as well as the networking logs from WSL.

zed76r commented 1 year ago

> @ZedG2, can you give me logs please? Logs starting from the time where you launch Kubernetes, to the time where you attempt to use it?

~
❯ kubectl cluster-info
Kubernetes control plane is running at https://kubernetes.docker.internal:6443
CoreDNS is running at https://kubernetes.docker.internal:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

~
❯ dig kubernetes.docker.internal

; <<>> DiG 9.18.11-2-Debian <<>> kubernetes.docker.internal
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60724
;; flags: qr rd ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;kubernetes.docker.internal.    IN      A

;; ANSWER SECTION:
kubernetes.docker.internal. 0   IN      A       127.0.0.1
kubernetes.docker.internal. 0   IN      A       127.0.0.1
kubernetes.docker.internal. 0   IN      A       127.0.0.1

;; Query time: 10 msec
;; SERVER: 172.26.80.1#53(172.26.80.1) (UDP)
;; WHEN: Sat Feb 04 12:27:34 CST 2023
;; MSG SIZE  rcvd: 118

bplasmeijer commented 1 year ago

Not fixed on my machine on 1.1.2, @craigloewen-msft @benhillis. Same error: access denied on the local kind cluster.

Version                : 1.1.2.0

Log files: 1.1.2: WslLogs-2023-02-06_15-50-04.zip; 1.0.3: WslLogs-2023-02-06_15-54-56.zip

asampal commented 1 year ago

Still having problems with 1.1.2. It's easy to reproduce with minikube (https://github.com/kubernetes/minikube) running in WSL2. Try to start the minikube dashboard (which will bind to 127.0.0.1:). This opens the page in your browser, but the dashboard never loads; the page just sits there doing nothing.

elsaco commented 1 year ago

@asampal, if you run minikube inside a WSL instance, check .minikube/logs/lastStart.txt in your $HOME to get more insight into what's going on.

asampal commented 1 year ago

@elsaco, downgrading to 1.0.3 resolved the issue, so for now I'll stay on that version unless I see suggestions specific to minikube, either in this repo or in minikube's.

JshGrn commented 1 year ago

This release is massively broken because of this. I wasted a huge amount of time on it.

Edit: removed comment asking where auto update was, found it.

pedrolamas commented 1 year ago

@JshGrn, disable it in the Microsoft Store app (click your profile picture at the top right, open Settings, and disable app updates).

Just remember to come back occasionally to update the other apps manually...

JshGrn commented 1 year ago

@pedrolamas Yeah, I found it in the end, my bad. I'll come back in a month and hope it's working again. Is there any estimated timeline for a fix?

Jont828 commented 1 year ago

Thanks for linking this thread, I'll go ahead and see if downgrading works.

EDIT: Working on my end!

OneBlue commented 1 year ago

Thanks everyone for the feedback. The fix for this issue is in WSL 1.1.3

ghost commented 1 year ago

More specifically it is now back to the behavior that was part of 1.0.3

svetoslavenchev commented 1 year ago

> Thanks everyone for the feedback. The fix for this issue is in WSL 1.1.3

Just tried it - my Docker Compose projects are happy :)

nickchomey commented 1 year ago

Nice work!

How long does it typically take for a new version to arrive in the Windows Store? I just removed 1.0.3 that I installed manually and installed WSL2 from the Windows store, but I have 1.1.2.

OneBlue commented 1 year ago

> Nice work!
>
> How long does it typically take for a new version to arrive in the Windows Store? I just removed 1.0.3 that I installed manually and installed WSL2 from the Windows store, but I have 1.1.2.

It's in pre-release for now so only insiders will receive it through the store. It'll be available for everyone once we decide that it's stable enough to go for GA.

In the meantime, non-insider users can download and install the package manually like you did.

nickchomey commented 1 year ago

Thanks. I'm a Windows 11 Beta Insider, but the Store only gives me 1.1.2. Is there any way to get 1.1.3 automatically through the Store so that I can forget about this, rather than monitoring releases for manual updates? Also, since 1.1.2 is only a partial fix for this issue, it seems like 1.1.3 should be the version available to insiders, with 1.0.3 for non-insiders.

Cremesis commented 1 year ago

I think 1.1.3 reintroduced the problem on my machine, whereas 1.1.2 had fixed it :\

ghost commented 1 year ago

@Cremesis, can you open a new bug? 1.1.3 is supposed to be a revert to the behavior present in 1.0.3, feel free to mention me on the issue.

ghost commented 1 year ago

@nickchomey, we don't push builds to everyone at once. This bug is a great example of why we have an Insiders ring that we push to before going GA. The release process isn't our favorite thing either.

nickchomey commented 1 year ago

I understand that. But my point is that you've pushed a partial fix (1.1.2) to me rather than the full fix (1.1.3). It seems to me that I should only be receiving 1.0.3 or 1.1.3...

ghost commented 1 year ago

I get where you're coming from. This is why I wasn't going to mark the bug as closed till I was sure it had rolled out. I can ask someone who knows more about the release process via the store, but I'm pretty sure it's not instantaneous like that.

nickchomey commented 1 year ago

Ok. Anyway I've installed 1.1.3 manually and am not having problems thus far. I'll follow up in a couple weeks to see what version the store has.

ghost commented 1 year ago

@Cremesis, @rajshrimohanks: If you're still experiencing an issue on 1.1.3 I'd greatly appreciate it if you opened a new bug and posted logs.

JshGrn commented 1 year ago

I had to download it manually, since 1.1.2 is still the latest in the Store on Windows Insider Preview (19/02/2023). Checking whether it's fixed now.

EDIT: Fixed for me, great.

Cremesis commented 1 year ago

> @Cremesis, @rajshrimohanks: If you're still experiencing an issue on 1.1.3 I'd greatly appreciate it if you opened a new bug and posted logs.

No issue, it was an error on my part: I had a .wslconfig file that disabled IPv6 because I had found it helped with my previous issue on v1.1.0, but on v1.1.3 it was blocking access to my server running in WSL. Removing that file and restarting WSL solved the problem.

nickchomey commented 1 year ago

> Ok. Anyway I've installed 1.1.3 manually and am not having problems thus far. I'll follow up in a couple weeks to see what version the store has.

I just installed from the Windows Store and received 1.1.3.0. Again, I'm a Windows Beta Insider. Hope this helps some people.

androiddisk commented 1 year ago

In normal use, running docker ps or docker restart sometimes causes all forwarded ports to fail. Restarting WSL doesn't help; only restarting Windows does.

ghost commented 1 year ago

Locking this thread because the functionality/feature in question was reverted to the old behavior. Any issues should be filed as new bugs or comments on other bugs.