python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

Supporting "Windows Subsystem for Linux" in Trio #893

Open njsmith opened 5 years ago

njsmith commented 5 years ago

Windows now has this "WSL" thing, that lets you (allegedly) run unmodified Linux apps on Windows. Since it's a from-scratch reimplementation of the Linux APIs, it doesn't actually work exactly like Linux; it's effectively a new platform.

So far supporting WSL hasn't been a high priority, and we don't currently test on it or anything. I don't know how high a priority it should be. But all else being equal, it certainly would be nicer to support it than not. And @dd-dent and @codypiersall were experimenting with it today in gitter, and I guess more people will probably experiment in the future, so let's have an issue to track the status :-).

The first discovery was that non-blocking socketpair() sockets are totally busted on Windows 1803: https://github.com/Microsoft/WSL/issues/3100

Trio currently uses non-blocking socketpair() sockets internally, and they're used extensively in the test suite, so that's kind of a non-starter right now. Fortunately this is apparently fixed in Windows 1809.

Cody upgraded to 1809 and tried again, reporting:

The test suite made it much further after upgrading Windows. Still some failures: trio/tests/test_highlevel_open_tcp_listeners.py::test_open_tcp_listeners_rebind FAILED, but IIRC SO_REUSEADDR behavior differs on Windows from Unix, and maybe WSL conforms to the Windows behavior instead of Linux (which is different from BSD too, I think??? Maybe not...)

The bigger issue is that the test trio/tests/test_highlevel_socket.py::test_SocketStream_send_all hangs. It's hanging at the receiver() line await wait_all_tasks_blocked(), and the sender is stuck in trio/_highlevel_socket.py, at the first call to self.socket.send(remaining) in the send_all() loop. I guess the issue is that await wait_all_tasks_blocked() is not working for some reason, but I haven't the slightest clue why. Seems like that wouldn't be platform-specific. I also could definitely be misdiagnosing the problem.

So... promising progress, but still some issues that haven't been fully analyzed.
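To look at the SO_REUSEADDR question from the rebind failure outside of Trio, something like the following stdlib-only sketch might help. It is not the Trio test itself; the helper name rebind_after_close is made up for illustration. It binds a listener, completes one loopback connection, closes everything, and then tries to bind the same address again. On Linux this is expected to succeed when both listeners set SO_REUSEADDR, so a failure on WSL would suggest that the option behaves differently there.

```python
import socket


def rebind_after_close(host="127.0.0.1"):
    """Hypothetical helper: return True if a just-used listening
    address can be bound again right away with SO_REUSEADDR set."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    first.bind((host, 0))  # port 0: let the OS pick a free port
    first.listen(1)
    addr = first.getsockname()

    # Complete one connection so the port has recent connection state.
    client = socket.create_connection(addr)
    server_side, _ = first.accept()
    # Closing the accepted socket first makes the listening side the
    # active closer, which will likely leave a TIME_WAIT entry behind.
    server_side.close()
    client.close()
    first.close()

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        second.bind(addr)
    except OSError:
        return False
    else:
        return True
    finally:
        second.close()


if __name__ == "__main__":
    print("rebind succeeded:", rebind_after_close())
```

Running it on both native Linux and WSL and comparing the result is one cheap way to tell whether the failure comes from the platform rather than from Trio.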

If WSL is going to become an "officially supported" platform for trio, then eventually we'll need to figure out how to run CI on it. This issue has some info on a possible way to run WSL on Azure Pipelines for testing: https://github.com/Microsoft/azure-pipelines-image-generation/issues/478. I haven't tried it, no idea if it works, and currently the newest Windows version available on Azure Pipelines is 1803, so probably we would need to wait for 1809 to be rolled out anyway. (Plus we know trio doesn't work on WSL right now anyway, so there's not much point in worrying about CI until that's fixed.)

codypiersall commented 5 years ago

Thanks for making this issue! After thinking about it a bit this evening, it seems likelier that the socket itself was blocking and that wait_all_tasks_blocked worked as expected. I'll check that out in the next couple days hopefully.

codypiersall commented 5 years ago

A few quick notes:

  1. On a Windows box without the October 2018 update (Build 1803), the tests hang on trio/_core/tests/test_epoll.py::test_epoll_statistics. Upgrading to the October 2018 update made that test pass.
  2. The following one-liner does not work as expected on WSL, and I don't think there's anything Trio can do about this:

    python3 -c "import socket; a, b = socket.socketpair(); a.setblocking(False); a.send(b'x' * int(1e6)); a.send(b'x' * int(1e6))"

    This test should raise a BlockingIOError, but it does not on WSL.

  3. It seems like this is related to https://github.com/Microsoft/WSL/issues/3100 (discovered by @sorcio), but that was supposed to be fixed in 1809, which I'm running now.

Anyway, this is probably a bug in WSL, or my environment is playing a trick on me.
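For comparison, here is a slightly longer form of the one-liner that reports what each send() did. This is only a sketch, and the function name two_big_sends is invented for illustration. On a platform with the usual non-blocking semantics, the first send() should accept only part of the data (whatever fits in the socket buffer) and the second should raise BlockingIOError. Note that it runs in-process, so on an affected WSL build it may simply hang instead of returning.

```python
import socket


def two_big_sends(size=int(1e6)):
    """Hypothetical helper: send two large payloads on a non-blocking
    socketpair. Returns (bytes accepted by the first send, whether the
    second send raised BlockingIOError)."""
    a, b = socket.socketpair()
    a.setblocking(False)
    try:
        # Nobody reads from b, so the buffer can only fill up.
        first = a.send(b"x" * size)
        try:
            a.send(b"x" * size)
        except BlockingIOError:
            return first, True
        return first, False
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    accepted, raised = two_big_sends()
    print("first send accepted", accepted, "bytes")
    print("second send raised BlockingIOError:", raised)
```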

codypiersall commented 5 years ago

So this is interesting and fun...

It seems that WSL disregards a socket's blocking/nonblocking nature based on the size of the data to be sent, which is why some tests were passing and some were hanging. The following script illustrates the issue:

#!/usr/bin/env python3

import socket
import sys

size = int(sys.argv[1])
a, b = socket.socketpair()

a.setblocking(False)

while True:
    a.send(b'x' * size)

Call the script like this in a WSL console:

python socketfail.py $((10 * 2**14 - 1))   # works as expected!!!
python socketfail.py $((10 * 2**14))   # Hangs eternally!!!

So for some reason the number 163840 (aka 10 * 2**14) is a problem in WSL.
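Since the script above hangs forever on the bad size, a variant that runs the send loop in a child process with a timeout may be easier to use when hunting for the threshold. This is a rough sketch, not anything from Trio or WSL: the names CHILD and probe and the 5-second timeout are arbitrary choices. The child exits with status 0 as soon as send() raises BlockingIOError, and the parent reports "hung" if the child is still running when the timeout expires.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Child program: keep sending on a non-blocking socketpair until
# send() raises BlockingIOError, then exit with status 0.
CHILD = """
import socket, sys
size = int(sys.argv[1])
a, b = socket.socketpair()
a.setblocking(False)
try:
    while True:
        a.send(b'x' * size)
except BlockingIOError:
    sys.exit(0)
"""


def probe(size, timeout=5.0):
    """Return 'raised' if the child saw BlockingIOError, 'hung' if it
    was still running after `timeout` seconds, 'error' otherwise."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD, str(size)], timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this.
        return "hung"
    return "raised" if result.returncode == 0 else "error"


if __name__ == "__main__":
    for size in (10 * 2**14 - 1, 10 * 2**14):
        print(size, probe(size))
```

On a system that behaves like Linux, both sizes should print "raised"; going by the report above, the second size would be expected to print "hung" on an affected WSL build.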

dd-dent commented 5 years ago

On another note, the 1809 update (also known as the "Windows 10 October 2018 Update") has already gained notoriety for doing bad things, has been pulled back, and recently resumed a limited, optional rollout.
From the Microsoft blog:

We are also streamlining the ability for users who seek to manually check for updates by limiting this to devices with no known key blocking issues, based on our ML model.

Apparently I belong to that select group of users that failed the "ML model" filtering, and I'll assume I'm not the only one, so not everyone will be able to upgrade to that version.
I also read in some other blog post (which I can't seem to find now) that the update might never become available to the general public prior to the release of the next one (1903).

codypiersall commented 5 years ago

I opted in to the update, but that's only because I love BSODs.

dd-dent commented 5 years ago

I'm getting those without updating, so at least I'm not missing out on that action.
Just a bit insulting to be weeded out like that by an AI... Makes me feel inadequate...

codypiersall commented 5 years ago

I opened https://github.com/Microsoft/WSL/issues/3813 with WSL to track this on their end.