python-websockets / websockets

Library for building WebSocket servers and clients in Python
https://websockets.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
5.16k stars 513 forks source link

WebSockets and HTTP sharing the same port using request headers #116

Closed pshaughn closed 7 years ago

pshaughn commented 8 years ago

RFC 6455 doesn't mandate that a WebSocket server implementation be able to share the same port with HTTP, but the protocol was designed to allow the possibility.

Currently, WebSocketServerProtocol.handshake doesn't test for the "Upgrade: websocket" header. This doesn't cause any problems because only a WebSocket client would be able to finish the handshake process, but it does appear to rule out port sharing.

It would be convenient in some cases to be able to pass something like an HTTPServer into the WebSocketServer constructor and have requests without the "Upgrade: websocket" header be forwarded to it.

I realize that just putting WebSocket traffic on its own port solves the same problem more easily in most cases, so this idea probably isn't worth implementing if it would take a lot of effort.

aaugustin commented 8 years ago

Indeed, I didn't want to implement a HTTP server, only a websocket server (unlike e.g. aiohttp).

You can /probably/ inherit or wrap WebSocketServerProtocol into something that handles HTTP. I made an effort in the design of the API not to rule out this possibility immediately, but I never tried it, so it's entirely possible that it doesn't work...

If you try it, I'm interested into hearing about the results :-) However I suspect it will always be easier to run on a different IP or port.

ghostplant commented 8 years ago

I also need this feature, because it will make deployment more easy when been packaged by docker and other engines. Sharing the same port will make it easy to design the port forwarding : )

ghostplant commented 8 years ago

I just implement web ssh (https://github.com/ghostplant/websocket-myssh-tunnel.git) using this websockets library, but listening on two ports (8022/8023) seems to be ugly. I am finding a solution.

I think supporting static html files based on GET method (like a SimpleHTTPServer do) will be enough because javascript in html will be able to communicate with the websocket server!

aaugustin commented 8 years ago

I believe the Docker best practice is to run different services in different containers, so you shouldn't serve static contents and websockets from the same container?

Supporting static file isn't enough, someone will ask for dynamic stuff shortly thereafter...

pshaughn commented 8 years ago

In your case, ghostplant, it sounds like you can use a regular web-server with server-side proxying, and have it proxy requests on a specific path into the websocket port. Since you're already using Docker, you can easily set up nginx to do that.

ghostplant commented 8 years ago

Firstly, Python Flask / Java Tomcat based websocket are designed to share http port and websocket port.

Secondly, if I seperate them two into different docker, when it comes to deploy multiple group of these services, it will be a little more complex, because javascript (from html client side) has to write a complex proceduce to calculate the corrent port of proxy requests to the websocket server.

ghostplant commented 8 years ago

To some extent, HTTP Request and WebSocket Request have same format so they can be regarded as one service.

pshaughn commented 8 years ago

I like that this is just a pure-Python websockets program with no framework dependencies, and I wouldn't want that to change. For your port-combining case, ghostplant, you can put nginx and this in the same Docker container, expose just the nginx port of that container, and set up nginx as a reverse proxy forwarding one path to the non-exposed websocket port and serving the other path itself, thus getting a container with one visible port serving both websocket and http requests.

aaugustin commented 8 years ago

I have to say that I don't aim at competing with Tomcat ;-) More seriously, websockets isn't designed to support HTTP(S). The scope of this issue is to make it easier to integrate with a third-party server.

ghostplant commented 8 years ago

So can you add a hook to allow users to implement handling HTTP event by themselves? Current "handshake" method in server.py doesn't satisfy idempotency so if users inherit that Protocol, they have to re-implement most of the codes.

I suggest two ways to support this hook in server.py, approximately adding 2-8 lines. Can I suggest the PR?

aaugustin commented 8 years ago

Of course you can submit a PR, that's the best way to describe the approach you're proposing.

tgwizard commented 8 years ago

I hacked together a simple subclass of WebSocketServerProtocol that can also serve regular HTTP requests. Something like this:

class HttpWSSProtocol(websockets.WebSocketServerProtocol):
    async def handler(self):
        try:
            request_line, headers = await websockets.http.read_message(self.reader)
            method, path, version = request_line[:-2].decode().split(None, 2)
        except:
            self.writer.close()
            self.ws_server.unregister(self)
            raise

        # TODO: Check headers etc. to see if we are to upgrade to WS.
        if path == '/ws':
            # HACK: Put the read data back, to continue with normal WS handling.
            self.reader.feed_data(bytes(request_line))
            self.reader.feed_data(headers.as_bytes().replace(b'\n', b'\r\n'))
            return await super(HttpWSSProtocol, self).handler()
        else:
            try:
                return self.http_handler(method, path, version)
            finally:
                self.writer.close()
                self.ws_server.unregister(self)

    def http_handler(self, method, path, version):
        response = '\r\n'.join([
            'HTTP/1.1 404 Not Found',
            'Content-Type: text/plain',
            '',
            '404 Not Found\n%s %s.' % (method, path),
        ])
        self.writer.write(response.encode())

async def ws_handler(websocket, path):
    pass

start_server = websockets.serve(ws_handler, '0.0.0.0', port, klass=HttpWSSProtocol)

It's unfortunate that we'd need to read the request headers twice for the WS case, and I guess there's where a PR could simplify a bit. Full working (but very hacky) code in https://github.com/tgwizard/touch-down

Badg commented 7 years ago

It's probably worth mentioning that load balancer health checks (on, for example, AWS, which now supports Websockets natively) are often http-only. Right now an easy workaround is to just use a second port and http.server for a trivial health check (respond to everything with 200), but it would be nice to be able to actually check the health of the websockets server itself, since that's ultimately what matters.

AWS limits successful response codes for health checks to be in the 200-399 (inclusive) range, so unfortunately you can't use the 101 response to pass the health check.

aaugustin commented 7 years ago

On that topic, other people have had issues with load balancer health checks that just opened a TCP connection. This worked but filled the logs with errors.

aaugustin commented 7 years ago

One solution would be a hook just before key = check_request(get_header) in WebSocketServerProtocol, that could be implemented in subclasses that don't want to proceed with the WebSocket handshake. I'm not very enthusiastic about that.

cjerdonek commented 7 years ago

In my own work, I'm finding there are other use cases where a hook for normal HTTP requests seems to make sense.

One is if you want to expose an API on the server for callers to query something about the state of the websockets process. Another is if you want the server to be able to receive data out-of-band via HTTP. In both cases, it seems like an unnecessary step for the caller to have to "wrap" such one-off API calls in a websocket connection (and similarly, on the server-side, for the server to have to process it as a websocket connection).

In terms of where such a hook could go, it seems like it might make more sense to go in WebSocketServerProtocol.handler() before the call to handshake(). This is because, semantically, it seems to make more sense to call handshake() only if you actually want a websocket connection.

There is a second benefit to putting a hook in handler() before handshake(). It relates to a comment I made here about handling 404's.

Currently, to return a 404, it seems like the only way is to override get_response_status(). But get_response_status() happens relatively late in the process (e.g. after calling check_request()). In the usual case where a 404 depends only on the path, it seems like there should be a way to return a 404 before the handshake even begins. This could be done in the same place as the (non-ws) HTTP hook I'm suggesting above (i.e. before handshake() in handler()). This would let 404-handling be on the "fast path" (where it probably should be).

Putting a hook where I'm suggesting would require some minor refactoring like calling read_http_request() in handler() before handshake() instead of inside handshake().

aaugustin commented 7 years ago

Yes, I think a hook on the server-side just before handshake would be appropriate.

Perhaps that hook should replace get_response_status, perhaps not.

I may have strong opinions about the API we choose :-)

cjerdonek commented 7 years ago

The way I did it was to add a pre_handshake() method containing the boilerplate, but the hook is still get_response_status() (called by pre_handshake()).

cjerdonek commented 7 years ago

Just to clarify, the patch I posted in PR #202 keeps the API exactly the same as before. All it changes is when get_response_status() is called, namely before handshake() instead of during.

The patch also moves the bits in handshake() that aren't specific to websockets to be called before handshake(). This way get_response_status() can access the same general attributes as before like path, origin, etc. Moving these bits has the additional benefit of simplifying handshake() and making it responsible only for the websocket-specific logic.

I will finish adding a couple tests to PR #202. Documentation changes should I think wait until after PR #199 is committed, because the changes would be on top of that (and appear to be minimal, from what I can tell, since the patch doesn't change the API). Thanks!

[Edit: correction, there is a slight change to the API in the PR, namely with what get_response_status() should return.]

aaugustin commented 7 years ago

Given the very large overlap between the goals of get_response_status() and the extension point discussed here, I think it makes sense to change get_response_status() into a more generic API, which I drafted in #248 with a new API: process_request(). Dropping the get_response_status() keeps things simple.

process_request() only has access to path (== self.path) and request_headers (== self.request_headers). It may also poke into self.raw_request_headers if needed. That's sufficient to access origin with self.request_headers['Origin'] if needed.

I agree with postponing the documentation changes to #199, after we make a decision on the API.

aaugustin commented 7 years ago

Everyone who contributed to this issue: thanks for you inputs; can you comment on #248 to tell me if it solves your use case? Thanks!