simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0

gzip support for HTML (and JSON) responses #1213

Open simonw opened 3 years ago

simonw commented 3 years ago

This page https://datasette-tiles-demo.datasette.io/San_Francisco/tiles is 2MB because of all of the base64 images. Gzipped it's 1.5MB.
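Here is a rough way to see why gzip only saves about a quarter. Assume the embedded tiles are already-compressed images, such as PNG or JPEG data, which is close to incompressible. Base64 spends 8 bits on every 6 bits of payload, and gzip can mostly only win that overhead back. The sketch below uses random bytes as a stand-in for compressed image data. That stand-in is an assumption, not a measurement of the demo page.

```python
import base64
import gzip
import os

# Random bytes stand in for already-compressed image data (an assumption).
payload = os.urandom(300_000)
encoded = base64.b64encode(payload)   # base64 output is 4/3 the payload size
compressed = gzip.compress(encoded)   # gzip mostly recovers the 6-bits-per-char overhead

print(f"raw payload:         {len(payload):>9,} bytes")
print(f"base64 encoded:      {len(encoded):>9,} bytes")
print(f"gzipped base64:      {len(compressed):>9,} bytes")
print(f"gzip / base64 ratio: {len(compressed) / len(encoded):.2f}")
```

A ratio close to 0.75 is consistent with 2MB shrinking to about 1.5MB. The real page also contains HTML, though, so treat this as an illustration rather than an explanation of the exact numbers.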

Since Datasette is usually deployed without a front-end proxy that gzips responses, Datasette itself needs to handle this.

Gzipping everything won't work. Some endpoints, such as the all-rows CSV export and the database download, stream their responses, so they can't be buffered and then gzipped.
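For endpoints that return a complete body, buffering and then gzipping is straightforward. The following is a minimal standard-library sketch. The function name `maybe_gzip`, the 500-byte threshold and the naive `Accept-Encoding` check are illustrative assumptions, not Datasette code.

```python
import gzip


def maybe_gzip(body: bytes, accept_encoding: str, minimum_size: int = 500):
    """Return (body, extra_headers) for a fully buffered response.

    Hypothetical helper: compresses only when the client advertises gzip
    and the body is large enough to be worth compressing.
    """
    # Naive substring check; a real implementation should parse q-values.
    accepts_gzip = "gzip" in accept_encoding.lower()
    if not accepts_gzip or len(body) < minimum_size:
        return body, {}
    compressed = gzip.compress(body)
    headers = {
        "content-encoding": "gzip",
        "content-length": str(len(compressed)),
        # Caches must key on Accept-Encoding, since the body depends on it.
        "vary": "Accept-Encoding",
    }
    return compressed, headers


html = b"<tr><td>row</td></tr>" * 1000
out, headers = maybe_gzip(html, "gzip, deflate, br")
print(len(html), "->", len(out), headers["content-encoding"])
```

Streaming endpoints would have to bypass a helper like this entirely, because it needs the whole body in memory before it can compress anything.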

simonw commented 3 years ago

Starlette's gzip middleware implementation is here: https://github.com/encode/starlette/blob/0.14.2/starlette/middleware/gzip.py

simonw commented 3 years ago

Starlette feeds each chunk of the response body into a single `gzip_file` that writes to an in-memory `gzip_buffer`. For the remaining chunks of a streaming response, it does this:

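```python
# Excerpt from starlette/middleware/gzip.py (0.14.2)
elif message_type == "http.response.body":
    # Remaining body in streaming GZip response.
    body = message.get("body", b"")
    more_body = message.get("more_body", False)

    # Compress this chunk; on the final chunk, close the gzip stream
    # so the gzip trailer is written as well.
    self.gzip_file.write(body)
    if not more_body:
        self.gzip_file.close()

    # Send whatever compressed bytes are ready, then empty the buffer.
    message["body"] = self.gzip_buffer.getvalue()
    self.gzip_buffer.seek(0)
    self.gzip_buffer.truncate()

    await self.send(message)
```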
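In that excerpt, one `GzipFile` stays open across chunks. Each write's output is taken from the shared `BytesIO`, sent, and then the buffer is emptied, so the client receives a single gzip stream in pieces. Below is a standalone standard-library sketch of the same pattern. The `gzip_chunks` name and the sample rows are made up for illustration.

```python
import gzip
import io


def gzip_chunks(chunks):
    """Yield pieces of one gzip stream, mirroring the excerpt's buffer handling."""
    buffer = io.BytesIO()
    gzip_file = gzip.GzipFile(mode="wb", fileobj=buffer)
    for chunk in chunks:
        gzip_file.write(chunk)
        # Hand off whatever compressed bytes exist so far (possibly none),
        # then empty the buffer so the next piece holds only new output.
        yield buffer.getvalue()
        buffer.seek(0)
        buffer.truncate()
    # Closing flushes the remaining deflate data plus the CRC/size trailer.
    gzip_file.close()
    yield buffer.getvalue()


rows = [f"{i},name-{i}\n".encode() for i in range(10_000)]
pieces = list(gzip_chunks(rows))
assert gzip.decompress(b"".join(pieces)) == b"".join(rows)
```

Many of the yielded pieces will be empty, because zlib holds data internally until it has enough to emit a block. In the Starlette version, that means some `http.response.body` messages go out with an empty body.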
simonw commented 3 years ago

So maybe I could add a special response header that ASGI middleware can pick up, meaning "Don't attempt to gzip this, just stream it through".
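Here is one way that idea could look as plain ASGI middleware. It is a sketch under several assumptions:

- The marker header name `x-datasette-no-gzip` is hypothetical; nothing in this thread fixes a name.
- Every unmarked response is buffered and gzipped.
- The marker is stripped before the response reaches the client.

```python
import asyncio
import gzip

MARKER = b"x-datasette-no-gzip"  # hypothetical header name, not a Datasette API


class SelectiveGzipMiddleware:
    """Buffer and gzip HTTP responses unless the app sets MARKER."""

    def __init__(self, app, minimum_size=500):
        self.app = app
        self.minimum_size = minimum_size

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        accept = dict(scope.get("headers", [])).get(b"accept-encoding", b"")
        accepts_gzip = b"gzip" in accept.lower()
        start, chunks, passthrough = None, [], False

        async def wrapped_send(message):
            nonlocal start, passthrough
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                marked = any(n.lower() == MARKER for n, _ in headers)
                if marked or not accepts_gzip:
                    # Stream straight through, minus the internal marker.
                    passthrough = True
                    kept = [(n, v) for n, v in headers if n.lower() != MARKER]
                    await send(dict(message, headers=kept))
                else:
                    start = message  # hold headers until the body is complete
                return
            if passthrough or message["type"] != "http.response.body":
                await send(message)
                return
            chunks.append(message.get("body", b""))
            if message.get("more_body", False):
                return  # keep buffering
            body = b"".join(chunks)
            headers = [(n, v) for n, v in start.get("headers", [])
                       if n.lower() != b"content-length"]
            if len(body) >= self.minimum_size:
                body = gzip.compress(body)
                headers += [(b"content-encoding", b"gzip"),
                            (b"vary", b"Accept-Encoding")]
            headers.append((b"content-length", str(len(body)).encode()))
            await send(dict(start, headers=headers))
            await send({"type": "http.response.body", "body": body})

        await self.app(scope, receive, wrapped_send)


# Usage: a toy app in which /export.csv opts out of gzip.
async def demo_app(scope, receive, send):
    headers = [(b"content-type", b"text/plain")]
    if scope["path"] == "/export.csv":
        headers.append((MARKER, b"1"))
    await send({"type": "http.response.start", "status": 200, "headers": headers})
    for i in range(3):
        await send({"type": "http.response.body",
                    "body": b"row\n" * 200, "more_body": i < 2})


async def call(app, path):
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "path": path,
             "headers": [(b"accept-encoding", b"gzip, deflate")]}
    await app(scope, receive, send)
    return sent


app = SelectiveGzipMiddleware(demo_app)
html = asyncio.run(call(app, "/table"))
csv = asyncio.run(call(app, "/export.csv"))
print(len(html), "messages for /table,", len(csv), "for /export.csv")
```

Putting the decision in a response header lets the view that knows it is streaming, such as the CSV export or the database download, opt out, while the middleware stays generic. The sketch does not detect responses that are already compressed, and it does not merge with an existing `Vary` header.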