skyjake / lagrange

A Beautiful Gemini Client
https://gmi.skyjake.fi/lagrange/
BSD 2-Clause "Simplified" License

Add support for compression (gzip/deflate/zstd/brotli) #597

Open niutech opened 1 year ago

niutech commented 1 year ago

Some users have very slow connections, and some Gemini pages can weigh many KB, e.g. Wikipedia articles (gemini://vault.transjovian.org/en). Please implement optional in-transfer data compression (gzip/deflate/zstd/brotli etc.), using the MIME type application/gzip or a parameter, e.g. text/gemini; encoding=gzip.

acidus99 commented 1 year ago

How would Lagrange do this?

Lagrange is a gemini client, not a gemini server. The only thing a Gemini client sends in a request is a fully qualified URL. It isn't sending a MIME type. The only MIME type that appears in a Gemini request/response is sent by the server, and only on 20 success responses.
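To make the request/response shape concrete, here is a minimal Python sketch. It assumes the request is the URL terminated by CRLF and the response header is a status code, a space, and a meta field terminated by CRLF. The helper names are hypothetical and are not taken from Lagrange.

```python
from typing import Tuple


def build_request(url: str) -> bytes:
    """A request carries nothing but the absolute URL followed by CRLF."""
    return url.encode("utf-8") + b"\r\n"


def parse_response_header(header: bytes) -> Tuple[int, str]:
    """Split a response header line into (status, meta)."""
    line = header.decode("utf-8").rstrip("\r\n")
    status, _, meta = line.partition(" ")
    return int(status), meta


request = build_request("gemini://example.org/page.gmi")
status, meta = parse_response_header(b"20 text/gemini\r\n")
# On a 20 (success) response, meta is the MIME type chosen by the server.
```

Nothing in the request gives the client a place to announce which encodings it accepts, which is the constraint the rest of this thread works around.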

A server could decide to send a compressed response to a client and use an application/gzip MIME type, but it has no way of knowing whether the client can understand it, whether the client will display it, whether it will just offer to save it to disk as a .gz file, etc. So there is really nothing here for a client to do. Decompressing all application/gzip responses automatically is probably a bad idea.

FWIW, encoding= is not a defined MIME type parameter per the IANA. However, +gzip IS a valid suffix according to the IANA MIME registry. +xml is probably the most widely known/used suffix. Theoretically, a server could send a MIME type of text/gemini+gzip. I have no idea what clients would do with that. I imagine most would not render the gemtext and instead prompt the user to download it.
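As a rough illustration of what a client could do with a +gzip suffix, the following Python sketch splits the suffix off the MIME type and decompresses the body only when the suffix is gzip. The function names are hypothetical, and this is not how any existing client is known to behave.

```python
import gzip
from typing import Optional, Tuple


def split_suffix(mime_type: str) -> Tuple[str, Optional[str]]:
    """Split 'text/gemini+gzip' into ('text/gemini', 'gzip')."""
    # Drop any parameters after ';' and normalise case first.
    essence = mime_type.split(";", 1)[0].strip().lower()
    base, plus, suffix = essence.rpartition("+")
    if plus:
        return base, suffix
    return essence, None


def decode_body(mime_type: str, body: bytes) -> Tuple[str, bytes]:
    """Return (effective MIME type, body), inflating +gzip payloads."""
    base, suffix = split_suffix(mime_type)
    if suffix == "gzip":
        return base, gzip.decompress(body)
    # Anything else, including plain application/gzip, is left untouched.
    return mime_type, body
```

With this approach a plain application/gzip response is still treated as a file to download, so only the explicit text/gemini+gzip form triggers automatic decompression.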

niutech commented 1 year ago

In order to add compression without altering the Gemini spec, I am proposing the following solution:

  1. If compression is enabled in settings, Lagrange appends a special query parameter (which could be hidden in the address bar), e.g. ?__gemini_encoding=gzip (or &__gemini_encoding=gzip if there is an existing query string in the URL), telling the server that the user agent supports gzip encoding,
  2. If the server supports gzip compression, it sends back the compressed response with a special MIME type. It may be text/gemini+gzip or text/gemini; encoding=gzip since RFC 2045 doesn't mandate a list of legal parameter names, but only the following ABNF:
     parameter := attribute "=" value
     attribute := token ; Matching of attributes is ALWAYS case-insensitive.
     value := token / quoted-string
     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
     tspecials :=  "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" / <"> / "/" / "[" / "]" / "?" / "=" ; Must be in quoted-string, to use within parameter values
  3. If the server doesn't support compression, it sends back the uncompressed response with text/gemini MIME type.

I don't see a reason not to extend the set of parameter names, as is done with e.g. the MIME type multipart/byteranges; boundary=xxx. But text/gemini+gzip should also be fine. If the __gemini_encoding query parameter is too long, it could be shortened to __genc or similar. This will provide backwards compatibility with user agents not supporting data compression.
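The three steps of this proposal could be sketched roughly as follows in Python. This only illustrates the proposed convention, which is not part of the Gemini specification. The parameter name comes from the comment above, while the function names and the choice of text/gemini+gzip over the encoding= parameter are assumptions made for the example.

```python
import gzip
from typing import Tuple

HINT = "__gemini_encoding=gzip"  # parameter name taken from the proposal


def add_hint(url: str) -> str:
    """Step 1 (client): append the hint, using '&' if a query already exists."""
    separator = "&" if "?" in url else "?"
    return url + separator + HINT


def respond(url: str, gemtext: bytes) -> Tuple[str, bytes]:
    """Steps 2 and 3 (server): return (meta, body) depending on the hint."""
    query = url.partition("?")[2]
    if HINT in query.split("&"):
        return "text/gemini+gzip", gzip.compress(gemtext)
    # No hint: behave exactly like a server that knows nothing about this.
    return "text/gemini", gemtext
```

A server that does not know the convention would receive the extra query text as ordinary input, so whether it can safely ignore it depends on how that server interprets queries.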

skyjake commented 1 year ago

I don't see a problem with adding support for text/gemini+gzip.

Suggestion 1) of adding query parameters would have to be included in the Gemini protocol specification to be acceptable. That seems unlikely to happen.

In practice, a server could simply serve text/gemini+gzip pages from some subdirectory/subdomain, so the user can manually navigate there when they want to:

domain.tld/pages/ → text/gemini
gzip.domain.tld/pages/ → text/gemini+gzip
domain.tld/gzip/pages → text/gemini+gzip

The client wouldn't have to know anything about which URLs serve zipped files; it would be entirely up to the human to know that.

This means a server would likely provide both the uncompressed and compressed versions of a page, to support all clients, and would have to do some on-the-fly URL path rewriting to ensure that the gzipped versions point to other gzipped pages.
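The on-the-fly path rewriting mentioned above might look something like this Python sketch. It assumes the domain.tld/gzip/pages layout from the example, so the /gzip prefix and the function name are hypothetical. It only touches root-relative link lines and skips preformatted blocks.

```python
FENCE = "`" * 3  # gemtext preformatted toggle marker


def rewrite_links(gemtext: str, prefix: str = "/gzip") -> str:
    """Prefix root-relative link targets so they stay in the gzipped tree."""
    out = []
    preformatted = False
    for line in gemtext.split("\n"):
        if line.startswith(FENCE):
            preformatted = not preformatted
        elif not preformatted and line.startswith("=>"):
            parts = line[2:].split(maxsplit=1)
            # Only '/path' targets need rewriting: relative targets already
            # resolve under the prefix, and absolute URLs point elsewhere.
            if parts and parts[0].startswith("/") and not parts[0].startswith("//"):
                parts[0] = prefix + parts[0]
                line = "=> " + " ".join(parts)
        out.append(line)
    return "\n".join(out)
```

The rewritten text would then be gzipped and served as text/gemini+gzip, while the original stays available at the uncompressed path.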

niutech commented 1 year ago

Suggestion 1) of adding query parameters would have to be included in the Gemini protocol specification to be acceptable.

Why? It does not change the spec, it's just a convention which user agents such as Lagrange (and possibly others too) could add on top of Gemini spec. If the server doesn't support it, it will just ignore it.

In practice, a server could simply serve text/gemini+gzip pages from some subdirectory/subdomain, so the user can manually navigate there when they want to

This is too cumbersome for both publishers and users: publishers would have to rewrite URLs, and users would have to remember which servers have gzip support and what the URL structure is. Adding query params would leave it to the server to negotiate the encoding under the hood.

skyjake commented 1 year ago

@niutech

Why? It does not change the spec, it's just a convention which user agents such as Lagrange (and possibly others too) could add on top of Gemini spec. If the server doesn't support it, it will just ignore it.

Anything that relates to how a client and a server communicate with each other has to be documented in the protocol specification. This is the entire point of a protocol specification: to make it unambiguous how any client can talk to any server and have them understand each other.

Of course, you are free to create a custom client for yourself that does not follow the specification, but I can't support that kind of feature in Lagrange.

niutech commented 9 months ago

@skyjake A Reddit user @AntiAmericanismBrit proposed a solution to just enable TLS compression (RFC 3749) instead of juggling with URLs and MIME types. It could be added in OpenSSL using COMP_zlib() function and it would fall back to no compression if it was not supported by the server.

As for security, he wrote:

No, the BREACH and CRIME attacks are not relevant to Gemini. CRIME requires TLS compression plus the ability to inject chosen plaintext into the victim's requests, via cross-site scripting or cookies. Gemini doesn't have scripting or cookies, therefore Gemini is not vulnerable to CRIME even if TLS compression is enabled. And BREACH is a category of attacks that exploits HTTP responses with HTTP compression (not TLS compression), and again it relies on cookies to work. Gemini has no cookies, therefore Gemini is not vulnerable. These attacks cannot be used to retrieve Gemini client-side certificates, which is the only login mechanism we use. So Gemini is safe from these attacks.

Is it possible for you to add it to Lagrange?

skyjake commented 9 months ago

That is an interesting suggestion; however, the OpenSSL manual pages say this:

The TLS RFC does however not specify compression methods or their corresponding identifiers, so there is currently no compatible way to integrate compression with unknown peers. It is therefore currently not recommended to integrate compression into applications. Applications for non-public use may agree on certain compression methods.

This would mean that the server and the client must agree on which compression method is being used, and the only way to do that is to specify this in the Gemini protocol specification.

In my opinion, compression at the protocol level is one of those non-essential extra features that have been omitted from Gemini on purpose, to keep things simpler.

niutech commented 9 months ago

The linked RFC 3749 explicitly defines the DEFLATE compression method with its ID 1. In my opinion, the optional compression (gracefully falling back to no compression) is still better than no compression at all.

niutech commented 9 months ago

More info from the OpenSSL manual:

An OpenSSL client speaking a protocol that allows compression (SSLv3, TLSv1) will unconditionally send the list of all compression methods enabled with SSL_COMP_add_compression_method() to the server during the handshake. (...) An OpenSSL server will match the identifiers listed by a client against its own compression methods and will unconditionally activate compression when a matching identifier is found. (...)

So the compression method used does not matter: OpenSSL negotiates it automatically, and there is no need to change the Gemini spec. TLS is one level lower than Gemini in the OSI model.
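For anyone who wants to experiment with this, here is a small sketch using Python's ssl module, which wraps OpenSSL, rather than Lagrange's own C code. It only shows how the compression option is toggled on a context. Whether compression is actually negotiated still depends on the OpenSSL build and on the peer, and the helper names are hypothetical.

```python
import ssl


def tls_compression_allowed(ctx: ssl.SSLContext) -> bool:
    """True if the context does not forbid TLS-level compression."""
    return not (ctx.options & ssl.OP_NO_COMPRESSION)


def allow_tls_compression(ctx: ssl.SSLContext) -> None:
    """Clear OP_NO_COMPRESSION so the handshake may offer compression."""
    ctx.options &= ~ssl.OP_NO_COMPRESSION


ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
default_allowed = tls_compression_allowed(ctx)  # Python sets the flag by default
allow_tls_compression(ctx)
now_allowed = tls_compression_allowed(ctx)
# After a handshake, SSLSocket.compression() returns the name of the
# negotiated method, or None if the connection is not compressed.
# TLS 1.3 has no compression, so this can only matter for TLS 1.2.
```

Checking SSLSocket.compression() on a live connection is the simplest way to see whether a given server and OpenSSL build actually agreed on a method.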