michael-lazar / gemini-diagnostics

A torture test for gemini servers
MIT License

Please allow redirection of root URL to / #6

Closed: tslocum closed this issue 3 years ago

tslocum commented 3 years ago

Allowing the root URL to be served without redirection makes sense. Could we also allow serving it with redirection?

michael-lazar commented 3 years ago

> Could we also allow serving it with redirection?

Unfortunately I'm pretty convinced now that I was completely wrong when I wrote that test.

The main problem with the root redirect is that a client can reasonably be expected to use a URL normalization library, either for caching or to resolve URLs like gemini://example.com/path/subpath/.. to gemini://example.com/path/ before making a request.

Because gemini://example.com/ and gemini://example.com are canonically the same, the normalization could go either way. So if you redirect to gemini://example.com/, the client might reasonably strip off the trailing slash and make the same request, leading to an infinite redirect loop.
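To make the loop concrete, here is a minimal Python simulation. Both the server and the client normalizer are hypothetical: the server redirects an empty path to "/", and the client's normalizer rewrites a bare "/" path back to an empty one before every request, which is one of the two directions a normalizer could reasonably pick.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_root_slash(url: str) -> str:
    """Hypothetical client normalizer: rewrite a bare "/" path to an empty path."""
    parts = urlsplit(url)
    if parts.path == "/":
        parts = parts._replace(path="")
    return urlunsplit(parts)


def redirecting_server(url: str) -> tuple[int, str]:
    """Hypothetical server: 31-redirects an empty path to "/", serves the rest."""
    parts = urlsplit(url)
    if parts.path == "":
        return 31, urlunsplit(parts._replace(path="/"))
    return 20, "text/gemini"


def fetch(url: str, normalize, max_redirects: int = 5) -> tuple[int, str]:
    """Follow redirects, normalizing each URL first; fail on a repeated URL."""
    visited = []
    for _ in range(max_redirects + 1):
        url = normalize(url)
        if url in visited:
            raise RuntimeError("redirect loop: " + " -> ".join(visited + [url]))
        visited.append(url)
        status, meta = redirecting_server(url)
        if not 30 <= status <= 39:
            return status, url
        url = meta  # for 3x responses, META holds the redirect target
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    # Without normalization the client reaches "/" after one redirect.
    print(fetch("gemini://example.com", lambda u: u))
    # With the slash-stripping normalizer it requests the same URL twice.
    try:
        fetch("gemini://example.com", strip_root_slash)
    except RuntimeError as err:
        print(err)
```

This sketch detects the repeated URL and stops; a naive client without the `visited` list would keep following the redirect until some external limit kicked in.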

To give a real example, I ran into this when I was working on my mozz-archiver tool. My crawler was caching gemini responses to prevent requesting the same page twice. I hit "gemini://mozz.us" and cached the response containing the redirect. But immediately after that, the crawler normalized "gemini://mozz.us/" to "gemini://mozz.us" and marked it as already seen.
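The crawler failure can be sketched the same way. This is not mozz-archiver's actual code; `crawl`, `mock_server`, and both normalizers are hypothetical. The cache is keyed on the normalized URL, so once the redirect from the bare host is recorded, its target maps to a key that is already marked as seen and the page is never fetched. This happens whichever direction the normalizer picks, because both directions treat the two URLs as equal.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_root_slash(url: str) -> str:
    """Hypothetical normalizer: "gemini://host/" -> "gemini://host"."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path="")) if parts.path == "/" else url


def add_root_slash(url: str) -> str:
    """Hypothetical normalizer: "gemini://host" -> "gemini://host/"."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path="/")) if parts.path == "" else url


def mock_server(url: str) -> tuple[int, str]:
    """Hypothetical server that redirects the bare host to "/"."""
    parts = urlsplit(url)
    if parts.path == "":
        return 31, urlunsplit(parts._replace(path="/"))
    return 20, "text/gemini"


def crawl(start: str, normalize) -> dict[str, str]:
    """Breadth-first crawl that skips any URL whose normalized form was seen."""
    seen = set()
    queue = [start]
    pages = {}
    while queue:
        url = queue.pop(0)
        key = normalize(url)
        if key in seen:
            continue  # the redirect target is dropped here
        seen.add(key)
        status, meta = mock_server(url)
        if 30 <= status <= 39:
            queue.append(meta)  # follow the redirect later
        elif status == 20:
            pages[url] = meta  # META of a 20 response is the MIME type
    return pages
```

Only an identity "normalizer" (no normalization at all) lets the crawl reach the page; either real normalizer returns an empty result.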

bortzmeyer commented 3 years ago

As discussed with you on the mailing list, RFC 3986, Section 6.2.3, says: "In general, a URI that uses the generic syntax for authority with an empty path should be normalized to a path of "/"." Note the "in general". The rest of Section 6.2.3 explains that this is scheme-specific.
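For reference, the rule quoted above amounts to a single scheme-based normalization step. The sketch below applies only that step (an authority is present and the path is empty, so the path becomes "/"); `normalize_empty_path` is a hypothetical name, it is not a complete RFC 3986 normalizer, and whether the rule applies to gemini: at all is exactly the scheme-specific question under discussion.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_empty_path(url: str) -> str:
    """Apply the RFC 3986 Section 6.2.3 suggestion: empty path with authority -> "/"."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


if __name__ == "__main__":
    for url in ("gemini://example.com", "gemini://example.com?q", "gemini://example.com/docs"):
        print(url, "->", normalize_empty_path(url))
```

Note that the query and fragment are left untouched, so gemini://example.com?q becomes gemini://example.com/?q.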

The specification at gemini://gemini.circumlunar.space/docs/specification.gmi seems silent about normalization rules for the "gemini:" scheme. Section 1.2 says: "The path, query and fragment components are allowed and have no special meanings beyond those defined by the generic syntax."

So, there is apparently no solid basis to say that gemini://example.com and gemini://example.com/ must have the same behavior.

michael-lazar commented 3 years ago

Hi @bortzmeyer. I totally agree with your findings based on the RFC, but the goal of this tool was never limited to only checking what is or isn't allowed by the gemini spec.

usage: gemini-diagnostics [host] [port] [--help]

A diagnostic tool for gemini servers.

This program will barrage your server with a series of requests in
an attempt to uncover unexpected behavior. Not all of these checks
adhere strictly to the gemini specification. Some of them are
general best practices, and some trigger undefined behavior. Results
should be taken with a grain of salt and analyzed on their own merit.

If you have an argument for why this shouldn't be a "best practice" I'm interested to hear about it.

For the record I don't think that gemini servers should necessarily strive to pass all of these tests. I added gemini:// support to pygopherd and it fails like half of these because the implementation was way less complicated that way. But I think it's a good tool to help uncover edge cases that one might run into when writing a server.

bortzmeyer commented 3 years ago

In that case, I would suggest having several levels of checking, as many compilers do. Something like:

This specific test would be run only with --level strict.
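One way such levels could be wired into the command line is sketched below with argparse. The `--level` flag, the level names, and the check names are all hypothetical (the usage text earlier in the thread only lists host, port, and --help); the defaults of localhost and 1965 are assumptions, 1965 being Gemini's default port. Each check is tagged with the least strict level that runs it, and a level runs every check at or below its own strictness.

```python
import argparse

# Hypothetical levels, ordered from least to most strict.
LEVELS = ["spec", "best-practice", "strict"]

# Hypothetical check names, each tagged with the least strict level that runs it.
CHECKS = {
    "not_found_status": "spec",
    "ipv6_listening": "best-practice",
    "root_redirect": "strict",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="gemini-diagnostics",
        description="A diagnostic tool for gemini servers.",
    )
    parser.add_argument("host", nargs="?", default="localhost")
    parser.add_argument("port", nargs="?", type=int, default=1965)
    parser.add_argument("--level", choices=LEVELS, default="best-practice")
    return parser


def selected_checks(level: str) -> list[str]:
    """Return the checks whose tag is at or below the requested level."""
    rank = LEVELS.index(level)
    return [name for name, tag in CHECKS.items() if LEVELS.index(tag) <= rank]


if __name__ == "__main__":
    args = build_parser().parse_args()
    for name in selected_checks(args.level):
        print(f"would run {name} against {args.host}:{args.port}")
```

With this layout, the root-redirect check only runs under `--level strict`, while the default level skips it.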

bortzmeyer commented 3 years ago

Regarding the "empty path is single-slash" issue, there is a proposal in the specification issue tracker: https://gitlab.com/gemini-specification/gemini-text/-/issues/2

michael-lazar commented 3 years ago

The spec has been updated to mandate the same behavior for "/" and ""

The new specification explicitly mentions this case, and mandates that an empty path and a path of "/" are the same.
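On the server side, following the updated rule can be as small as treating an empty request path as "/" before any routing. The sketch below is hypothetical: `resolve_request` and the `index.gmi` directory-index convention are assumptions, not part of the specification. Because both forms resolve to the same resource, the server has no reason to redirect, and the loop described earlier in the thread cannot arise.

```python
from urllib.parse import urlsplit


def resolve_request(request: str) -> str:
    """Map a Gemini request line to a document path, treating "" like "/"."""
    url = request.rstrip("\r\n")      # a request is a URL followed by CRLF
    path = urlsplit(url).path or "/"  # empty path and "/" are the same resource
    if path.endswith("/"):
        path += "index.gmi"           # hypothetical directory-index convention
    return path


if __name__ == "__main__":
    for line in ("gemini://example.com\r\n", "gemini://example.com/\r\n"):
        print(repr(line), "->", resolve_request(line))
```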

https://gitlab.com/gemini-specification/protocol/-/issues/11