mbrubeck / agate

Very simple server for the Gemini hypertext protocol
Apache License 2.0
579 stars 37 forks source link

Ambiguous .meta handling #271

Open michaelnordmeyer opened 1 year ago

michaelnordmeyer commented 1 year ago

Agate uses .meta files to configure behavior for content files. It also has the ability to have a central version of this file by using --central-conf or -C as a start parameter.

This is ambiguous, because while a central conf works for a single virtual host, it doesn't work as well for a properly configured single virtual host.

If only one hostname is used, then the server answers for any hostname, which is resolved by DNS to this server's IP. This is bad. You only want to answer for example.com, if you use --hostname example.com, and not foo.example.com. Those subdomains might exists for other protocols, e.g. for rewriting www.example.com to example.com for HTTPS or vice-versa.

This could somewhat be mitigated by using two hostnames example.com and example.net when starting Agate, where the latter domain doesn't have any DNS entries. Then the content has to live in /content/example.com, if --content /content is used, which is fine.

But then the central conf has to live at /content/.meta having all entries to be prefixed with example.com/ to honor the subdirectory. This can be quite a lot, if you want to host many virtual hosts.

It would be much better, if the central conf for multiple vhosts would be located in the respective /content/<vhost> directories. This is currently not supported.

Unfortunately, this works half-way, because there is a side-effect when having a single .meta in /content/example.com without having --central-conf configured.

Given a file /content/example.com/subdir/foo.txt, if gemini://example.com/subdir/foo.txt is requested, it will be served correctly. But if the semi-central .meta at /content/example.com/.meta has subdir/foo: ;charset=iso-8859-1 configured, it won't be used by accessing gemini://example.com/subdir/foo.txt after restarting Agate, because the .meta is expected at /content/example.com/subdir/.meta, which is correct.

But if gemini://example.com/ is requested first, the .meta at /content/example.com/.meta will be loaded, which is also correct, but also applied for subsequent requests to gemini://example.com/subdir/foo.txt, which then would reply with 20 text/plain;charset=iso-8859-1.

Regardless of the outcome of this issue, I consider this side-effect to be a bug.

Now, a single conf one for the whole server can get quite large for many vhosts, for example with many redirects after capsule reorganizations. And I also don't want to have one .meta in each directory, because after a capsule reorganization the appropriate directories to put the .meta in might not exist anymore, and I don't want to keep those directories around just to have a .meta in it. A central conf for each vhost would solve this dilemma.

How is the feeling about this?

michaelnordmeyer commented 1 year ago

It is also not great, that a .meta file in /content/example.com/subdir can configure files in other directories. It should only be applicable to content in the directory it is located at.

And because it is called "Meta-Presets", according to the docs, it shouldn't have any impact on the status of a reply. So another file like .status might be appropriate. But then two files have to be read and managed.

If I sound overly negative, I'm happy to report that I like Agate a lot.

Johann150 commented 1 year ago

Sorry, I found this a bit hard to follow, so let me summarize your points to make sure we are on the same page:

  1. If multiple .meta files (in different/parent directories) apply to a file, the order in which they are applied depends on the order paths are accessed in. Instead it should always be the same, no matter in which order paths are accessed in.
  2. There should be a way to have separate .meta files for separate virtual hosts when using a central configuration.
  3. .meta files may configure files that are in other directories without using central config mode. This should not be possible.
  4. Because .meta is documented in the section called "Meta-Presets" it should not affect the returned status code.

Assuming I understood you correctly, I'll try to answer these concerns:

  1. I agree this is unexpected behaviour and thus may be considered a bug. Although I am not sure if it should rather be considered to be explicitly defined as undefined behaviour if multiple .meta files have rules for a single file. I guess the issue really only arises because of (3). The thing I would not like to do or maybe "mistake to not repeat" from the Apache web server is having to check all parent directories for .meta files.
  2. I would certainly consider a patch for this. I guess with the current improper system you could even make that technically work already if you made agate load configuration files in those directories somehow.
  3. The metadata parsing code is unaware of which configuration mode is being used, so it allows paths containing path separators. That would probably be the thing to be fixed. See also answer for (1).
  4. I don't think the naming matters here and is a technicality, though feel free to suggest a rename for the documentation section. Renaming the file is not a backward compatible change so should be avoided. And splitting into two separate files would from my perspective not provide any benefit for the added complexity, and would also probably change the configuration format so also a non-backward compatible change to be avoided.
michaelnordmeyer commented 1 year ago

Citing from the docs to clarify:

You can also enable a central configuration file with the -C flag (or the long version --central-conf). In this case Agate will always look for the .meta configuration file in the content root directory and will ignore .meta files in other directories.

I'm not a Rust expert, btw.

Re: same page:

  1. No, I didn't write about order, and I only used one .meta file at a time, but with different configurations (with -C in /content, and without in /content/example.com), which get applied by a restart. More explanation after the list.
  2. Yes.
  3. No, that's bad. I would like to have the exact opposite, a simple solution. Otherwise it gets complicated (e.g. /content/exmaple.com/subdir/../../example.net, or partial overwriting of presets, hard to handle). AFAICT, Agate matches full paths only (see source).
  4. Yes, but that's not related to this bug, I only mentioned it in passing.

An example of what's not working. My only .meta in /content/localhost:

**/*.gmi: ;lang=nl
.well-known/security.txt: ;lang=en-US

First try, accessing root first before requesting allowed secret:

(I added --hostname example.com to allow for having subdirectories in content.)

$ target/debug/agate --content /content --addr 0.0.0.0:1965 --hostname localhost --hostname example.com
[2023-07-05T10:07:40Z INFO  agate] Started listener on 0.0.0.0:1965
[2023-07-05T10:07:47Z INFO  agate] 127.0.0.1:1965 - "gemini://localhost/" 20 "text/gemini;lang=nl"
[2023-07-05T10:07:50Z INFO  agate] 127.0.0.1:1965 - "gemini://localhost/.well-known/security.txt" 20 "text/plain;lang=en-US"

Second try, directly requesting allowed secret:

$ target/debug/agate --content /content --addr 0.0.0.0:1965 --hostname localhost --hostname example.com
[2023-07-05T10:08:04Z INFO  agate] Started listener on 0.0.0.0:1965
[2023-07-05T10:08:11Z INFO  agate] 127.0.0.1:1965 - "gemini://localhost/.well-known/security.txt" 52 "If I told you, it would not be a secret."
[2023-07-05T10:08:20Z INFO  agate] 127.0.0.1:1965 - "gemini://localhost/.well-known/security.txt" 52 "If I told you, it would not be a secret."
[2023-07-05T10:08:22Z INFO  agate] 127.0.0.1:1965 - "gemini://localhost/.well-known/security.txt" 52 "If I told you, it would not be a secret."
[2023-07-05T10:08:24Z INFO  agate] 127.0.0.1:1965 - "gemini://localhost/" 20 "text/gemini;lang=nl"
[2023-07-05T10:08:26Z INFO  agate] 127.0.0.1:1965 - "gemini://localhost/.well-known/security.txt" 20 "text/plain;lang=en-US"

Re: concerns:

  1. I wasn't writing about multiple files having presets for a single file. This is another potential problem.
  2. It's not that hard, you need to introduce the idea of vhosts to the code, but backwards compatibility suffers, because -C means location is /content/.meta, but -C for vhosts means no /content/.meta, but many /content/<vhost>/.meta and no /content/<vhost>/**/.meta.
  3. It's either one central conf, or many not central ones. But currently only full paths are matched.
  4. ¯(ツ)/¯

BTW, shouldn't this be .meta instead of .lang?

Johann150 commented 1 year ago

It is confusing to talk about different things in one issue. I made a separate issue #272 for (2).

The other issues with not using central config mode arise because of how .meta files are parsed and loaded by Agate. It does not scan the content directory and subdirectories for .meta files on startup, instead the .meta files are only loaded once the respective directory or a file in it are requested. For this to work, paths should not be used in .meta files. The documentation technically says this, but it is not very clear:

[The .meta file] stores some metadata about the adjacent files which Agate will use when serving these files.

The documentation for this needs to be improved: If you are not using central configuration mode, the <path> must only point to files in the same directory as the .meta file itself, otherwise it may cause undefined behaviour.

michaelnordmeyer commented 1 year ago

Yes, I understand. Unfortunately, the docs also have globs like

**/*.de.gmi: ;lang=de
nl/**/*.gmi: ;lang=nl

as an example to set the lang meta, which you really need to do, because the --lang launch parameter will set the language for each and every file, including image files, which is obviously pretty wrong.

I think that #272 could solve many of these problems.

Johann150 commented 1 year ago

The globs work properly in central configuration mode.

I also have to take back what I said previously, the documentation already explicitly mentions this:

https://github.com/mbrubeck/agate/blob/8f346d7c28a7ddb623a9e8821b5ad7966dfeb09a/README.md?plain=1#L98