matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.78k stars 2.13k forks source link

Systematic validation of Synapse's config #12651

Open DMRobertson opened 2 years ago

DMRobertson commented 2 years ago

I'd love to see something does all of our config key checking and typechecking at startup time. Ideally this would give us useful type annotations which could flow through the rest of the program; but I think we should optimise for something which produces good error messages for the server owner.

DMRobertson commented 2 years ago

Thoughts: jsonschema, pydantic. Other options?

dklimpel commented 2 years ago

Related to

clokep commented 2 years ago

Thoughts: jsonschema, pydantic. Other options?

We tend to use jsonschema for the config already.

See also, the validate_config function:

https://github.com/matrix-org/synapse/blob/4b965c862dc66c0da5d3240add70e9b5f0aa720b/synapse/config/_util.py#L22-L40

dklimpel commented 2 years ago

I am not sure if this also related:

DMRobertson commented 2 years ago

Thoughts: jsonschema, pydantic. Other options?

We tend to use jsonschema for the config already.

Maybe we just need more schemas for the config subsections then? (I see 42 matches for def read_config\( but only 11 for validate_config\(.)

With that said: for fuller annotations, we'd have to manually maintain a type in the schema and a type in the config class itself. (That's the only reason I mention pydantic.)

richvdh commented 2 years ago

Maybe we just need more schemas for the config subsections then?

Yes, my vision here was that we would add more jsonschema validation over time, and that we should make sure new config sections at least have validation. Ultimately we might be able to join up the bits of jsonschema into a single big schema.

With that said: for fuller annotations, we'd have to manually maintain a type in the schema and a type in the config class itself.

I had a bit of a go at automatic validation when I wrote the parser for the oidc config settings (which parses the settings into a typed OidcProviderConfig object, so suffers exactly the duplication problem you describe). I decided that an explicit schema would be better, but I don't think I actually tried pydantic.

One of the problems is that there is often a bit of pre-parsing between reading the YAML and storing it in the Config class. That's not necessarily a barrier to doing things differently, though.

I wouldn't be averse to switching to pydantic if it

If you're interested in pursuing pydantic, it might be worth trying an alternative implementation for OIDCConfig, so we can compare and contrast?

DMRobertson commented 2 years ago

If you're interested in pursuing pydantic, it might be worth trying an alternative implementation for OIDCConfig, so we can compare and contrast?

I'm giving this a go on https://github.com/matrix-org/synapse/tree/dmr/oidc-config-pydantic in spare moments. There's some investigation to be done though. https://github.com/samuelcolvin/pydantic/discussions/4087 for starters!

symphorien commented 2 years ago

It would be nice if it was possible to validate the syntax of config file without running synapse (and possibly without having access to the network, database, certificates etc). It allows to validate config before restarting synapse and possibly having it not restart because of a typo. It would also allow config validation at "build time" before deployment for example on NixOS.

Random examples of prior work: postgres -C sshd -t varnishd -C nagios -v collectd -t

nginx has a similar possibility but it insists in reading all certificates so you can't do the syntax check on another machine pre-deployment.

DMRobertson commented 2 years ago

If you're interested in pursuing pydantic, it might be worth trying an alternative implementation for OIDCConfig, so we can compare and contrast?

I'm giving this a go on https://github.com/matrix-org/synapse/tree/dmr/oidc-config-pydantic in spare moments. There's some investigation to be done though. samuelcolvin/pydantic#4087 for starters!

https://github.com/matrix-org/synapse/blob/44ac98422ce242d2ca7b383a33274fb4f47b980b/synapse/config/oidc2.py is a first pass at this. There are plenty of TODOs and it's a bit more verbose than I'd've liked. Maybe it's enough to see if people like the syntax.

On error messages: I've been checking that the model behaves as I expect it to here: https://github.com/matrix-org/synapse/blob/44ac98422ce242d2ca7b383a33274fb4f47b980b/tests/config/test_oidc2.py . I'd like to tweak that test case so that it can be invoked in a way which will print out all ValidationErrors, so we can get a sense of how clear the error messages are.

richvdh commented 2 years ago

It would be nice if it was possible to validate the syntax of config file without running synapse

@symphorien: it's not well documented, but you can:

python -m synapse.config -c homeserver.yaml

(or equivalent under poetry), which does exactly that.

Nevertheless we should replace it with a "check mode" for synapse itself (and the worker apps)

symphorien commented 2 years ago

It still requires to run a full synapse, it's not only a syntax check:

venv $  python3 -m synapse.config -c ~/src/nixpkgs/homeserver.yaml 
You have set two incompatible options, expiry_time and expire_caches. Please only use the expire_caches and cache_entry_ttl options and delete the expiry_time option as it is deprecated.
Expiry_time is a deprecated option, please use the expire_caches and cache_entry_ttl options instead.
Traceback (most recent call last):
  File "/nix/store/wnrc4daqbd6v5ifqlxsj75ky8556zy0p-python3-3.9.12/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/nix/store/wnrc4daqbd6v5ifqlxsj75ky8556zy0p-python3-3.9.12/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/__main__.py", line 56, in <module>
    main(sys.argv)
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/__main__.py", line 30, in main
    config = HomeServerConfig.load_config("", load_config_args)
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/_base.py", line 461, in load_config
    obj, _ = cls.load_config_with_parser(config_parser, argv)
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/_base.py", line 532, in load_config_with_parser
    obj.parse_config_dict(
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/_base.py", line 718, in parse_config_dict
    self.invoke_all(
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/_base.py", line 347, in invoke_all
    res[config_class.section] = getattr(config, func_name)(*args, **kwargs)
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/repository.py", line 120, in read_config
    self.media_store_path = self.ensure_directory(
  File "/tmp/venv/lib/python3.9/site-packages/synapse/config/_base.py", line 199, in ensure_directory
    os.makedirs(dir_path, exist_ok=True)
  File "/nix/store/wnrc4daqbd6v5ifqlxsj75ky8556zy0p-python3-3.9.12/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/nix/store/wnrc4daqbd6v5ifqlxsj75ky8556zy0p-python3-3.9.12/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/var/lib/matrix-synapse'
richvdh commented 2 years ago

PermissionError: [Errno 13] Permission denied: '/var/lib/matrix-synapse'

well, it shouldn't be doing that. Anyway, that's getting offtopic here. Feel free to open a separate issue.