In PR #37, the custom prompt YAML config files, ai_revision-config.yaml and ai_revision-prompts.yaml, are currently validated ad hoc: the code just looks for specific keys and values. As suggested, it would be more robust to describe the schema of each document in full and validate against it at runtime.
I should mention that it's been a long time since I've done schema validation, but a quick look revealed a few options for it in Python. All of them seem to use Python dictionaries as the common language between different serialization formats (JSON, YAML, etc.), which IMHO is a good idea. Here's the list, ordered by stars on GitHub:
Frankly, from a quick look they all seem very similar. My impression from their docs is that Cerberus would be the easiest to work with, so unless anyone has strong opinions otherwise we can just go with that, but I'm of course open to discussion.
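To make the ad-hoc vs. schema distinction concrete, here's a rough stdlib-only sketch of the general idea: the schema is itself a dictionary, and one generic function checks a parsed document against it. This is not Cerberus's API or that of any other library. The rule format, the `validate` helper, and the key names (`files`, `default_prompt`) are all made up for illustration and don't reflect the real layout of the config files.

```python
# Hypothetical schema: the key names are invented for illustration and are
# not the real layout of ai_revision-config.yaml.
SCHEMA = {
    "files": {"type": dict, "required": True},
    "default_prompt": {"type": str, "required": False},
}


def validate(document, schema):
    """Return a list of error strings; an empty list means the document is valid."""
    if not isinstance(document, dict):
        return ["document must be a mapping"]
    errors = []
    for key, rule in schema.items():
        if key not in document:
            if rule.get("required", False):
                errors.append(f"{key}: required key is missing")
            continue
        if not isinstance(document[key], rule["type"]):
            errors.append(f"{key}: expected {rule['type'].__name__}")
    # Reject keys the schema does not describe; ad-hoc checks tend to miss these.
    for key in document:
        if key not in schema:
            errors.append(f"{key}: unknown key")
    return errors


# Dictionaries like these are what a YAML or JSON parser would hand back.
good = {"files": {"ignore": ["front-matter"]}}
bad = {"files": ["front-matter"], "defualt_prompt": "proofread"}

print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

A real library would replace the hand-rolled `validate` and add nested schemas, allowed values, and better error reporting. The part that carries over is that the whole document shape lives in one declarative place, so a misspelled key like `defualt_prompt` gets reported instead of silently ignored.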
_(Suggested by @miltondp in https://github.com/manubot/manubot-ai-editor/pull/37#discussion_r1476744505)_