Open simonw opened 6 years ago
This came up in #588 - it would be helpful if this would spot things like "queries"
defined against the tables block when they should be defined against a database.
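A minimal sketch of that kind of check, using only the standard library. `find_misplaced_keys` and `DATABASE_ONLY_KEYS` are hypothetical names, not part of Datasette, and the sketch assumes the mistake looks like a "queries" key nested inside an individual table's block:

```python
# Hypothetical helper, not part of Datasette.
# Assumption: these keys are only meaningful on a database, never on a table.
DATABASE_ONLY_KEYS = {"queries"}


def find_misplaced_keys(metadata):
    """Return one message per database-level key found inside a table block."""
    problems = []
    for db_name, db in (metadata.get("databases") or {}).items():
        tables = (db or {}).get("tables") or {}
        for table_name, table in tables.items():
            for key in sorted(DATABASE_ONLY_KEYS & set(table or {})):
                problems.append(
                    f"databases.{db_name}.tables.{table_name}.{key}: "
                    f"'{key}' should be defined on the database, not the table"
                )
    return problems


if __name__ == "__main__":
    bad = {"databases": {"mydb": {"tables": {"mytable": {"queries": {}}}}}}
    for message in find_misplaced_keys(bad):
        print(message)
```

Returning a list of messages rather than raising on the first problem lets a caller report every mistake in one pass.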
Is there already functionality that can be used to validate the metadata.json file? Is there a JSON Schema that defines it? Or validation that's available via Datasette with Python? We're working on automatically building the metadata in CI and when we deploy to Cloud Run, and it would be nice to be able to check in our tests whether the metadata we're outputting is valid.
Interesting example of why this would be valuable here:
This YAML file:
title: Some title
description_html: |-
  <p>This is an experiment.</p>
databases:
  off:
    tables:
      products_from_owners:
        title: products_from_owners*
Was loaded as equivalent to this JSON:
{
  "title": "Some title",
  "description_html": "<p>This is an experiment.</p>",
  "databases": {
    "false": {
      "tables": {
        "products_from_owners": {
          "title": "products_from_owners*"
        }
      }
    }
  }
}
Validation that caught this would have been useful.
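The likely cause is that a YAML 1.1 loader reads an unquoted `off` as the boolean false, and JSON serialization then turns that key into the string "false". A rough sketch of a check that would catch it, assuming the metadata has already been loaded into Python dicts; `find_non_string_keys` is a hypothetical helper and the `loaded` dict is hand-written to mimic what such a loader would return:

```python
import json


def find_non_string_keys(node, path="metadata"):
    """Recursively collect a message for every mapping key that is not a str."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            if not isinstance(key, str):
                problems.append(
                    f"{path}: key {key!r} is a {type(key).__name__}, "
                    "not a string (quote it in the YAML)"
                )
            problems.extend(find_non_string_keys(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_non_string_keys(item, f"{path}[{index}]"))
    return problems


# Hand-written stand-in for the loaded YAML: the `off` key has become False.
loaded = {
    "title": "Some title",
    "databases": {
        False: {"tables": {"products_from_owners": {"title": "products_from_owners*"}}}
    },
}

if __name__ == "__main__":
    print(find_non_string_keys(loaded))
    # json.dumps coerces the boolean key to the string "false"
    print(json.dumps(loaded, indent=2))
```

Quoting the key in the YAML source (`"off":`) keeps it a string and avoids the problem.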
I'm inclined to consider Pydantic for this, since it is widely used now and can generate really good error messages.
@zschira is working with Pydantic while converting between and validating JSON frictionless datapackage descriptors that annotate an SQLite DB (extracted from FERC's XBRL data) and the Datasette YAML metadata so we can publish them with Datasette. Maybe there's some overlap? We've been loving Pydantic.
Did some related research work in this issue:
Another example of confusion from this today: https://discord.com/channels/823971286308356157/823971286941302908/1121042411238457374
It's easy to misspell the name of a database or table and then be puzzled when the metadata settings silently fail.
To avoid this, let's sanity check the provided metadata.json on startup and quit with a useful error message if we find any obvious mistakes.
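One way such a startup check could look, as a sketch only. `check_names`, `check_or_exit` and the `actual` mapping (database name to set of table names) are hypothetical and not Datasette APIs; the sketch also assumes that every database and table named in the metadata should already exist:

```python
import difflib
import sys


def _unknown(kind, name, candidates):
    """Build an error message, with a spelling suggestion when one is close."""
    close = difflib.get_close_matches(str(name), [str(c) for c in candidates], n=1)
    hint = f" (did you mean {close[0]!r}?)" if close else ""
    return f"Unknown {kind} {name!r} in metadata{hint}"


def check_names(metadata, actual):
    """Compare names in metadata against `actual`: {database: {table, ...}}."""
    errors = []
    for db_name, db in (metadata.get("databases") or {}).items():
        if db_name not in actual:
            errors.append(_unknown("database", db_name, actual))
            continue
        for table_name in (db or {}).get("tables") or {}:
            if table_name not in actual[db_name]:
                errors.append(
                    _unknown(f"table in database {db_name!r}", table_name, actual[db_name])
                )
    return errors


def check_or_exit(metadata, actual):
    """Quit with a readable message if any names do not match."""
    errors = check_names(metadata, actual)
    if errors:
        sys.exit("Invalid metadata:\n" + "\n".join(errors))


if __name__ == "__main__":
    actual = {"off": {"products_from_owners"}}
    typo = {"databases": {"off": {"tables": {"product_from_owners": {}}}}}
    print(check_names(typo, actual))
```

Passing a string to `sys.exit` prints it to stderr and exits with a non-zero status, which suits a command-line startup check.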