simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0

Metadata should be a nested arbitrary KV store #185

Open earthboundkid opened 6 years ago

earthboundkid commented 6 years ago

I started using the metadata feature and was surprised to find that values are not inherited from the root object down to specific databases and tables. This makes metadata much less useful and requires a lot of pointless duplication.

Ideally, metadata should allow arbitrary key-value pairs, and there should be a way of accessing metadata either in an inherited or non-inherited manner. Something like metadata.page.key vs. metadata.this.key might work as an interface.
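
Here is a minimal Python sketch of that interface. It assumes each level's metadata is a plain dict and that the levels are passed from the root inward. `metadata_views` is a hypothetical helper, not Datasette code. `page` and `this` follow the names suggested above: `page` merges every level, and `this` exposes only the current object's own keys.

```python
from types import SimpleNamespace


def metadata_views(*levels):
    """Build both proposed views from metadata dicts ordered root first.

    Hypothetical helper: `page` holds merged (inherited) values, `this`
    holds only the keys set on the innermost level.
    """
    inherited = {}
    for level in levels:
        # Inner levels are applied later, so their keys override outer ones.
        inherited.update(level)
    own = levels[-1] if levels else {}
    return SimpleNamespace(
        page=SimpleNamespace(**inherited),
        this=SimpleNamespace(**own),
    )


root = {"title": "My title", "license": "CC BY 4.0"}
table = {"title": "2017 salaries"}
metadata = metadata_views(root, table)

print(metadata.page.license)              # CC BY 4.0
print(metadata.this.title)                # 2017 salaries
print(hasattr(metadata.this, "license"))  # False
```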

simonw commented 6 years ago

Are you talking specifically about accessing metadata from HTML templates? That makes a lot of sense, I'll think about how this could work.

earthboundkid commented 6 years ago

Yes. I think the simplest implementation is to change lines like

metadata = self.ds.metadata.get('databases', {}).get(name, {})

to

metadata = {
    **self.ds.metadata,
    **self.ds.metadata.get('databases', {}).get(name, {}),
}

so that specified inner values overwrite outer values, but only if they exist.
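
The following standalone example shows how that merge behaves. A small hypothetical metadata dict stands in for `self.ds.metadata`:

```python
metadata = {
    "title": "Root title",
    "license": "CC BY 4.0",
    "databases": {"salaries": {"title": "Salaries"}},
}
name = "salaries"

# Same pattern as the snippet above: the later ** unpacking wins, so keys set
# on the database override the root, and missing keys fall back to the root.
merged = {
    **metadata,
    **metadata.get("databases", {}).get(name, {}),
}

print(merged["title"])    # Salaries
print(merged["license"])  # CC BY 4.0
```

Because the whole root dict is unpacked first, the merged result also carries the root's `databases` key. Every other root key, including `title`, also flows down unless the database overrides it.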

simonw commented 6 years ago

OK, I have an implementation of this. I realised that not ALL metadata should be inherited: it makes sense for source/source_url/license/license_url to be inherited, but it doesn't make sense for the title and description to be inherited down to the individual databases and tables.
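
The following is a minimal sketch of selective inheritance, not the implementation referred to above. It assumes metadata levels are plain dicts. `INHERITED_KEYS` and `resolve_metadata` are hypothetical names. Only the four keys listed in the comment are copied down, so `title` and `description` stay per-object.

```python
# Hypothetical: the keys the comment above says should be inherited.
INHERITED_KEYS = ("source", "source_url", "license", "license_url")


def resolve_metadata(parent, child):
    """Return a copy of the child's metadata with inheritable gaps filled from parent."""
    resolved = dict(child)
    for key in INHERITED_KEYS:
        # Fill in only keys the child leaves missing or blank; title and
        # description are not in INHERITED_KEYS, so they never flow down.
        if not resolved.get(key) and parent.get(key):
            resolved[key] = parent[key]
    return resolved


root = {"title": "Root title", "license": "CC BY 4.0", "source": "Example"}
table = {"title": "Salaries", "source": "State payroll"}
print(resolve_metadata(root, table))
```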

simonw commented 6 years ago

One thing that's missing from this: if you set source/license data at the individual database level they should be inherited by tables within that database.
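
Database-level values can reach tables by resolving the chain one level at a time. The sketch below assumes the `databases` → `tables` nesting of metadata.json. The helpers `inherit` and `table_metadata` are hypothetical, as is the `INHERITED_KEYS` list.

```python
from functools import reduce

# Hypothetical list of keys that should flow from parent to child.
INHERITED_KEYS = ("source", "source_url", "license", "license_url")


def inherit(parent, child):
    """Copy inheritable keys from parent into a copy of child, unless child sets them."""
    resolved = dict(child)
    for key in INHERITED_KEYS:
        if key in parent and key not in resolved:
            resolved[key] = parent[key]
    return resolved


def table_metadata(metadata, database, table):
    """Resolve table metadata through root -> database -> table."""
    db_meta = metadata.get("databases", {}).get(database, {})
    table_meta = db_meta.get("tables", {}).get(table, {})
    # Fold left: the root's values reach the database first, then the
    # database's resolved values (including its own source) reach the table.
    return reduce(inherit, [metadata, db_meta, table_meta])


metadata = {
    "license": "CC BY 4.0",
    "databases": {
        "salaries": {
            "source": "State payroll",
            "tables": {"2017": {"title": "2017 salaries"}},
        }
    },
}
print(table_metadata(metadata, "salaries", "2017"))
```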

simonw commented 6 years ago

Also needed: the ability to unset metadata. If the root metadata specifies a license_url it should be possible to set "license_url": null on a child database or table. The current implementation will ignore null (or empty string) values and default to the top-level value.
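
One way to honour an explicit null is to treat a key as set whenever it is present, rather than whenever its value is truthy. The sketch below assumes each level is a dict parsed from JSON. `inherited_value` is a hypothetical helper that a template could call when it wants the inherited value.

```python
import json


def inherited_value(levels, key, default=None):
    """Look up key from the innermost level outwards.

    A key that is present counts as set, even when its value is None, so a
    child can write "license_url": null to cancel an inherited value.
    """
    for level in reversed(levels):
        if key in level:
            return level[key]
    return default


root = json.loads('{"license_url": "https://example.com/license"}')
# JSON null arrives in Python as None.
table = json.loads('{"license_url": null}')

print(inherited_value([root], "license_url"))         # https://example.com/license
print(inherited_value([root, table], "license_url"))  # None
```

A truthiness test such as `level.get(key) or parent_value` would instead skip both `None` and `""`, which matches the fallback behaviour described above.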

I think the templates themselves should be able to indicate if they want the inherited values or not. That way we could support arbitrary key/values and avoid the application code having special knowledge of license_url etc.

earthboundkid commented 6 years ago

> I think the templates themselves should be able to indicate if they want the inherited values or not. That way we could support arbitrary key/values and avoid the application code having special knowledge of license_url etc.

Yes, you could have metadata that works like metadata does currently and inherited_metadata that works with inheritance.

earthboundkid commented 6 years ago

It would be nice to also allow arbitrary keys (maybe under a parent key called params or something to prevent conflicts). For our datasette project, we just have a bunch of dictionaries defined in the base template for things like site URL and column humanized names: https://github.com/baltimore-sun-data/salaries-datasette/blob/master/templates/base.html It would be cleaner if this were in the metadata.json.
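
The following sketch shows how the `params` idea might look if it were adopted. The names `params`, `site_url`, and `column_labels` are illustrative, not Datasette settings. They stand in for the dictionaries currently defined in base.html.

```python
import json

# Hypothetical metadata.json: site-specific values grouped under "params" so
# they cannot collide with keys Datasette itself interprets.
METADATA_JSON = """
{
    "title": "Salaries",
    "params": {
        "site_url": "https://www.example.com/",
        "column_labels": {"annual_salary": "Annual salary"}
    }
}
"""

metadata = json.loads(METADATA_JSON)
params = metadata.get("params", {})


def column_label(column):
    """Humanized column name, falling back to the raw column name."""
    return params.get("column_labels", {}).get(column, column)


print(column_label("annual_salary"))  # Annual salary
print(column_label("agency"))         # agency
```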

simonw commented 6 years ago

I am SO inspired by what you've done with https://salaries.news.baltimoresun.com/; that's pretty much my ideal use-case for Datasette, and it's by far the most elaborate customization I've seen so far. I'd love to hear other ideas that came up while building that.

earthboundkid commented 6 years ago

@simonw Other than metadata, the biggest item on our wishlist for the salaries project was the ability to reorder by column. Of course, that could be done with a custom SQL query, but we didn't want to have to reimplement all the nav/pagination stuff from scratch.

@carolinp, feel free to add your thoughts.

simonw commented 6 years ago

@carlmjohnson, in case you aren't following along with #189, I've shipped the first working prototype of sort-by-column. You can try it out here: https://datasette-issue-189-demo-2.now.sh/salaries-7859114-7859114/2017+Maryland+state+salaries?_search=university&_sort_desc=annual_salary

simonw commented 6 years ago

I've been worrying about how this one relates to #260. I'd like to validate metadata (to help protect against people, e.g., misspelling license_url and then being confused when their license isn't displayed properly), but this issue requests the ability to add arbitrary additional keys to the metadata structure.

I think the solution is to introduce a metadata key called extra_metadata_keys which allows you to specifically list the extra keys that you want to enable. Something like this:

{
    "title": "My title",
    "source": "Source",
    "source_url": "https://www.example.com/",
    "release_date": "2018-04-01",
    "extra_metadata_keys": ["release_date"]
}
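
Here is a rough sketch of the validation side. It assumes a hypothetical `KNOWN_KEYS` set; the real list of keys Datasette understands may differ. Any top-level key that is neither in that set nor declared in `extra_metadata_keys` is reported, which catches a typo like `licence_url`. Nested database and table metadata are not checked here.

```python
# Hypothetical set of keys the validator knows about; not Datasette's actual list.
KNOWN_KEYS = {
    "title", "description", "source", "source_url",
    "license", "license_url", "databases", "extra_metadata_keys",
}


def unknown_keys(metadata):
    """Return top-level keys that are neither known nor declared as extra."""
    allowed = KNOWN_KEYS | set(metadata.get("extra_metadata_keys", []))
    return sorted(set(metadata) - allowed)


metadata = {
    "title": "My title",
    "source": "Source",
    "source_url": "https://www.example.com/",
    "release_date": "2018-04-01",
    "extra_metadata_keys": ["release_date"],
    "licence_url": "https://www.example.com/license",  # misspelled on purpose
}
print(unknown_keys(metadata))  # ['licence_url']
```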

earthboundkid commented 6 years ago

That seems good to me.