simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0

Proposal: Make the `_internal` database persistent, customizable, and hidden #2157

Open asg017 opened 1 year ago

asg017 commented 1 year ago

The current _internal database is used by Datasette core to cache info about the databases, tables, columns, and foreign keys in a Datasette instance. It is a temporary database, created at startup, that only the root user can see. See an example _internal DB here, after logging in as root.

The current _internal database has a few rough edges:

Additionally, it would be really nice if plugins could use this _internal database to store their own configuration, secrets, and settings. For example:

In general, these are specific features that Datasette plugins would have access to if there were a central internal database they could read from and write to:

Proposal

New features unlocked with this

These features don't really need a standardized _internal table per se (plugins could currently configure their own long-term storage if they really wanted to), but a persistent application database would make it much simpler to create these kinds of features.

simonw commented 1 year ago

We discussed this in-person this morning and these notes reflect what we talked about perfectly.

I've had so many bugs with plugins that I've written myself that have forgotten to special-case the _internal database when looping through datasette.databases.keys() - removing it from there entirely would help a lot.
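A minimal sketch of the special-casing described above, assuming only that `datasette.databases` behaves like a mapping keyed by database name. The helper `user_database_names` is hypothetical, not part of Datasette, and a plain dict stands in for the real object:

```python
INTERNAL_DB_NAME = "_internal"


def user_database_names(databases):
    """Return database names, skipping the internal one.

    `databases` is assumed to be a mapping like datasette.databases;
    here it only needs to support .keys().
    """
    return [name for name in databases.keys() if name != INTERNAL_DB_NAME]


# Stand-in for datasette.databases (assumption: values are irrelevant here).
example_databases = {"_internal": object(), "fixtures": object(), "foia": object()}
names = user_database_names(example_databases)
```

If `_internal` were removed from `datasette.databases` as proposed, a filter like this would no longer be needed in each plugin.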

Just one tiny disagreement: for datasette-comments I think having it store things in _internal could be an option, but in most cases I expect users to choose NOT to do that, because being able to join against those tables for more advanced queries is going to be super useful.

A "show me all rows in foia_requests with at least one associated comment in datasette_comments.comments" kind of thing.
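To illustrate the kind of query this refers to, here is a rough sketch using Python's built-in `sqlite3` module. The schema is entirely assumed: the column names (`target_table`, `target_id`, `body`, `title`) are hypothetical and not taken from datasette-comments, and both tables are placed in one in-memory database simply so they can be joined:

```python
import sqlite3


def rows_with_comments(conn):
    """Return foia_requests rows that have at least one comment."""
    return conn.execute(
        """
        SELECT foia_requests.id, foia_requests.title
        FROM foia_requests
        WHERE EXISTS (
            SELECT 1
            FROM comments
            WHERE comments.target_table = 'foia_requests'
              AND comments.target_id = foia_requests.id
        )
        ORDER BY foia_requests.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema, for illustration only.
    CREATE TABLE foia_requests (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        target_table TEXT,
        target_id INTEGER,
        body TEXT
    );
    INSERT INTO foia_requests VALUES (1, 'Budget records'), (2, 'Email logs');
    INSERT INTO comments VALUES
        (1, 'foia_requests', 1, 'Follow up'),
        (2, 'foia_requests', 1, 'Received');
    """
)
commented = rows_with_comments(conn)
```

A query like this only works when the comments live somewhere the data tables can be joined against, which is the trade-off against storing them in _internal.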

simonw commented 1 year ago

But yes, I'm a big +1 on this whole plan.

asg017 commented 1 year ago

@simonw what do you think about adding a DATASETTE_INTERNAL_DB_PATH env variable that, when defined, sets the default location of the internal DB? This means that when the --internal flag is NOT provided, Datasette would check whether DATASETTE_INTERNAL_DB_PATH is defined and, if so, use that as the internal database (otherwise it would fall back to an ephemeral in-memory database).
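The proposed lookup order could be sketched roughly as follows. This is not Datasette's implementation: `resolve_internal_db` is a hypothetical helper, and returning `None` to mean "use an ephemeral in-memory database" is an assumption made for the example:

```python
import os

ENV_VAR = "DATASETTE_INTERNAL_DB_PATH"


def resolve_internal_db(cli_internal=None, environ=None):
    """Pick the internal DB path: --internal, then the env var, then memory.

    Returns a path string, or None to signal an in-memory database
    (that convention is an assumption of this sketch).
    """
    environ = os.environ if environ is None else environ
    if cli_internal:
        # An explicit --internal value always wins.
        return cli_internal
    env_path = environ.get(ENV_VAR)
    if env_path:
        # No flag given, but the env variable is defined.
        return env_path
    # Neither provided: fall back to an ephemeral in-memory database.
    return None
```

For example, `resolve_internal_db(None, {"DATASETTE_INTERNAL_DB_PATH": "/path/to/internal.db"})` would pick the env variable's path, while passing an explicit `--internal` value would override it.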

My rationale: some plugins may require, or strongly encourage, a persistent internal database (datasette-comments, datasette-bookmarks, datasette-link-shortener, etc.). However, for users who have a global installation of Datasette (say from brew install or a global pip install), it would be annoying to have to specify --internal every time. Instead, they can just add export DATASETTE_INTERNAL_DB_PATH="/path/to/internal.db" to their bashrc/zshrc/wherever and not have to worry about --internal.