eyeseast opened 2 years ago
If we do this I'm keen to have it be more than just an alternative to the existing `sqlite-utils` command - especially since if I add `sqlite-utils` as a dependency of Datasette in the future that command will be installed as part of `pip install datasette` anyway.
My best thought on how to differentiate them so far is plugins: if Datasette plugins that provide alternative outputs - like `.geojson` and `.yml` and suchlike - also work for the `datasette query` command that would make a lot of sense to me.
One way that could work: a `--fmt geojson` option to this command which uses the plugin that was registered for the specified extension.
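For illustration, an invocation might look something like this - the exact command shape, database name, and query here are all invented, nothing is settled:

```sh
datasette query mydata.db "select * from places" --fmt geojson
```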
> My best thought on how to differentiate them so far is plugins: if Datasette plugins that provide alternative outputs - like `.geojson` and `.yml` and suchlike - also work for the `datasette query` command that would make a lot of sense to me.
That's my thinking, too. It's really the thing I've been wanting since writing `datasette-geojson`, since I'm always exporting with `datasette --get`. The workflow I'm always looking for is something like this:
```sh
cd alltheplaces-datasette
datasette query dunkin_in_suffolk -f geojson -o dunkin_in_suffolk.geojson
```
I think this probably needs either a new plugin hook separate from `register_output_renderer` or a way to use that without going through the HTTP stack. Or maybe a render mode that writes to a stream instead of a response. Maybe there's a new key in the dictionary that `register_output_renderer` returns that handles CLI exports.
A render mode for that plugin hook that writes to a stream is exactly what I have in mind. So maybe `register_output_renderer` returns something like this:
```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "geojson",
        "render": render_geojson,
        "stream": stream_geojson,
        "can_render": can_render_geojson,
    }
```
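To make the CLI side of that concrete, here's a rough sketch of the dispatch I'm imagining - the registry, function names, and `ndjson` renderer below are all invented for illustration, this isn't Datasette's actual plumbing:

```python
import asyncio
import json

# Hypothetical registry keyed by extension; in real Datasette this would
# be collected from plugins via the register_output_renderer hook.
RENDERERS = {}

def register(renderer):
    RENDERERS[renderer["extension"]] = renderer

async def stream_ndjson(rows):
    # Stand-in stream function: one JSON document per row, newline-delimited.
    for row in rows:
        yield json.dumps(row)

register({"extension": "ndjson", "stream": stream_ndjson})

async def run_query_export(extension, rows):
    # What a `datasette query -f ndjson` code path might do: look up the
    # renderer registered for the extension and drain its stream.
    renderer = RENDERERS.get(extension)
    if renderer is None or "stream" not in renderer:
        raise ValueError(f"no streaming renderer registered for .{extension}")
    chunks = [chunk async for chunk in renderer["stream"](rows)]
    return "\n".join(chunks)

print(asyncio.run(run_query_export("ndjson", [{"id": 1}, {"id": 2}])))
```

The nice property of dispatching on the `"stream"` key is that renderers without it simply aren't offered for CLI export, so existing plugins keep working unchanged.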
And `stream` gets an iterator, instead of a list of rows, so it can efficiently handle large queries. Maybe it also gets passed a destination stream, or it returns an iterator. I'm not sure what makes more sense. Either way, that might cover both CLI exports and streaming responses.
And just to think this through a little more, here's what `stream_geojson` might look like:
```python
async def stream_geojson(datasette, columns, rows, database, stream):
    db = datasette.get_database(database)
    for row in rows:
        feature = await row_to_geojson(row, db)
        stream.write(feature + "\n")  # just assuming newline mode for now
```
Alternately, that could be an async generator, like this:
```python
async def stream_geojson(datasette, columns, rows, database):
    db = datasette.get_database(database)
    for row in rows:
        feature = await row_to_geojson(row, db)
        yield feature
```
Not sure which makes more sense, but I think this pattern would open up a lot of possibility. If you had your `stream_indented_json` function, you could do `yield from stream_indented_json(rows, 2)` and be on your way.
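For what it's worth, here's a toy sketch of what a `stream_indented_json` helper could look like - this is my own guess, not an existing Datasette function. One wrinkle: `yield from` is a syntax error inside `async def`, so an async renderer would loop-and-yield instead:

```python
import json

def stream_indented_json(rows, indent=2):
    # Emit a JSON array incrementally, one indented object at a time,
    # so a huge result set never has to be held in memory all at once.
    yield "["
    for i, row in enumerate(rows):
        prefix = "" if i == 0 else ","
        yield prefix + "\n" + json.dumps(row, indent=indent)
    yield "\n]"

# A sync generator can delegate with `yield from stream_indented_json(rows, 2)`,
# but inside `async def` that's a SyntaxError - an async renderer would do:
#
#   async def stream_custom(rows):
#       for chunk in stream_indented_json(rows, 2):
#           yield chunk

print("".join(stream_indented_json([{"id": 1}, {"id": 2}])))
```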
Was looking through old issues and realized a bunch of this got discussed in #1101 (including by me!), so sorry to rehash all this. Happy to help with whatever piece of it I can. Would be very excited to be able to use format plugins with exports.
I started sketching out a plugin to add a `datasette query` subcommand to export data from the command line. This is based on discussions in #1356 and #1605. Before I get too far down this rabbit hole, I figure it's worth getting some feedback here (unless this should happen in Discussions). Here's what I'm thinking:

At its most basic, it will write the results of a query to STDOUT.
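Something like this, with the exact flags still up for grabs (the database file and SQL here are placeholders):

```sh
datasette query data.db "select * from places limit 10"
```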
This isn't much improvement over using `sqlite-utils`. To make better use of Datasette and its ecosystem, run `datasette query` using a canned query defined in a `metadata.yml` file.

For example, using the metadata file from alltheplaces-datasette:
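I don't have the actual alltheplaces metadata inline here, but a canned query in `metadata.yml` follows this general shape - the SQL and column names below are illustrative, not the real query:

```yaml
databases:
  alltheplaces:
    queries:
      count_by_spider:
        sql: |-
          select spider, count(*) as count
          from places
          group by spider
          order by count desc
```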
That query would be good to get as CSV, and we can auto-discover metadata and databases in the current directory:
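Maybe something like this (flags invented for illustration):

```sh
cd alltheplaces-datasette
datasette query count_by_spider -f csv
```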
In this case, `count_by_spider` is a canned query defined on the `alltheplaces` database. If the same query is defined on multiple databases or it's otherwise unclear which database `query` should use, pass the `-d` or `--database` option.

If a query takes parameters, I can pass them in at runtime, using the `--param` or `-p` option:

I'm very interested in feedback on this, including whether it should be a plugin or in Datasette core. (I don't have a strong opinion about this, but I'm prototyping it as a plugin to start.)
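Circling back to the `--param` case above, a hypothetical invocation might look like this (query and parameter names invented):

```sh
datasette query count_by_spider -p spider dunkin -f csv
```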