Example usage:
dclient alias add simon https://simon.datasette.cloud/data
dclient auth add simon
# <paste in token here>
dclient insert simon my_table my_table_data.csv --pk id --create
This would create a table at /data/my_table
with the contents of that CSV file.
The --create
option would be ignored if the table already existed.
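Under the hood, based on my understanding of the Datasette 1.0 write API (so treat the endpoint and payload shape here as assumptions rather than the final implementation), that --create insert would boil down to a request along these lines:

```python
# Hypothetical sketch of the HTTP call "dclient insert ... --create" might make,
# assuming the Datasette 1.0 write API's /<database>/-/create endpoint.
import json
import urllib.request


def create_table_with_rows(database_url, table, rows, pk, token):
    # POST to e.g. https://simon.datasette.cloud/data/-/create
    payload = json.dumps({"table": table, "rows": rows, "pk": pk}).encode("utf-8")
    request = urllib.request.Request(
        database_url.rstrip("/") + "/-/create",
        data=payload,
        headers={
            "Authorization": "Bearer {}".format(token),
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```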
I'm still not sold on the best way of specifying the table to insert into, relative to the URL for the database.
I've previously been saying that an alias should always match to a database - so https://simon.datasette.cloud/data
for example.
But then when I implemented authentication I instead decided that a token could be specified for any URL, be it instance-level or database-level or table-level or even row-level.
Should that decision reflect on aliases too?
Most Datasette instances will only ever have a single database. It would be nice to be able to run an insert into table foo
against https://simon.datasette.cloud/
and have it automatically use the /data
database since that's the only one on the instance.
Or what if you could do this:
dclient alias add simon https://simon.datasette.cloud/
dclient insert simon data2/my_table my_table_data.csv --pk id --create
Here you're allowed to optionally specify a database with dbname/
before the table.
But... remember, some tables can have /
in their names, see https://latest.datasette.io/fixtures/table~2Fwith~2Fslashes~2Ecsv
This should absolutely be able to read data from a local SQLite database and push it up to the API as well.
That might be a feature for another command - maybe a sync
command - but I'll consider it for inclusion in the insert
command too, at least at first.
One option for the alias thing could be that you can run dclient alias add simon https://simon.datasette.cloud/
but it will return an error if that instance has more than one public database. If it DOES only have one public database the add
command will detect that and will look up and persist that database name.
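A sketch of how that detection could work, assuming the instance exposes Datasette's /-/databases.json introspection endpoint and that it reports the databases visible to the requesting (here anonymous) actor:

```python
# Sketch: resolve an instance URL down to its single database when adding an alias.
# Assumes /-/databases.json returns a JSON list of objects with a "name" key.
import json
import urllib.request


def resolve_single_database(instance_url):
    with urllib.request.urlopen(instance_url.rstrip("/") + "/-/databases.json") as response:
        databases = json.loads(response.read())
    # Skip internal databases such as _internal
    names = [db["name"] for db in databases if not db["name"].startswith("_")]
    if len(names) != 1:
        raise ValueError(
            "Instance has {} databases - specify one explicitly".format(len(names))
        )
    return "{}/{}".format(instance_url.rstrip("/"), names[0])
```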
For the purpose of this issue I'll assume that aliases do indeed refer to a specific database URL already.
I'm going to get this working:
dclient insert https://simon.datasette.cloud/data my_table data.csv
You can use an alias instead of the full URL to the database.
You can pass -
to read from standard input.
By default it will detect the type of content.
Options:
- --create - create the table if it doesn't already exist
- --replace - replace any rows with a matching primary key
- --ignore - ignore any rows with a matching existing primary key
- --pk id - set a primary key (used if the table is being created)
- --nl - format is newline-delimited JSON
- --csv - format is CSV
- --tsv - format is TSV
- --json - format is JSON [{...}, {...}]
- --encoding - string encoding to use
- --no-detect-types - unlike sqlite-utils insert, the default will be to detect types in CSV files; this option reverts to everything being a string instead.

These are mostly meant to be consistent with sqlite-utils insert: https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data
I'm considering if I should add a --upsert
option or if there should be a separate dclient upsert
command. Having a separate command would be more consistent with https://sqlite-utils.datasette.io/en/stable/cli.html#upserting-data
I'll do the rest of the work for this in a PR - I just pushed the initial documentation (implementing docs-first).
Tried this against -
and got this error:
TypeError: rows_from_file() requires a file-like object that supports peek(), such as io.BytesIO
So I think if you pass in data from stdin you need to select --json
or --csv
or whatever.
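One way to handle that in the command itself is to require an explicit format whenever the input path is -, since format detection needs to peek() at the stream. A sketch, where resolve_format() is a hypothetical helper and the flags match the option list above:

```python
# Sketch: refuse to auto-detect format when reading from stdin, since
# sqlite-utils' rows_from_file() needs to peek() at the stream to detect it.
import click
from sqlite_utils.utils import Format


def resolve_format(path, csv, tsv, nl, json_):
    # Map the CLI flags onto sqlite-utils' Format enum
    if csv:
        return Format.CSV
    if tsv:
        return Format.TSV
    if nl:
        return Format.NL
    if json_:
        return Format.JSON
    if path == "-":
        raise click.UsageError(
            "--csv, --tsv, --nl or --json is required when reading from stdin"
        )
    return None  # Let rows_from_file() detect the format from the file itself
```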
Idea: resume mode, for resuming an upload if it gets cut off.
Not sure how best this could work. Maybe it should track the byte offset of the file it has read (useful for progress bars too) and record that somewhere, maybe in filename.csv.progress
- but only if you pass the --resume
or --continue
option. On subsequent runs with that option it could continue from the recorded offset.
Not sure if it should protect you against attempting a continue when the file itself has been modified. It could store a hash of the file content, but that would prevent it from resuming against a file that has had data appended to it.
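Roughly what I have in mind, as a sketch only (the sidecar .progress file layout and helper names are hypothetical):

```python
# Hypothetical sketch of --resume: remember how many bytes have been sent so far
# in a sidecar .progress file, then seek past them on the next run.
from pathlib import Path


def load_offset(path):
    progress_file = Path(str(path) + ".progress")
    if progress_file.exists():
        return int(progress_file.read_text().strip())
    return 0


def save_offset(path, offset):
    Path(str(path) + ".progress").write_text(str(offset))


def open_resumable(path, resume=False):
    fp = open(path, "rb")
    if resume:
        fp.seek(load_offset(path))
    return fp
```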
This really does need a progress bar. I can duplicate the way those work in healthkit-to-sqlite
- which ran them against open files by counting the bytes processed using fp.read()
: https://github.com/dogsheep/healthkit-to-sqlite/blob/9fe3cb17e03d6c73222b63e643638cf951567c4c/healthkit_to_sqlite/utils.py
Obviously no progress bar supported for content from stdin.
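A sketch of how the byte-counting could look with click.progressbar, loosely modelled on the healthkit-to-sqlite approach rather than copied from it (upload_rows() is a hypothetical stand-in for the actual insert logic):

```python
import os

import click


class ProgressFile:
    """Wraps a binary file so every read() advances a click progress bar."""

    def __init__(self, fp, bar):
        self.fp = fp
        self.bar = bar

    def read(self, size=-1):
        data = self.fp.read(size)
        self.bar.update(len(data))
        return data

    def __getattr__(self, name):
        # Delegate anything else (peek(), readline(), close(), ...) to the real file
        return getattr(self.fp, name)


def insert_with_progress(path, upload_rows):
    with open(path, "rb") as fp, click.progressbar(
        length=os.path.getsize(path), label="Inserting rows"
    ) as bar:
        upload_rows(ProgressFile(fp, bar))
```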
Still need to add these options:
@click.option(
"pks",
"--pk",
multiple=True,
help="Columns to use as the primary key when creating the table",
)
@click.option(
"--replace", is_flag=True, help="Replace rows with a matching primary key"
)
@click.option("--ignore", is_flag=True, help="Ignore rows with a matching primary key")
This should make use of these Datasette 1.0 APIs:
- It should have a CLI design that's as close as possible to sqlite-utils insert: https://sqlite-utils.datasette.io/en/stable/cli-reference.html#insert
- Add the --create option to cause it to attempt to create the table.
- It should accept JSON, nl-JSON, CSV and TSV - like sqlite-utils insert does.
- I can use this function to help implement that: https://sqlite-utils.datasette.io/en/stable/reference.html#sqlite-utils-utils-rows-from-file
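A minimal sketch of how rows_from_file() could slot in, with send_batch() as a hypothetical stand-in for whatever posts the rows to the API and the batch size picked arbitrarily:

```python
# Sketch: parse an incoming file with sqlite-utils' rows_from_file(), then send
# the rows up in batches.
from sqlite_utils.utils import rows_from_file


def insert_file(fp, format=None, encoding=None, send_batch=None, batch_size=100):
    # rows_from_file() returns (iterator of row dicts, the format it detected/used)
    rows, used_format = rows_from_file(fp, format=format, encoding=encoding)
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            send_batch(batch)
            batch = []
    if batch:
        send_batch(batch)
    return used_format
```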