simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.68k stars 112 forks source link

Add an option for specifying column names when inserting CSV data #474

Open hubgit opened 2 years ago

hubgit commented 2 years ago

https://sqlite-utils.datasette.io/en/stable/cli.html#csv-files-without-a-header-row

The first row of any CSV or TSV file is expected to contain the names of the columns in that file.

If your file does not include this row, you can use the --no-headers option to specify that the tool should not use that fist row as headers.

If you do this, the table will be created with column names called untitled_1 and untitled_2 and so on. You can then rename them using the sqlite-utils transform ... --rename command.

It would be nice to be able to specify the column names when importing CSV/TSV without a header row, via an extra command line option.

(renaming a column of a large table can take a long time, which makes it an inconvenient workaround)

simonw commented 2 years ago

A couple of tricks I use here. Firstly, I often create the table before the import using the sqlite-utils create-table command: https://sqlite-utils.datasette.io/en/stable/cli.html#creating-tables

The other current option is to use the bulk command, which lets you construct a custom SQL query to execute against every row from a CSV file: https://sqlite-utils.datasette.io/en/stable/cli.html#executing-sql-in-bulk

Do either of those options work here or is there a useful new feature that would work better?

hubgit commented 2 years ago

Creating the table before inserting is a useful workaround, thanks. It does require figuring out the create table syntax and listing all the fields manually, though, which loses some of the magic of sqlite-utils.

I was expecting to find an option like --headers=foo,bar (or --header-row='foo\tbar', if that would be easier) - not necessarily that exact syntax, but something that would essentially be treated the same as having a header row in the file.

simonw commented 2 years ago

Yeah, having a way of inserting a CSV file but saying "only I want the title column to come first" does make sense to me as a feature.

It can use -o for consistency with https://sqlite-utils.datasette.io/en/stable/cli-reference.html#search and https://sqlite-utils.datasette.io/en/stable/cli-reference.html#rows and https://sqlite-utils.datasette.io/en/stable/cli-reference.html#transform