simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.63k stars 111 forks source link

extracts= should support multiple-column extracts #49

Open simonw opened 5 years ago

simonw commented 5 years ago

Lookup tables can be constructed on compound columns, but the extracts= option doesn't currently support that.

Right now extracts can be defined in two ways:

# Extract these columns into tables with the same name:
dogs = db.table("dogs", extracts=["breed", "most_recent_trophy"])

# Same as above but with custom table names:
dogs = db.table("dogs", extracts={"breed": "Breeds", "most_recent_trophy": "Trophies"})

Need some kind of syntax for much more complicated extractions, like when two columns (say "source" and "source_version") are extracted into a single table.

simonw commented 5 years ago

This syntax should be shared with #42 as much as possible.

Maybe something based on a namedtuple would work, since those are already used in the library.

workouts = db.table("workouts", extracts=[Extract(
    columns=["source", "source_version"],
    table="Sources"
)])

Since namedtuples cannot have default values this should probably be a class instead. Actually it looks like there is a trick for defaults here: https://stackoverflow.com/a/18348004

simonw commented 3 years ago

The new .extract() method can handle multiple columns:

https://github.com/simonw/sqlite-utils/blob/2c541fac352632e23e40b0d21e3f233f7a744a57/tests/test_extract.py#L70-L87

simonw commented 3 years ago

Using a tuple would work:

fresh_db.table("tree", extracts=[("common_name", "latin_name")])

Or to define a custom name:

fresh_db.table("tree", extracts={("common_name", "latin_name"): "names"})
simonw commented 3 years ago

I wonder if there's value in extending the extracts= option at all given the existence of table.extract().

simonw commented 3 years ago

I think there is. It's a nice existing feature, and I don't think adding tuple support to it would be a huge lift.

simonw commented 3 years ago

Here's the most complex example of .extracts():

db["Trees"].extract(
    ["CommonName", "LatinName"],
    table="Species",
    fk_column="species_id",
    rename={"CommonName": "name", "LatinName": "latin"}
)

Resulting in:

CREATE TABLE [Species] (
    [id] INTEGER PRIMARY KEY,
    [name] TEXT,
    [latin] TEXT
)

From https://sqlite-utils.readthedocs.io/en/stable/python-api.html#extracting-columns-into-a-separate-table

simonw commented 3 years ago

If I want to support that most complicated example, I think the option to pass a Extracts() object to extracts= is the best way to do it:

fresh_db.table("tree", extracts=[Extract(
    columns=("CommonName", "LatinName"),
    table="Species",
    fk_column="species_id",
    rename={"CommonName": "name", "LatinName": "latin"}
)])
simonw commented 3 years ago

But this begins to feel too complicated, given that table.extract() can already be used to achieve the same thing.

simonw commented 3 years ago

I'm not going to implement this. I'll leave extract=... as it is right now, suitable for quick simple single-column operations on input, but if users want to do something more complicated involving multiple columns they should use the table.extract() method after the initial insert instead.

simonw commented 3 years ago

Reconsidering: #89 was a feature request that relates to this, so maybe this is worth implementing after all.