simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0

Should upsert() work with compound primary key? #629

Open jclark-dot-org opened 3 months ago

jclark-dot-org commented 3 months ago

Given this table:

create table main.summary (
    Source      TEXT,
    Object      TEXT,
    Category    TEXT,
    Count       INTEGER,
    primary key (Source, Object, Category)
)
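The compound key declared above is stored in the table's schema and can be read back, which helps explain why one might expect upsert() to find it without a pk= argument. Below is a minimal stdlib sketch that is independent of sqlite-utils. It uses an in-memory database and the plain table name `summary` in place of `main.summary`, and the helper name `primary_key_columns` is hypothetical. sqlite-utils exposes similar information through a table's `.pks` property.

```python
import sqlite3

# Same schema as above, created in an in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table summary (
        Source   TEXT,
        Object   TEXT,
        Category TEXT,
        Count    INTEGER,
        primary key (Source, Object, Category)
    )
""")

def primary_key_columns(conn, table):
    """Hypothetical helper: return the table's primary key columns in key order.

    pragma_table_info reports pk = 0 for ordinary columns and the
    1-based position within the primary key for key columns.
    """
    rows = conn.execute(
        "select name from pragma_table_info(?) where pk > 0 order by pk",
        (table,),
    ).fetchall()
    return tuple(name for (name,) in rows)

print(primary_key_columns(conn, "summary"))
# ('Source', 'Object', 'Category')
```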

I'd like to have a simple helper to run rollups like so:

def summarize(db, table, source, object_name, category='All', criteria=None):
    db['summary'].upsert({
        'source': source,
        'object': object_name,
        'category': category,
        'count': table.count_where(criteria)
    })

To use like so:

utils.summarize(db, db["accounts_a"], 'Client A', 'Accounts')
utils.summarize(db, db["accounts_b"], 'Client B', 'Accounts', 'Customer', "Type = 'Customer'")
utils.summarize(db, db["accounts_b"], 'Client B', 'Accounts', 'Prospect', "Type = 'Prospect'")

But I'm getting an error that upsert requires a pk:

  File "/Users/jclark/Documents/dev/clientname/merge/scripts/utils.py", line 13, in summarize
    db['summary'].upsert({
  File "/opt/homebrew/lib/python3.12/site-packages/sqlite_utils/db.py", line 3346, in upsert
    return self.upsert_all(
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/sqlite_utils/db.py", line 3383, in upsert_all
    return self.insert_all(
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/sqlite_utils/db.py", line 3237, in insert_all
    raise PrimaryKeyRequired("upsert() requires a pk")
sqlite_utils.db.PrimaryKeyRequired: upsert() requires a pk

Per the upsert docs, the pk parameter to upsert() is only used when creating the target table, which I am not doing, so I'm assuming the error means it can't work with my primary key. Is that because it's a compound key, or am I doing something else wrong?
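For comparison, SQLite itself has no trouble upserting against a compound key. Its native upsert syntax (SQLite 3.24 and later) takes a multi-column conflict target. The minimal stdlib sketch below uses plain `sqlite3` and does not go through sqlite-utils, so it says nothing about how upsert() is implemented there. It only shows that the compound key is not the obstacle at the SQL level. The row values are made up for illustration.

```python
import sqlite3

# Same schema as in the issue: a three-column compound primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table summary (
        Source   TEXT,
        Object   TEXT,
        Category TEXT,
        Count    INTEGER,
        primary key (Source, Object, Category)
    )
""")

# Native SQLite upsert: the conflict target lists every key column,
# and a clash on the full key updates Count in place.
UPSERT = """
    insert into summary (Source, Object, Category, Count)
    values (?, ?, ?, ?)
    on conflict (Source, Object, Category)
    do update set Count = excluded.Count
"""

conn.execute(UPSERT, ("Client B", "Accounts", "Customer", 10))
conn.execute(UPSERT, ("Client B", "Accounts", "Prospect", 4))
# Same key as the first row, so this updates it instead of inserting.
conn.execute(UPSERT, ("Client B", "Accounts", "Customer", 12))

rows = conn.execute(
    "select Source, Object, Category, Count from summary order by Category"
).fetchall()
print(rows)
# [('Client B', 'Accounts', 'Customer', 12), ('Client B', 'Accounts', 'Prospect', 4)]
```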

jclark-dot-org commented 3 months ago

This appears to be a documentation issue.

I took a look at the code, and it seems pk is required for upsert() even though the docs say:

Note that the pk and column_order parameters here are optional if you are certain that the table has already been created. You should pass them if the table may not exist at the time the first upsert is performed.

So I modified my method to pass the pk as a tuple and it's working fine.

def summarize(db, table, source, object_name, category='All', criteria=None):
    db['summary'].upsert({
        'source': source,
        'object': object_name,
        'category': category,
        'count': table.count_where(criteria)
    }, pk=("source", "object", "category"))