simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.67k stars 111 forks source link

Repeated calls to `Table.convert()` fail #525

Closed mcarpenter closed 1 year ago

mcarpenter commented 1 year ago

Summary

When using the API, repeated calls to Table.convert() do not work correctly since all conversions quietly use the callable (function, lambda) from the first call to convert() only. Subsequent invocations with different callables use the callable from the first invocation only.

Example

from sqlite_utils import Database

db = Database(memory=True)
table = db['table']
col = 'x'
table.insert_all([{col: 1}])
print(table.get(1))

table.convert(col, lambda x: x*2)
print(table.get(1))

def zeroize(x):
    return 0
#zeroize = lambda x: 0
#zeroize.__name__ = 'zeroize'
table.convert(col, zeroize)
print(table.get(1))

Output:

{'x': 1}
{'x': 2}
{'x': 4}

Expected:

{'x': 1}
{'x': 2}
{'x': 0}

Explanation

This is some relevant documentation.

There's a mismatch between the comments and the code: https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L404

but actually the existing function is returned/used instead (as the "registering custom sql functions" doc I linked above says too). Seems like this can be rectified to match the comment?

Suggested fix

I think there are four things:

  1. The call to register_function() from convert()should have an explicit name= parameter (to continue using convert_value() and the progress bar).
  2. For functions, this name can be the real function name. (I understand the sqlite api needs a name, and it's nice if those are recognizable names where possible). For lambdas would 'lambda-{uuid}' or similar be acceptable?
  3. register_function() really should throw an error on repeated attempts to register a duplicate (function, arity)-pair.
  4. A test? I haven't looked at the test framework here but seems this should be testable.

See also

mcarpenter commented 1 year ago

PR below

mcarpenter commented 1 year ago

Meanwhile, a cheap workaround is to invalidate the registered function cache:

table.convert(...)
db._registered_functions = set()
table.convert(...)
simonw commented 1 year ago

I like the lambda-{uuid} idea.

simonw commented 1 year ago

That original example passes against main now.