simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.64k stars 111 forks source link

Expose convert recipes to `sqlite-utils --functions` #484

Open simonw opened 2 years ago

simonw commented 2 years ago

--functions was added in:

It would be useful if the r.jsonsplit() and similar recipes for sqlite-utils convert could be used in these blocks of code too: https://sqlite-utils.datasette.io/en/stable/cli.html#sqlite-utils-convert-recipes

simonw commented 2 years ago

Here's the implementation for recipes at the moment: https://github.com/simonw/sqlite-utils/blob/5b969273f1244b1bcf3e4dc071cdf17dab35d5f8/sqlite_utils/utils.py#L434-L441

And here's the --functions implementation that doesn't expose them: https://github.com/simonw/sqlite-utils/blob/5b969273f1244b1bcf3e4dc071cdf17dab35d5f8/sqlite_utils/cli.py#L3029-L3040

simonw commented 2 years ago

Will also need to update documentation here: https://sqlite-utils.datasette.io/en/stable/cli.html#defining-custom-sql-functions

simonw commented 2 years ago

This feature is a tiny bit weird though: the recipe functions are not exposed to SQL by default, they are instead designed to be used with sqlite-utils convert. I guess with --functions support you could do something like this:

sqlite-utils data.db "update mytable set col1 = parsedate(col1)" --functions "parsedate = r.parsedate"
simonw commented 2 years ago

It's not quite that simple. I tried applying this patch:

diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index c51b101..33e4d90 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -30,6 +30,7 @@ from .utils import (
     Format,
     TypeTracker,
 )
+from . import recipes

 CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
@@ -3029,7 +3030,7 @@ def _load_extensions(db, load_extension):
 def _register_functions(db, functions):
     # Register any Python functions as SQL functions:
     sqlite3.enable_callback_tracebacks(True)
-    globals = {}
+    globals = {"r": recipes, "recipes": recipes}
     try:
         exec(functions, globals)
     except SyntaxError as ex:

Then got this:

% sqlite-utils memory --functions 'parsedate = r.parsedate' 'select parsedate("1st jan")'
Error: wrong number of arguments to function parsedate()
% sqlite-utils memory --functions 'parsedate = r.parsedate' 'select parsedate("1st jan", 0, 0, 0)' 
[{"parsedate(\"1st jan\", 0, 0, 0)": "2022-01-01"}]

The problem here is that the parsedate function signature looks like this:

https://github.com/simonw/sqlite-utils/blob/d9b9e075f07a20f1137cd2e34ed5d3f1a3db4ad8/sqlite_utils/recipes.py#L8

But the code that register SQL functions introspects that signature, so creates a SQL function that requires four arguments.

simonw commented 2 years ago

So you would need to do this instead:

sqlite-utils memory 'select parsedate("1st jan")' --functions '
def parsedate(s):
    return r.parsedate(s)
'
simonw commented 2 years ago

I could teach this code here to only register the function using arguments that don't have default parameters:

https://github.com/simonw/sqlite-utils/blob/d9b9e075f07a20f1137cd2e34ed5d3f1a3db4ad8/sqlite_utils/cli.py#L3037-L3040

Or even this code here:

https://github.com/simonw/sqlite-utils/blob/d9b9e075f07a20f1137cd2e34ed5d3f1a3db4ad8/sqlite_utils/db.py#L398-L418

simonw commented 2 years ago

That would be a breaking change though - existing code that registers functions with default parameters should continue to work unchanged (unless I want to ship sqlite-utils 4.0).

simonw commented 2 years ago

I could do this:

 # Register all callables in the locals dict: 
 for name, value in globals.items(): 
     if callable(value) and not name.startswith("_"): 
         db.register_function(value, name=name, ignore_params_with_defaults=True) 

Introducing a new ignore_params_with_defaults option.

simonw commented 2 years ago

Here's how to detect defaults in the function signature:

>>> import inspect
>>> def foo(a, b, c=1, d=2):
...      pass
... 
>>> inspect.signature(foo)
<Signature (a, b, c=1, d=2)>
>>> inspect.signature(foo).parameters
mappingproxy(OrderedDict([('a', <Parameter "a">), ('b', <Parameter "b">), ('c', <Parameter "c=1">), ('d', <Parameter "d=2">)]))
>>> inspect.signature(foo).parameters['c']
<Parameter "c=1">
>>> dir(inspect.signature(foo).parameters['c'])
['KEYWORD_ONLY', 'POSITIONAL_ONLY', 'POSITIONAL_OR_KEYWORD', 'VAR_KEYWORD', 'VAR_POSITIONAL', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_annotation', '_default', '_kind', '_name', 'annotation', 'default', 'empty', 'kind', 'name', 'replace']
>>> inspect.signature(foo).parameters['c'].default
1
>>> inspect.signature(foo).parameters['a'].default
<class 'inspect._empty'>
simonw commented 2 years ago

OK with this:

diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index c51b101..93d82a9 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -30,6 +30,7 @@ from .utils import (
     Format,
     TypeTracker,
 )
+from . import recipes

 CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
@@ -3029,7 +3030,7 @@ def _load_extensions(db, load_extension):
 def _register_functions(db, functions):
     # Register any Python functions as SQL functions:
     sqlite3.enable_callback_tracebacks(True)
-    globals = {}
+    globals = {"r": recipes, "recipes": recipes}
     try:
         exec(functions, globals)
     except SyntaxError as ex:
@@ -3037,4 +3038,4 @@ def _register_functions(db, functions):
     # Register all callables in the locals dict:
     for name, value in globals.items():
         if callable(value) and not name.startswith("_"):
-            db.register_function(value, name=name)
+            db.register_function(value, name=name, ignore_defaults=True)
diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 27c46b0..1407d23 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -370,6 +370,7 @@ class Database:
         self,
         fn: Callable = None,
         deterministic: bool = False,
+        ignore_defaults: bool = False,
         replace: bool = False,
         name: Optional[str] = None,
     ):
@@ -397,7 +398,10 @@ class Database:

         def register(fn):
             fn_name = name or fn.__name__
-            arity = len(inspect.signature(fn).parameters)
+            params = inspect.signature(fn).parameters
+            if ignore_defaults:
+                params = [p for p in params if params[p].default is inspect._empty]
+            arity = len(params)
             if not replace and (fn_name, arity) in self._registered_functions:
                 return fn
             kwargs = {}

I can now do this:

% sqlite-utils memory --functions 'parsedate = r.parsedate' 'select parsedate("1st jan")'
[{"parsedate(\"1st jan\")": "2022-01-01"}]
simonw commented 2 years ago

Or... I could automatically register multiple copies of the function of different arities!

If I'm going to do something like that though I need to think carefully about how functions that have keyword-only arguments should work: https://peps.python.org/pep-3102/

def compare(a, b, *ignore, key=None):
    ...

I should think about how these work with db.register_function() anyway, since SQL functions cannot support keyword arguments.