rogerbinns / apsw

Another Python SQLite wrapper
https://rogerbinns.github.io/apsw/
Other
733 stars 97 forks source link

Parameterized conversion types #440

Closed plasticassius closed 1 year ago

plasticassius commented 1 year ago

I'm interested in using parameters in conversion types in apsw.ext and I thought this might appeal to other users. I included an example of what I've been working with below, but I didn't open a pull request because it's still more of a proof of concept.

I included two parameterized types, both are int types in the database, but one is an enum that has named values, and the other has a fixed number of digits after the decimal point.

#!/usr/bin/python3 -B

import apsw, apsw.ext
import decimal
import re

'''
Here is a patched together idea for using apsw.ext with parameterized types.
It uses a regular expression to divide SQLite types into types for
looking up in the table in TypesConverterCursorFactory, and arguments
to the type. A couple examples of this are included below.

To add flexibility, AS clause names are used as types when other type
information is not avaiable. This is useful, but has the potential to
break code by causing unexpected conversions. This could be mitigated by
avoiding conversions when the regular expression doesn't match.
However, it isn't handled in any way right now.
'''

def convert_value(self, schematype, value):
    '''Returns Python object from schema type and SQLite value
    Applies converter regular expression to schema type to
    extract the type and it's arguments'''
    if value is None or schematype is None: return value
    args = None
    if hasattr(self,'converter_re') and self.converter_re:
        "Use re if it's there"
        match = self.converter_re.match(schematype)
        if match:
            "The type is returned in the first match"
            schematype = match[1]
            if match.lastindex > 1:
                "The args are returned in the second match"
                args = match[2]
    converter = self.converters.get(schematype)
    if not converter: return value
    "Pass both value and the type arguments to the converter"
    "Perhaps the args could be passed as a named argument instead"
    return converter(value, args) if args else converter(value)
apsw.ext.TypesConverterCursorFactory.convert_value = convert_value

def _rowtracer(self, cursor, values):
    '''This allows us to use an AS clause in a select statement to specify a type
    in situations where a type isn't otherwise avaiable.'''
    return tuple(self.factory.convert_value(d[1] if d[1] else d[0], v) for d, v in zip(cursor.getdescription(), values))
apsw.ext.TypesConverterCursorFactory.TypeConverterCursor._rowtracer = _rowtracer

registrar = apsw.ext.TypesConverterCursorFactory()

registrar.converter_re = re.compile(r"^(?:[^_]*_)?([^$\(]+)[$\(]?([^\)]*)\)?$")
'''This regular expression divides the type name into a type (first sub expression)
for the converters and arguments to the type (second sub expression).
An underscore follows an otherwise arbitrary string which can be used as the type for
Sqlite3 (INT, FLOAT, BLOB, ...). The type name is between the underscore and first
dollar sign and the arguments are after the first dollar sign. The arguments can 
alternatively be surrounded by parentheses, but Sqlite3 restricts that syntax to
one or two numbers, and doesn't accept parentheses in AS clauses
'''

registrar.register_converter('DEC', lambda b, d: decimal.Decimal(str(int(b)) + 'e-' + str(d)))
'''This converts fixed percision ints from SQLite to python decimal types. The
argument is the number of digits after the decimal point.
'''

class DbifEnum():
    ''' This class defines an enumerated type which is stored as an int, with one of
the type arguments displayed as output when converted to str.
'''
    def __init__(self, value, strs):
        self.value = int(value)
        self.strs = strs

    def __str__(self):
        l = self.strs.split('$')
        return '\uFFFD' if self.value <= 0 or self.value > len(l) else l[self.value-1]

    def __repr__(self):
        return type(self).__name__ + '(' + str(self.value) + ",'" + self.strs + "')"

registrar.register_converter('ENUM', DbifEnum)

def Dump(cursor, sql):
    print(sql)
    cursor.execute(sql)
    for row in cursor:
        for val in row:
            print(val, end=' ')
        print()

connection = apsw.Connection(':memory:')
connection.cursor_factory = registrar

cursor = connection.cursor()
cursor.execute("CREATE TABLE t(a INT_ENUM$type1$type2, b INT_DEC$2)")
cursor.execute("insert into t values (1,732),(2,56)")

Dump(cursor, 'select * from t')
Dump(cursor, 'select sum(b), sum(b) as sum_DEC$2 from t')

'''output:
select * from t
type1 7.32 
type2 0.56 
select sum(b), sum(b) as sum_DEC$2 from t
788 7.88 
'''
rogerbinns commented 1 year ago

The easy bit is the conversion of values going into SQLite, which you have covered.

For your needs of data coming out of SQLite, I think a better approach is to use the DataClassRowFactory. Inherit from that and override the get_dataclass method. (Make sure to @functools.lru_cache it!)

get_dataclass is called with the name and declared type of every result column. get_dataclass ultimately calls dataclasses.make_dataclass. There your choices will depend on your needs and include post init processing and descriptors.

I'm unlikely to add regex based typing because of the complexity and everyone's needs will be different and project dependent. However I am keen to make that as easy as possible via inheritance so you only have to write the small amount of code reflecting those needs.

plasticassius commented 1 year ago

Thanks for the suggestion. At first glance, TypesConverterCursorFactory seems closer to what I'd like to inherit from than DataClassRowFactory due to the use of dataclasses. But, I'll look into it more.

rogerbinns commented 1 year ago

I take it you find using dataclasses less interesting. I'm expecting them to become the most pervasive structure of fields type in Python :) It is the first time the standard library has a data structure that hits all the sweet spots of being light weight, simple to use, has (optional) typing, comes out perfectly in documentation generators, trivial to convert to/from tuple/dict, simple to extend etc. We'll see.

If you are trying for the simplest shortest code possible, then you are welcome to take the code for TypesConverterCursorFactory and just write your own thing. The code I wrote has more blank lines and doc strings than actual code, and you almost certainly don't need all of it.

That said, especially for projects you expect to end up around for a long time and be bigger, then using the dataclass approach is probably better for data coming out of SQLite. It especially matters because you also want to use the column names as relevant information.

In any event, can this issue be closed? You may also want to post to the forum to see what other people think.

plasticassius commented 1 year ago

Dataclasses sound interesting, but the problem I'm looking at is about converting SQLite types to python types which makes TypesConverterCursorFactory more relevant.

As I've shown, I can make parameterized types by simply overriding convert_value. By breaking the arguments out of the type string and passing them to the python class, I can use Decimal from the standard library and other types that need arguments. I think that the regular expression is handy for breaking apart the type string, and can be easily replaced to customize the parsing.

To me it seems that dataclasses would be an enhancement on top of that, by adding complementary features to the data types. I'll think about how these things can be combined.