pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License

Custom type columns not preserved #311

Closed mjd3 closed 4 years ago

mjd3 commented 4 years ago

It seems like using the insert/update commands, one is able to specify types for the input to the database, but when using find or find_one, it is not possible. Am I just doing this incorrectly? The reasoning behind this question is that I am using sqlalchemy's TypeDecorator to convert several different objects to binary strings, but I can't find a good way to have the binary strings automatically decoded when querying the database later on.

Also, this seems relevant to #254, since supporting TypeDecorators in this way would really help with something like a numpy array or similar (without necessarily having to build in each of the types that would be converted). Thanks in advance for the help!

stefanw commented 4 years ago

On insert/update types are inferred to create columns if they do not exist. There's no fancy type conversion going on when getting data out. This is out of scope for dataset. You are already using sqlalchemy directly, so that's your best bet for this use case as well.
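A sketch of the suggestion above (not from the thread, and the table/column names are illustrative): with SQLAlchemy Core directly, the TypeDecorator's `process_result_value` runs on every read, so decoded values come back without any help from dataset.

```python
import numpy as np
import sqlalchemy as sa

class NumpyType(sa.types.TypeDecorator):
    impl = sa.types.LargeBinary
    cache_ok = True  # required by SQLAlchemy 1.4+ for statement caching

    def process_bind_param(self, value, dialect):
        return value.tobytes()       # encode on write

    def process_result_value(self, value, dialect):
        return np.frombuffer(value)  # decode on read

engine = sa.create_engine("sqlite://")  # in-memory database for the sketch
meta = sa.MetaData()
tab = sa.Table(
    "tab", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("arr", NumpyType),
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(tab.insert(), [{"arr": np.arange(10, dtype="float64")}])
    row = conn.execute(tab.select()).fetchone()

assert isinstance(row[1], np.ndarray)  # decoded by the type, not by dataset
```

Because the column is declared with the decorator on the `Table` object itself, the round-trip decoding survives as long as that `Table` definition is used, which is exactly what dataset's schema inference does not preserve across connections.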

mjd3 commented 4 years ago

Sorry for the confusion; after looking into this some more, it seems like what I am talking about does work (kind of), but only if querying without closing the database. Here is an MWE.


import sqlalchemy as sa
import numpy as np
import dataset

class NumpyType(sa.types.TypeDecorator):
    impl = sa.types.LargeBinary

    def process_bind_param(self, value, dialect):
        return value.tobytes()

    def process_result_value(self, value, dialect):
        return np.frombuffer(value)

if __name__ == '__main__':
    db = dataset.connect('sqlite:///test.db')
    tab = db['tab']
    tab.insert_many([{'arr': np.random.rand(10)} for i in range(100)], types={'arr': NumpyType})
    assert isinstance(tab.find_one(id=2)['arr'], np.ndarray)
    db.close()
    del db
    del tab

    db = dataset.connect('sqlite:///test.db')
    tab = db['tab']
    assert isinstance(tab.find_one(id=2)['arr'], np.ndarray)

The first assert passes, while the second raises. Any ideas on this? As you can see, I'm not really using sqlalchemy directly, but the idea of using TypeDecorators to support more types for dataset would be really helpful!

mjd3 commented 4 years ago

Ah nevermind; after some more testing, I now realize this is just a general issue with TypeDecorators in sqlalchemy: reflection loses the custom type. In case it's helpful for anyone, adding a listener like this fixes it:

@sa.event.listens_for(sa.Table, "column_reflect")
def _setup_posetype(inspector, table, column_info):
    # Re-attach the decorator to the reflected BLOB column so reads decode again
    if isinstance(column_info["type"], sa.BLOB) and column_info['name'] == 'arr':
        column_info["type"] = NumpyType()
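For context, here is a self-contained sketch of that fix in action (not from the thread; note the assignment must target `column_info["type"]`, and the listener/variable names are illustrative). The table is created with a plain BLOB column, then reflected, as dataset would do on a fresh connection, and the listener swaps the decorator back in:

```python
import numpy as np
import sqlalchemy as sa

class NumpyType(sa.types.TypeDecorator):
    impl = sa.types.LargeBinary
    cache_ok = True

    def process_bind_param(self, value, dialect):
        return value.tobytes()

    def process_result_value(self, value, dialect):
        return np.frombuffer(value)

# Reflected BLOB columns named 'arr' get the decorator re-attached,
# so a reconnect-and-reflect cycle still decodes them.
@sa.event.listens_for(sa.Table, "column_reflect")
def _setup_numpy_type(inspector, table, column_info):
    if isinstance(column_info["type"], sa.BLOB) and column_info["name"] == "arr":
        column_info["type"] = NumpyType()

engine = sa.create_engine("sqlite://")  # in-memory database for the sketch
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE tab (id INTEGER PRIMARY KEY, arr BLOB)"))
    conn.execute(sa.text("INSERT INTO tab (arr) VALUES (:a)"),
                 {"a": np.arange(3, dtype="float64").tobytes()})

# Reflect the table, as dataset would on reconnect; the listener fires here.
tab = sa.Table("tab", sa.MetaData(), autoload_with=engine)
with engine.begin() as conn:
    row = conn.execute(tab.select()).fetchone()

assert isinstance(row[1], np.ndarray)  # decoded via the reflected NumpyType
```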

Appreciate this repo!