Open fergu opened 5 years ago
I found the solution from this link
Essentially, an np.ndarray needs to be converted into something that sqlite can actually handle, since it has no array datatype, and apparently something goes wrong when trying to buffer the BLOB'd array directly. So, instead, the array is serialized to bytes with numpy's save functionality and stored as a BLOB.
Taking the code from the above link and attaching it to my (now working) code from above:
import sqlalchemy
import sqlite3
import numpy as np
import io

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # Serialize the array with np.save and hand the raw bytes to sqlite as a BLOB
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    # Reverse of adapt_array: read the stored bytes back into an ndarray
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)

# Converts np.ndarray to a BLOB when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts columns declared as "array" back to np.ndarray when selecting
sqlite3.register_converter("array", convert_array)

engine = sqlalchemy.create_engine('sqlite:///:memory:', connect_args={'detect_types': sqlite3.PARSE_DECLTYPES})
This feels like something that should maybe be default behavior for pandas' underlying communication with sqlalchemy (or maybe sqlalchemy's communication with sqlite3)?
Just to add on to my particular issue here - I am finding that reading back does not seem to work properly if everything is done as I put above. I can save to the database, but trying to read back I just get binary data.
I confirmed that the converter does not fire when the column is read back. The reason is that sqlalchemy appears to create the table using whatever appropriate data type is natively supported (TEXT in this case) and ignores whether a custom adapter was used. The only way (as far as I know) to get the converter to fire when reading from the table is to create the table yourself, i.e.
engine = sqlalchemy.create_engine('sqlite:///:memory:',connect_args={'detect_types':sqlite3.PARSE_DECLTYPES},echo=True)
con = engine.connect()
con.execute("create table dftest (\"index\" BIGINT, name TEXT, start FLOAT, \"end\" FLOAT, channel1 array, channel2 array, channel3 array, channel4 array)") # NOTE: 'index' column needs to be included for the Pandas functions for SQLite to work
After this the pandas functions can be used normally as long as the if_exists argument is set to 'append', as otherwise the declared types will be forgotten and the converter will never fire.
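For illustration, a minimal sketch of that workflow, assuming the manually created dftest table and the registered "array" converter from above (the DataFrame contents here are made up):

import pandas as pd
import numpy as np

# Hypothetical illustration: append to the pre-declared table so the 'array'
# column types survive, then read back so the converter has a chance to fire.
df = pd.DataFrame({
    'name': ['R000'],
    'start': [np.random.rand()],
    'end': [np.random.rand()],
    'channel1': [np.random.random(3000)],
    'channel2': [np.random.random(3000)],
    'channel3': [np.random.random(3000)],
    'channel4': [np.random.random(3000)],
})

df.to_sql('dftest', con, if_exists='append')   # 'replace' would drop the declared types
readback = pd.read_sql_table('dftest', con)
print(type(readback['channel1'].iloc[0]))      # ideally np.ndarray rather than raw bytes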
This may end up being an issue for the folks working on sqlalchemy rather than Pandas it appears, though.
Okay, another update for anyone who stumbles across this in the future with a similar problem.
It seems that this is not (easily/obviously) possible when using the sqlalchemy engine, since something seems to get lost on its way in/out of there. I can get a custom type's (as outlined here) process_bind_param call to fire and store appropriately, but I can't get process_result_value to fire during the read. I've tried using the @compiles decorator, which does affect the CREATE TABLE command (i.e. so I get channel1 mycustomtype), but even then the result processing never fires during pd.read_sql_table. I spent some time working my way through the code to see where things got lost, but never managed to find exactly where things went missing or why my class never fired during a read. I did not try using sqlalchemy's direct functionality to read/write from the database to see if this was a pandas problem or not, though.
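For anyone who wants to run that check, here is a rough sketch of what exercising a custom type through SQLAlchemy Core directly (no pandas) might look like. The class and column names are illustrative, and it uses LargeBinary instead of Text as the base type since the payload is bytes; whether process_result_value actually fires here is exactly what would need verifying:

import io
import numpy as np
import sqlalchemy
import sqlalchemy.types as types
from sqlalchemy import Table, Column, Integer, MetaData

class ArrayType(types.TypeDecorator):
    impl = types.LargeBinary  # store the serialized array as a BLOB

    def process_bind_param(self, value, dialect):
        out = io.BytesIO()
        np.save(out, value)
        return out.getvalue()

    def process_result_value(self, value, dialect):
        return np.load(io.BytesIO(value))

engine = sqlalchemy.create_engine('sqlite:///:memory:')
metadata = MetaData()
tbl = Table('arrtest', metadata,
            Column('id', Integer, primary_key=True),
            Column('data', ArrayType))
metadata.create_all(engine)

with engine.connect() as conn:
    conn.execute(tbl.insert(), [{'data': np.arange(5.0)}])
    row = conn.execute(tbl.select()).fetchone()
    print(type(row.data))  # np.ndarray if process_result_value fired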
On the other hand, I have gotten this working using some tweaks on what I've posted above. Basically, the code in my earlier replies seems to work as long as an sqlite3 connection is passed in instead of an sqlalchemy engine.
I've left the (now commented out) parts of what I tried using sqlalchemy's procedures in case anyone wants to try and chew on why it wasn't working.
import pandas as pd
pd.__version__  # displays the pandas version when run in Jupyter
import os
import sys
# import sqlalchemy
# import sqlalchemy.types as types
# from sqlalchemy.ext.compiler import compiles
import sqlite3
import numpy as np
import io

# class arraytodb(types.TypeDecorator):
#     impl = types.Text  # This is needed to indicate what "base type" we're using
#
#     def process_bind_param(self, value, dialect):
#         print("Binding")
#         return "a"
#         # out = io.BytesIO()
#         # np.save(out, value)
#         # out.seek(0)
#         # return sqlite3.Binary(out.read())
#
#     def process_result_value(self, value, dialect):
#         print("Process result")
#         print(value)
#         print(dialect)
#         return "b"
#         # return value
#
# @compiles(arraytodb)
# def arraycompiler(type_, compiler, **kw):
#     return "arraytodb"

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)

# Converts np.ndarray to a BLOB when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts columns declared as "arraytodb" back to np.ndarray when selecting
sqlite3.register_converter("arraytodb", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
# engine = sqlalchemy.create_engine('sqlite:///:memory:', connect_args={'detect_types':sqlite3.PARSE_DECLTYPES}, echo=True)

allPts = []
for i in range(0, 30):
    name = "R" + str(i).zfill(3)
    start = np.random.rand()
    end = np.random.rand()
    data = np.random.random((3000, 4))
    thisDict = {'name': str(name), 'start': start, 'end': end,
                'channel1': data[:, 0], 'channel2': data[:, 1],
                'channel3': data[:, 2], 'channel4': data[:, 3]}
    allPts.append(pd.Series(thisDict))

asDF = pd.DataFrame(allPts)
asDF.to_sql('dftest', con, if_exists='replace',
            dtype={'channel1': "arraytodb", 'channel2': "arraytodb",
                   'channel3': "arraytodb", 'channel4': "arraytodb"})
print(pd.read_sql('SELECT * from dftest', con))
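To confirm the converter actually fired, a small illustrative check one could append to the above:

# The channel columns should come back as np.ndarray, not raw bytes
readback = pd.read_sql('SELECT * from dftest', con)
print(type(readback['channel1'].iloc[0]))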
I am trying out using pandas+sqlalchemy (specifically sqlite) as a means to store my data. In my use case, I will have a number of rows where most entries will be obvious types such as string, int, float, etc. - but I will also want to store a few numpy ndarrays. In the example below, each 'channel' entry is a 1x3000 array of floats. For higher-dimensional arrays (at most 2D) I plan to store the shape of the data alongside the flattened array, so this is all I need.
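As an aside, a minimal sketch of that flatten-and-store-the-shape idea (the helper names are purely illustrative and not part of the code elsewhere in this issue):

import io
import numpy as np

# Hypothetical helpers: store a 2D array as (shape, serialized flattened array) and rebuild it.
def array_to_blob(arr):
    out = io.BytesIO()
    np.save(out, arr.ravel())           # flatten before saving
    return arr.shape, out.getvalue()    # keep the shape alongside the bytes

def blob_to_array(shape, blob):
    flat = np.load(io.BytesIO(blob))
    return flat.reshape(shape)          # restore the original 2D layout

original = np.random.random((3000, 4))
shape, blob = array_to_blob(original)
restored = blob_to_array(shape, blob)
assert np.array_equal(original, restored)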
Code Sample, a copy-pastable example if possible
This was run in Jupyter Lab.
This produces the traceback
Problem description
I notice that the bottom of the traceback specifies the error started in sqlalchemy - though for all my googling I can't find any record of this problem (or error message) anywhere, either with sqlalchemy or pandas. I'm assuming this means I am probably doing something wrong, but I'm not seeing it when looking at the docs, etc. Storing a numpy array as a binary BLOB makes sense to me, so the type inference seems reasonable.
Expected Output
Writing to the database/no error. :)
Output of pd.show_versions()