Closed Mjboothaus closed 2 years ago
Hello 👋
It's hard to be sure from your example, but it looks a bit like params=("SydneyOceanBeaches", ) is passed inside the SQL string itself, rather than as a separate argument to the query function.
So it looks like you're trying:
from sqlite_s3_query import sqlite_s3_query
with sqlite_s3_query(url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite') as query:
    with query('SELECT "Beach name" FROM beaches WHERE Region = ?, params=("SydneyOceanBeaches", )') as (columns, rows):
        for row in rows:
            print(row)
when it should be something like:
from sqlite_s3_query import sqlite_s3_query
with sqlite_s3_query(url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite') as query:
    with query('SELECT "Beach name" FROM beaches WHERE Region = ?', params=("SydneyOceanBeaches", )) as (columns, rows):
        for row in rows:
            print(row)
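For anyone wanting to see the difference without an S3 bucket, here is a minimal sketch using Python's standard sqlite3 module as a stand-in. It is not sqlite-s3-query itself: the in-memory table and its two rows are made up for illustration, and the exact error the real library raises may differ. It only shows that a `?` placeholder needs its value supplied separately from the SQL text.

```python
import sqlite3

# Hypothetical in-memory stand-in for the remote database; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE beaches ("Beach name" TEXT, Region TEXT)')
conn.executemany(
    "INSERT INTO beaches VALUES (?, ?)",
    [("Bondi", "SydneyOceanBeaches"), ("Manly Cove", "SydneyHarbour")],
)

# Mistake: the params text is part of the SQL string, so SQLite is asked to
# parse it as SQL and no value is ever bound to the placeholder.
bad_sql = 'SELECT "Beach name" FROM beaches WHERE Region = ?, params=("SydneyOceanBeaches", )'
try:
    conn.execute(bad_sql)
    bad_query_failed = False
except sqlite3.Error:
    bad_query_failed = True

# Fix: the SQL string contains only SQL, and the values travel separately.
good_rows = conn.execute(
    'SELECT "Beach name" FROM beaches WHERE Region = ?',
    ("SydneyOceanBeaches",),
).fetchall()

print(bad_query_failed, good_rows)
```

The trailing comma in ("SydneyOceanBeaches",) matters in both libraries: it makes the value a one-element tuple rather than a plain string in parentheses.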
G'day Michal
Thanks for your prompt reply -- yes I was mistakenly passing the params within the SQL query string. It is working well now.
Any thoughts on returning the (columns, rows) as a pandas dataframe? :)
Like this?
import pandas as pd
# ...
with query('SELECT "Beach name" FROM beaches WHERE Region = ?', params=("SydneyOceanBeaches", )) as (columns, rows):
    df = pd.DataFrame(rows, columns=columns)
Oh just realised - were you asking about changing the API of sqlite-s3-query to return a pandas DataFrame? If so: I'm fairly anti:
Right now the dependencies are as minimal as I can make them. I wouldn't want to add to them if I can at all help it (and I've been pondering even if I can remove the http client somehow...)
The API is also a "streaming first" API. By returning an iterable of rows, it means streaming processing, i.e. processing rows while data is still being fetched from S3, and avoiding loading all the results in memory at once, is possible. This doesn't happen for all queries, admittedly, but it's a property I would like to preserve, since I personally use it in another project.
And as above, if someone wants a pandas dataframe and all data in memory at once, it's a 1-liner: df = pd.DataFrame(rows, columns=columns), so even for fairly novice Pandas people, not too tricky.
And if the dataframe is created in client code, the client can pass as many other options to the constructor as they like. Since sqlite-s3-query doesn't use pandas at all, it doesn't hide any pandas features or make assumptions about what sort of dataframe is needed, and the client never has to copy the data into another dataframe that is more suitable.
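The streaming point above can be illustrated without S3 at all. The sketch below is an assumption-laden toy: fake_rows is a hypothetical generator standing in for the rows iterable, and it says nothing about how sqlite-s3-query actually fetches data. It only shows the general property of an iterable of rows, which is that a consumer that stops early never pulls the remaining rows, while building a list pulls every row first.

```python
from itertools import islice

fetched = []  # records which rows the hypothetical source has produced so far


def fake_rows(n):
    """Hypothetical stand-in for the `rows` iterable: yields one row at a time."""
    for i in range(n):
        fetched.append(i)
        yield (f"beach-{i}",)


# Streaming: take the first 3 rows; the generator is never asked for the rest.
first_three = [row[0] for row in islice(fake_rows(1_000_000), 3)]
fetched_when_streaming = len(fetched)

# Materialising: list() pulls every row into memory before any processing.
fetched.clear()
all_rows = list(fake_rows(1000))
fetched_when_materialised = len(fetched)

print(first_three, fetched_when_streaming, fetched_when_materialised)
```

Passing rows straight to pd.DataFrame behaves like the second case, which is fine when the result fits in memory and is exactly the trade-off left to client code.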
Anyway, suspect in any case this is beyond the scope of this particular issue. Closing, but feel free to open another.
Example query syntax:
'SELECT "Beach name" FROM beaches WHERE Region = ?, params=("SydneyOceanBeaches", )'
[ISSUE reporting in progress - add environment details & error message]