Open tswast opened 3 years ago
Thank you for your patience!
We've started work on an SQLAlchemy connector and expect that work to continue for a couple of quarters. As mentioned, this should cover your use case.
I just stumbled across this. I've wanted this for a long time and it would be a game changer for my company. Is there an issue we can track for updates on this development?
@daniellehanks Thank you for sharing your interest in this work! The SQLAlchemy connector work is being done here. You can follow the progress there. As noted in the README, it is still under development and is not ready for production use.
@larkee @vi3k6i5 : Given python-spanner-SQLAlchemy is now GA, is this use case covered?
It'd be good to check that it does indeed work with pandas.read_sql in a code sample or something. I would expect it to work, though.
Since Spanner is row-oriented, I don't see there being all that much of a performance reason to avoid SQLAlchemy (compared to BigQuery which is column-oriented).
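A minimal sketch of the kind of code sample suggested above. It only shows the shape of a pandas.read_sql call: an in-memory SQLite connection stands in for Spanner so the snippet runs without cloud access, and the helper name query_to_dataframe, the demo function, and the singers table are all hypothetical.

```python
import sqlite3


def query_to_dataframe(sql, con):
    """Run a query and return the results as a pandas DataFrame."""
    import pandas as pd  # imported lazily; pandas is only needed here

    return pd.read_sql(sql, con)


def demo():
    # Stand-in database: in-memory SQLite instead of Spanner (assumption).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE singers (singer_id INTEGER, name TEXT)")
    con.executemany(
        "INSERT INTO singers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
    )
    con.commit()
    try:
        return query_to_dataframe(
            "SELECT singer_id, name FROM singers ORDER BY singer_id", con
        )
    finally:
        con.close()
```

With the SQLAlchemy connector, con would presumably be an engine created with sqlalchemy.create_engine and a Spanner connection URL instead of the SQLite connection. The exact URL format should be taken from the connector's README; it is deliberately not guessed here.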
@IlyaFaer Can you check for this?
Is your feature request related to a problem? Please describe.
I'd like to be able to run a query against a Spanner database and download (possibly large-ish -- MBs to GBs) results to a pandas DataFrame. Specifically, I'd like to eventually use this as a component in an ibis connector, but it'd also be useful for general data processing pipelines.
Describe the solution you'd like
It seems that StreamedResultSet is the most natural place to put a to_dataframe method, similar to the RowIterator.to_dataframe method in the BigQuery client library.
Since pandas needn't be required to use this client library, the import should be conditional
https://github.com/googleapis/python-bigquery/blob/fb401bd94477323bba68cf252dd88166495daf54/google/cloud/bigquery/table.py#L29-L32
and the dependency listed in "extras".
https://github.com/googleapis/python-bigquery/blob/fb401bd94477323bba68cf252dd88166495daf54/setup.py#L50
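To make the conditional-import idea concrete, here is a rough sketch written as a standalone function rather than a method on StreamedResultSet. The function name, its signature, and the error message are assumptions for illustration and are not part of the Spanner client library.

```python
try:
    import pandas
except ImportError:  # pandas is an optional dependency
    pandas = None


def to_dataframe(rows, column_names):
    """Build a pandas DataFrame from an iterable of row sequences.

    Hypothetical helper: `rows` is any iterable of row values and
    `column_names` is a list of column names, both supplied by the caller.
    """
    if pandas is None:
        raise ValueError(
            "The pandas library is not installed; install it to use "
            "to_dataframe."
        )
    # Materialize the rows first so that one-shot iterators are supported.
    return pandas.DataFrame(list(rows), columns=list(column_names))
```

For example, to_dataframe([[1, "a"], [2, "b"]], ["id", "name"]) would give a two-row DataFrame when pandas is installed, and raise ValueError otherwise. In a real implementation the rows would come from iterating the result set and the column names from its field metadata, and pandas would be listed under an "extras" entry in setup.py, as in the BigQuery links above.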
Describe alternatives you've considered
It's possible this is simpler than I realize, so maybe it could just be a code sample.
If there were a SQLAlchemy connector (a much bigger project than read-only pandas dataframe), then pandas support is basically free via pandas.read_sql.
Additional context
Related StackOverflow questions: