googleapis / python-spanner

feature request: pandas connector #155

Open tswast opened 3 years ago

tswast commented 3 years ago

Is your feature request related to a problem? Please describe.

I'd like to be able to run a query against a Spanner database and download (possibly large-ish -- MBs to GBs) results to a pandas DataFrame. Specifically, I'd like to eventually use this as a component in an ibis connector, but it'd also be useful for general data processing pipelines.

Describe the solution you'd like

It seems that StreamedResultSet is the most natural place to put a to_dataframe method, similar to the RowIterator.to_dataframe method in the BigQuery client library.

Since pandas needn't be required to use this client library, the import should be conditional

https://github.com/googleapis/python-bigquery/blob/fb401bd94477323bba68cf252dd88166495daf54/google/cloud/bigquery/table.py#L29-L32

and the dependency listed in "extras".

https://github.com/googleapis/python-bigquery/blob/fb401bd94477323bba68cf252dd88166495daf54/setup.py#L50
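To make the conditional-import idea concrete, here is a minimal sketch. The helper name `rows_to_dataframe` and its arguments are hypothetical, not part of the Spanner client; it only assumes the result set can be iterated as row sequences and that the column names are known. The matching packaging change would presumably be an `extras_require={"pandas": ["pandas"]}` entry in `setup.py`.

```python
try:
    import pandas
except ImportError:  # pandas is optional, so the module must still import
    pandas = None


def rows_to_dataframe(rows, column_names):
    """Hypothetical helper: build a DataFrame from an iterable of rows.

    `rows` is any iterable of row sequences (for example, a streamed
    result set); `column_names` is the list of column names in order.
    """
    if pandas is None:
        raise ValueError(
            "The pandas library is not installed; install it to use "
            "rows_to_dataframe."
        )
    # Materialize the stream first; pandas needs all rows to build columns.
    return pandas.DataFrame(list(rows), columns=list(column_names))
```

A real `to_dataframe` method would likely also map Spanner types to pandas dtypes, which this sketch leaves out.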

Describe alternatives you've considered

It's possible this is simpler than I realize, so maybe it could just be a code sample.

If there were a SQLAlchemy connector (a much bigger project than read-only pandas dataframe), then pandas support is basically free via pandas.read_sql.

Additional context

Related StackOverflow questions:

larkee commented 3 years ago

Thank you for your patience!

We've started work on an SQLAlchemy connector and expect to keep working on it for a couple of quarters. As mentioned, this should cover your use case.

daniellehanks commented 3 years ago

I just stumbled across this. I've wanted this for a long time and it would be a game changer for my company. Is there an issue we can track for updates on this development?

larkee commented 3 years ago

@daniellehanks Thank you for sharing your interest in this work! The SQLAlchemy connector work is being done here. You can follow the progress there. As noted in the README, it is still under development and is not ready for production use.

ansh0l commented 2 years ago

@larkee @vi3k6i5 : Given python-spanner-SQLAlchemy is now GA, is this use case covered?

tswast commented 2 years ago

It'd be good to check that it does indeed work with pandas.read_sql in a code sample or something. I would expect it to work, though.

Since Spanner is row-oriented, I don't see there being all that much of a performance reason to avoid SQLAlchemy (compared to BigQuery which is column-oriented).
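A rough sketch of what such a check might look like is below. The function names are hypothetical, and the `spanner+spanner:///...` URL form is an assumption based on my reading of the python-spanner-sqlalchemy README, so verify it against the installed version. I have not run this against a real Spanner database.

```python
def spanner_url(project, instance, database):
    """Build a SQLAlchemy URL for Spanner (assumed format, check the README)."""
    return "spanner+spanner:///projects/{}/instances/{}/databases/{}".format(
        project, instance, database
    )


def query_to_dataframe(sql, url):
    """Run `sql` through SQLAlchemy and return the result as a DataFrame."""
    # Imports are local so this module loads without the optional packages.
    import pandas
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as connection:
        # text() keeps this working across SQLAlchemy 1.4 and 2.x.
        return pandas.read_sql(text(sql), connection)
```

Usage would be something like `query_to_dataframe("SELECT 1 AS x", spanner_url("my-project", "my-instance", "my-database"))`, where the three identifiers are placeholders.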

asthamohta commented 2 years ago

@IlyaFaer Can you check for this?