ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.3k stars 596 forks source link

feat: Reuse the existing DB connection while creating backend #8877

Closed walnutist closed 3 months ago

walnutist commented 7 months ago

Is your feature request related to a problem?

Currently all db operations are done thru SQLAlchemy/SQLGlot, thus only DBAPI compliant interfaces are exposed. It is required to reuse an existing db connection, e.g., duckdb.connec("mydb.db"), so that all previously registered python function (via duckdb.create_function) to work, also, it would be easier to guarantee the in-memory duckdb to be shared among ibis and non-ibis codes

Describe the solution you'd like

Change the current connect method to accept an existing connection

What version of ibis are you running?

9.0.0.dev568

What backend(s) are you using, if any?

DuckDB, Polars, MySQL

Code of Conduct

chloeh13q commented 7 months ago

Hi @walnutist - thank you for checking out Ibis and filing an issue!

For our understanding - do you mind elaborating on what you're trying to do? Do you have a use case where Ibis does not provide certain functionalities, and therefore you have certain parts of code written without Ibis and certain parts with Ibis?

cpcloud commented 7 months ago

I think what @walnutist might be asking for is the ability to construct a backend from an existing Ibis connection.

walnutist commented 7 months ago

easier

Exactly. Thanks for your elaboration!

sh-rp commented 4 months ago

Hi @chloeh13q and ibis team, dlt (https://github.com/dlt-hub/dlt) core-dev here. We are exploring an ibis integration. dlt is a python library to easily load data to a bunch of destinations, so whenever a user configures a dlt pipeline, there will be the credentials and code for a working db connection in place. What would be amazing is to just share the connection with an ibis backend and then have the nice interface ibis provides to the dataset on that connection. I have hacked together a prototype here: https://github.com/dlt-hub/dlt/pull/1491 where I basically instantiate a backend and set the .con attribute, but that only works for a few destinations (duckdb, postgres and to a certain extent snowflake) and I think the reason is, that the backends do additional setup stuff in the do_connect (I think that is what it's called) function on the backend. So it would be amazing to be able to give a connection to the backend constructor or this connect method and this method would then do all the stuff it always does except for establishing a new connection.

Our use-case is, that we have our own credentials format and in some cases there are multiple ways of authenticating with a destination and we'd have to write a lot of glue code and tests just to make sure the credentials are passed in a way ibis understands in every case.

gforsyth commented 4 months ago

Hey @sh-rp -- thanks for getting in touch!

I think we can support this, where we have some class method on the backends like from_dbapi_con and we expect to be handed an open DBABI 2.0 connection that we can then massage accordingly so that Ibis behaves as expected.

Most of the things that we do on class instantiation are easy-enough to apply to an already-open connection.

One thing that we currently do is grab all of the args and kwargs that will be passed to the dbapi con (when Ibis is creating it) and stash those, to enable the reconnect functionality. That bit of functionality would be harder to manage when we're being handed an existing connection. If we don't need reconnect, I think this is something we can put on the roadmap for 10.0

sh-rp commented 4 months ago

@gforsyth in my view we do not need reconnect, the connection would be handled by dlt. There is a chance that this will not work well for users that need a connection that is open for a long time, but I think for the first iteration it should be fine. In an ideal scenario, your backends would be built in a way, that additional third party dependencies are only required when no open dbapi connection is handed over. I think this should be possible, since in most cases you'd only need that when establishing the connection yourselves. I may be wrong there through.

I'll just track this issue then and react to it in case you decide to make this change :)

FYI the destinations we currently support are

We would like to expose a connected ibis backend for each of the destinations listed and supported by Ibis.

deepyaman commented 4 months ago

I have hacked together a prototype here: https://github.com/dlt-hub/dlt/pull/1491 where I basically instantiate a backend and set the .con attribute, but that only works for a few destinations (duckdb, postgres and to a certain extent snowflake) and I think the reason is, that the backends do additional setup stuff in the do_connect (I think that is what it's called) function on the backend. So it would be amazing to be able to give a connection to the backend constructor or this connect method and this method would then do all the stuff it always does except for establishing a new connection.

@sh-rp I can try and take this. @gforsyth If it sounds right to you, I will take the approach you mention—add a class method and just do the post-connect steps from do_connect (probably refactor it out into a _post_connect method on the backend).