googleapis / python-bigquery

Apache License 2.0
719 stars 289 forks source link

DB-API should have option to use Session API (preview) in connection and/or cursor #971

Open tswast opened 2 years ago

tswast commented 2 years ago

We'll get more details later, but https://cloud.google.com/bigquery/docs/sessions-intro recently released. This would map well to user expectations of DB-API objects, but could be considered a breaking change.

tswast commented 2 years ago

Assigning myself for now, since this will require some design work.

durey commented 2 years ago

Hi, just to check: there is currently no way of making use of temp tables in BigQuery (from this python library) until this is implemented, right? Are there any plans in the roadmap? I see it was even removed from the v3 to do list...

tswast commented 2 years ago

@durey It's possible to use temp tables by sending a single query request with multiple queries separated by semicolons. But it would be more like other systems if we were to implement this. See also the discussion on https://github.com/googleapis/python-bigquery/issues/377

tswast commented 2 years ago

Note: I removed the "semver: major" as I think we might be able to add this as an option when creating a "connection" object.

chalmerlowe commented 10 months ago

See the discussion on https://github.com/googleapis/python-bigquery/issues/377 for context and background.

r1b commented 2 months ago

Checking in - is this still relevant? Its possible to pass a job_config when calling dbapi cursor execute, which could be set to create a session.

For context, I work on a system that extends the SQLAlchemy engine in a way that allows us to control the arguments passed to dbapi cursor execute. I'm exploring porting some older multi-statement queries to use this system, possibly via sessions. Wanted to confirm that creating a session this way is directionally aligned with the interface here.

tswast commented 2 months ago

Custom job_config should work to get things started, but for that I think you'll want to create the session separately with a SELECT 1 query and pass in the session ID to the custom job_config.

My thoughts for this issue would be to:

r1b commented 2 months ago

Got it, thanks - couple of thoughts there:

There is a currently a bit of an impedance mismatch when mapping SQLAlchemy Connections (which eventually invoke the corresponding DBAPI concept) onto BQ. In SQLAlchemy, these were designed to handle things like connection pooling that don't have a clear equivalent in BQ space. This connection pooling is enabled by default (in QueuePool).

So depending on how the user's interface is configured upstream of the DBAPI, it may not be safe to assume that e.g: a connection is closed after some set of queries run. [1]

Before SQLAlchemy v2, its also possible to use a "connectionless" execution style, which makes it unsafe to assume that a set of queries use the same connection. [2]


It seems possible to avoid these problems with the right SQLAlchemy configuration:

But wanted to give a heads up that these conditions will not necessarily be true for existing users who access DBAPI via SQLAlchemy / the BigQuery SQLAlchemy dialect.

[1] https://docs.sqlalchemy.org/en/20/core/connections.html#basic-usage [2] https://docs.sqlalchemy.org/en/13/core/connections.html#connectionless-execution-implicit-execution [3] https://docs.sqlalchemy.org/en/20/core/pooling.html#sqlalchemy.pool.NullPool