Open rjrudin opened 2 years ago
Not sure if this is relevant - using SQLAlchemy 1.4.25 and sqlalchemy-bigquery 1.3.0, when I run this:
from sqlalchemy_bigquery import BigQueryDialect
str(sql.compile(dialect=BigQueryDialect))
I get this error:
AttributeError: type object 'BigQueryDialect' has no attribute 'positional'
I tried using "setattr" to fake-fix that, but I get several other errors about attributes missing. Not sure if that would work, and so I've stuck with just doing compile(self.engine)
or compile(bind=self.engine)
.
If this is really an issue for https://github.com/googleapis/python-bigquery-sqlalchemy , I'm happy to move it there too.
This is a tough one. I think it does belong here, as the SQLAlchemy connector needs to continue to generate SQL that can be understood by the google.cloud.bigquery.dbapi
modules.
I would like to allow this library to take SQLAlchemy-generated SQL somehow, possibly via an optional dependency on SQLAlchemy so pandas-gbq can run the compile itself.
I assume it's built this way because DBAPI doesn't include a BigQuery-compatible parameter substitution style, but the sqlalchemy-bigquery
dialect compiles SQL into the pyformat style, then at runtime transforms the pyformatted query into a BigQuery compatible one and builds params that can be executed against a bigquery.Client
. Pretty confusing, but I pulled out the relevant pieces into a function that does the conversion
from google.cloud.bigquery.dbapi._helpers import to_query_parameters
from google.cloud.bigquery.dbapi.cursor import _format_operation
query = select(...) # SQLAlchemy query
compiled = query.compile(engine) # engine using BigQueryDialect or dialect=BigQueryDialect
parameters = compiled.params
bq_query, parameter_types = _format_operation(str(compiled), parameters)
job_config = bigquery.QueryJobConfig(
query_parameters=to_query_parameters(parameters, parameter_types)
)
# client = bigquery.Client(...)
query_job = client.query(bq_query, job_config)
This is pretty boiled down and doesn't address any edge cases, so YMMV, but it works for the relatively small scope of queries we expect.
Is your feature request related to a problem? Please describe.
I'm back again, after submitting #474 and #476, as we're trying to shift from using pd.read_sql_query to using pandas_gbq.read_gbq .
Current issue is that when we construct our SQLAlchemy Selectable, we may toss a row limit onto it based on user input, e.g:
pands_gbq.read_gbq wants a str as its query argument. So we're trying to compile the Selectable and serialize it as a string:
However, SQLAlchemy serializes its named parameters in a way that I can't get to work with pandas_gbq (and more generally, that won't work with BigQuery):
I found an example in the pandas_gbq tests of how to configure query/queryParameters in the "configuration" dict that can be passed to read_gbq, but that of course doesn't help because BigQuery has its own way of naming parameters.
Describe the solution you'd like
Generally, I need a way (hopefully it exists already and I'm just missing it!) of serializing a SQLAlchemy Selectable into a string query that can be the input to read_gbq. For now, just being able to handle the limit() parameter above would suffice. I of course am trying to avoid string concatenation and am hoping there's some way I can configure the "configuration" dict to make this work.
Describe alternatives you've considered
N/A
Additional context
N/A