googleapis / python-bigquery-storage

Apache License 2.0
collate is not supported by Storage API #592

Open nuevoleonkx opened 1 year ago

nuevoleonkx commented 1 year ago

I'm using the Python BigQuery API to run queries and create extracts. Everything worked until I changed the schema to add a collation (case-insensitive) to the string columns; since then, the export has stopped working. See https://cloud.google.com/bigquery/docs/reference/standard-sql/collation-concepts

status = StatusCode.FAILED_PRECONDITION
details = "request failed: column 'Fact' is unsupported by the Storage API"
debug_error_string = "UNKNOWN:Error received from peer ipv4:x.x.x.x {grpc_message:"request failed: column \'Fact\' is unsupported by the Storage API", grpc_status:9, created_time:"2023-04-26T01:47:14.964649603-04:00"}"

Steps to reproduce

  1. Apply a collation (for example, case-insensitive) to a string column
  2. Run a query that selects the collated column and try to load the result into a DataFrame

Code example

from google.cloud import bigquery
from google.oauth2 import service_account

PROJECT_ID = 'your_project_id'

# Authenticate with a service account key file
credentials = service_account.Credentials.from_service_account_file('service_account.key')
client = bigquery.Client(project=PROJECT_ID, credentials=credentials)

# Select a column that has a collation applied
query = "select column_with_collate from schema.table"

# Fails with FailedPrecondition while creating the Storage API read session
df = client.query(query).to_dataframe()

Stack trace

Traceback (most recent call last):
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/flask_simpleldap/__init__.py", line 299, in wrapped
    return func(*args, **kwargs)
  File "/home/lortega/projects/payer-dashboad/app/api.py", line 1395, in test_dynamic
    df = client.query(query).to_dataframe(
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/cloud/bigquery/job/query.py", line 1800, in to_dataframe
    return query_result.to_dataframe(
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/cloud/bigquery/table.py", line 2151, in to_dataframe
    record_batch = self.to_arrow(
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/cloud/bigquery/table.py", line 1821, in to_arrow
    for record_batch in self.to_arrow_iterable(
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/cloud/bigquery/table.py", line 1684, in _to_page_iterable
    yield from result_pages
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 885, in _download_table_bqstorage
    session = bqstorage_client.create_read_session(
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/cloud/bigquery_storage_v1/services/big_query_read/client.py", line 635, in create_read_session
    response = rpc(
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 113, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
    return retry_target(
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/api_core/retry.py", line 191, in retry_target
    return target()
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "/home/lortega/projects/payer-dashboad/venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.FailedPrecondition: 400 request failed: column 'column_with_collate' is unsupported by the Storage API

Thanks!

shollyman commented 1 year ago

Had a chance to look at this. It's a rejection from the Storage API service. My understanding is that support for this is in the process of rolling out, but I don't have an ETA currently. Filed internal issue 279926235 to get more details.
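
Until that support lands, one possible workaround is to keep the download off the Storage API entirely. The sketch below is not something confirmed in this thread: it assumes the REST-based download path is not affected by the collation restriction, and the helper name `query_to_dataframe_via_rest` is made up for illustration. It relies on the `create_bqstorage_client` parameter of `to_dataframe()` in google-cloud-bigquery.

```python
def query_to_dataframe_via_rest(client, sql):
    """Run `sql` and download the result without the BigQuery Storage API.

    `client` is expected to be a google.cloud.bigquery.Client.
    The helper name is hypothetical; only `query()` and
    `to_dataframe(create_bqstorage_client=...)` come from the library.
    """
    job = client.query(sql)
    # create_bqstorage_client=False tells the library not to create a
    # Storage API client, so rows are fetched through the REST API.
    # Assumption: the REST path accepts collated columns.
    return job.to_dataframe(create_bqstorage_client=False)
```

Usage would look like `df = query_to_dataframe_via_rest(client, "select column_with_collate from schema.table")`. The REST download is typically slower for large results, so this is probably only reasonable for small to medium extracts.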