googleapis / python-bigquery-sqlalchemy

SQLAlchemy dialect for BigQuery
MIT License
429 stars 129 forks source link

Repeated Type support for Structs #376

Open stanzheng opened 2 years ago

stanzheng commented 2 years ago

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Describe the solution you'd like

Ability to insert repeated nested STRUCT records. attribute to define repeated nested records

https://cloud.google.com/bigquery/docs/nested-repeated

TASK_SCHEMA = Table(
    "TASK_SCHEMA",
    meta,
 Column('updated', String(), table=None),
 Column('updater', String(), table=None),
 Column('url', String(), table=None),
 Column('verification_status', String(), table=None),
 Column('how_tos', STRUCT(title=String(), id=String(), 
url=String(), slug=String(), text=String()), table=None, **REPEATED=True**)),
)

Structs should support the bigquery REPEATED attribute on records.

A clear and concise description of what you want to happen. Describe alternatives you've considered

Serializing Records as Strings to deserialize in the client.

A clear and concise description of any alternative solutions or features you've considered. Additional context

tried using Arrays in conjunction with Structs

(
 Column('updater', String(), table=None),
 Column('url', String(), table=None),
 Column('verification_status', String(), table=None),
 Column('how_tos', 

ARRAY('how_tos', as_tuple=
Column('how_tos', STRUCT(title=String(), id=String(), url=String(), slug=String(), text=String()), table=None)), table=None)

error stacktrace

  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 2938, in visit_create_column
    text = self.get_column_specification(column, first_pk=first_pk)
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy_bigquery/base.py", line 606, in get_column_specification
    colspec = super(BigQueryDDLCompiler, self).get_column_specification(
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 3137, in get_column_specification
    + self.dialect.type_compiler.process(
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 403, in process
    return type_._compiler_dispatch(self, **kw)
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/visitors.py", line 96, in _compiler_dispatch
    return meth(self, **kw)
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy_bigquery/base.py", line 561, in visit_ARRAY
    return "ARRAY<{}>".format(self.process(type_.item_type, **kw))
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 403, in process
    return type_._compiler_dispatch(self, **kw)
AttributeError: 'str' object has no attribute '_compiler_dispatch'

Add any other context or screenshots about the feature request here.

jdimatteo commented 2 years ago

This appears to already be supported. If I'm missing something, please let me know so I don't run into a problem soon!

For example, with sqlalchemy-bigquery 1.4.4:

import sqlalchemy
import sqlalchemy.ext.declarative
import sqlalchemy_bigquery
import os

Base = sqlalchemy.ext.declarative.declarative_base()

class Foo(Base):
    __tablename__ = 'foo'

    zap = sqlalchemy.Column(sqlalchemy.String, primary_key=True)

    bar = sqlalchemy.Column(
        sqlalchemy.ARRAY(
            sqlalchemy_bigquery.STRUCT(
                a=sqlalchemy.String,
                b=sqlalchemy.String,
                c=sqlalchemy.ARRAY(
                    sqlalchemy_bigquery.STRUCT(
                        x=sqlalchemy.String,
                        y=sqlalchemy.String)))))

engine = sqlalchemy.create_engine(f'bigquery://{os.environ["BQ_PROJECT_ID"]}/{os.environ["BQ_DATASET_ID"]}')
Base.metadata.create_all(engine)

This successfully creates a table with repeated structs: image

And here is a more verbose example that includes inserts that I tested works: https://gist.github.com/jdimatteo/c223668095392e629ca01aadada4171c

jlynchMicron commented 1 year ago

Can support be added for representing all the sub-columns of an unnested array of structs? I am having to manually specify the columns that exist in an unnested array of structs instead of SQLalchemy "knowing" the columns that live in the unnested array of structs.

Also, this BigQuery documentation should probably be updated, since it mentions nothing about an unnested array of structs forming their own new columns in the result. Documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#query_structs_in_an_array

Example (Incorrect result, participants results in 2 columns):

WITH Races AS (
  SELECT "800M" AS race,
    [STRUCT("Rudisha" AS name, [23.4, 26.3, 26.4, 26.1] AS laps),
     STRUCT("Makhloufi" AS name, [24.5, 25.4, 26.6, 26.1] AS laps),
     STRUCT("Murphy" AS name, [23.9, 26.0, 27.0, 26.0] AS laps),
     STRUCT("Bosse" AS name, [23.6, 26.2, 26.5, 27.1] AS laps),
     STRUCT("Rotich" AS name, [24.7, 25.6, 26.9, 26.4] AS laps),
     STRUCT("Lewandowski" AS name, [25.0, 25.7, 26.3, 27.2] AS laps),
     STRUCT("Kipketer" AS name, [23.2, 26.1, 27.3, 29.4] AS laps),
     STRUCT("Berian" AS name, [23.7, 26.1, 27.0, 29.3] AS laps)]
       AS participants)
SELECT
  race,
  participant
FROM Races AS r
CROSS JOIN UNNEST(r.participants) AS participant;

+------+---------------------------------------+
| race | participant                           |
+------+---------------------------------------+
| 800M | {Rudisha, [23.4, 26.3, 26.4, 26.1]}   |
| 800M | {Makhloufi, [24.5, 25.4, 26.6, 26.1]} |
| 800M | {Murphy, [23.9, 26, 27, 26]}          |
| 800M | {Bosse, [23.6, 26.2, 26.5, 27.1]}     |
| 800M | {Rotich, [24.7, 25.6, 26.9, 26.4]}    |
| 800M | {Lewandowski, [25, 25.7, 26.3, 27.2]} |
| 800M | {Kipketer, [23.2, 26.1, 27.3, 29.4]}  |
| 800M | {Berian, [23.7, 26.1, 27, 29.3]}      |
+------+---------------------------------------+

Actual Result: image