googleapis / python-bigquery-sqlalchemy

SQLAlchemy dialect for BigQuery
MIT License
434 stars 129 forks source link

Buggy metadata reflection without a default dataset #1088

Open JacobHayes opened 4 months ago

JacobHayes commented 4 months ago

The get_table_names and get_view_names methods are supposed to return the bare names (no {schema}. prefix) of the resources for a single schema/dataset (where schema=None is the "default schema"). However, the BigQueryDialect implementation returns:

The bolded behaviors, which arise when the connection doesn't have a default dataset, are incorrect and trigger some edge cases:

Environment details

Steps to reproduce

  1. Run the script below
  2. At the breakpoint, notice that Base.metadata.tables has tables from all datasets
  3. Notice that the script errors with NoSuchTableError even though the table exists

Code example

from os import getenv

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

dataset_1, dataset_2 = "tmp1", "tmp2"  # bq query "create schema tmp1; create schema tmp2;"
project = getenv("GOOGLE_CLOUD_PROJECT")

engine = create_engine(f"bigquery://{project}")

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "user"
    __table_args__ = {"schema": dataset_1}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    fullname: Mapped[str]
    nickname: Mapped[str | None]

class Org(Base):
    __tablename__ = "org"
    __table_args__ = {"schema": dataset_2}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]

Base.metadata.create_all(engine)
Base.metadata.clear()

Base.metadata.reflect(engine)
# NOTE: Base.metadata.tables *should* be empty (no schema nor a connection default), but is actually populated from all datasets
breakpoint()

# NOTE: This will confusingly raise `sqlalchemy.exc.NoSuchTableError: {schema}.{table}` - it's actually
# looking for `{schema}.{table}` _within_ `schema`.
Base.metadata.reflect(engine, schema=dataset_1)

Stack trace

$ python3 x.py
(Pdb) len(Base.metadata.tables)
15
(Pdb) c
Traceback (most recent call last):
  File "/.../x.py", line 51, in <module>
    Base.metadata.reflect(engine, schema=dataset_1)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 5843, in reflect
    Table(name, self, **reflect_opts)
  File "<string>", line 2, in __new__
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/deprecations.py", line 281, in warned
    return fn(*args, **kwargs)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 432, in __new__
    return cls._new(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 486, in _new
    with util.safe_reraise():
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 482, in _new
    table.__init__(name, metadata, *args, _no_init=False, **kw)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 862, in __init__
    self._autoload(
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 894, in _autoload
    conn_insp.reflect_table(
  File "/[...]/.venv/lib/python3.12/site-packages/sqlalchemy/engine/reflection.py", line 1538, in reflect_table
    raise exc.NoSuchTableError(table_name)
sqlalchemy.exc.NoSuchTableError: tmp1.user