ibis-project / ibis-bigquery

BigQuery backend for Ibis
Apache License 2.0
19 stars 18 forks source link

Ibis 2.0: Can't connect to table - _parse_project_and_dataset called without database #107

Open saschahofmann opened 2 years ago

saschahofmann commented 2 years ago

I am trying to update our codebase to ibis 2.0 and started getting the ValueError("Unable to determine BigQuery dataset.") error when trying to fetch a table from BigQuery like this ibis_client.table("table", database="dataset").

I followed the traceback like this, starting at /ibis_bigquery/init.py, line 177

def table(self, name, database=None) -> ir.TableExpr:
    t = super().table(name, database=database)
    ...

This calls the standard ibis table function /ibis/backends/base/sql/init.py line 41

def table(self, name, database=None):
    qualified_name = self._fully_qualified_name(name, database)
    schema = self.get_schema(qualified_name)
    ...

which in turn calls the get_schema method in bigquery /ibis_bigquery/init.py, line 329 BUT only with a name (this will be important in a sec)

def get_schema(self, name, database=None):
    table_id = self._fully_qualified_name(name, database)
    ...

Now we are trying to create the fully_qualified (/ibis_bigquery/init.py", line 183) name again but this time database is None!

def _fully_qualified_name(self, name, database):
    default_project, default_dataset = self._parse_project_and_dataset(database)
    ...

Which leads us to _parse_project_and_dataset in /ibis_bigquery/init.py, line 161

 def _parse_project_and_dataset(self, dataset) -> Tuple[str, str]:
    if not dataset and not self.dataset:
        raise ValueError("Unable to determine BigQuery dataset.")
    ...

As mentioned earlier dataset is now None and nowhere in the stacktrace above did we set self.dataset, I probably can set the dataset before calling the table method but it feels to me like this is a bug?

This code also means that the bigquery table method makes two requests to bigquery one in the table method itself and one in the get_schema method that is called via the super method. Maybe we could make bq_table a cached property to avoid that?

saschahofmann commented 2 years ago

Right now, I am monkeypatching the class like this

from ibis.backends.base.sql import BaseSQLBackend
from ibis_bigquery import Backend

original_table = BaseSQLBackend.table

def table(self, name, database=None):
    self.set_database(database)
    return original_table(self, name, database)

Backend.table = table

which makes me wonder why the Bigquery Backend class has it's own table method?

jayceslesar commented 2 years ago

@saschahofmann I'm curious is there a difference between the syntax at the top vs

import ibis
import ibis_bigquery

conn = ibis_bigquery.connect(
    project_id='some_project',
    dataset_id='some_dataset'
)

table = conn.table('some_table')
# can compose queries on table now