Closed brentgunderson closed 1 year ago
Hi @brentgunderson 👋🏻 !
This looks like a bug in the SQL we're generating.
For now, you can work around it by writing
.join(games_wide, "gameId")
instead of using the equality predicate.
We're cutting 5.1 next Tuesday, so a fix for this might make it in but it'll definitely be fixed in 6.0!
Hey @brentgunderson -- I believe this has been fixed by our recent work on join column renaming. Although the code as written above will no longer work, because it will result in column naming collisions (since our default is to rename duplicate columns to {name}_right
).
This works:
[ins] In [1]: import ibis
[ins] In [2]: from ibis import _
[ins] In [3]: con = ibis.bigquery.connect()
[ins] In [4]: schedules = con.table("bigquery-public-data.baseball.schedules")
[ins] In [5]: games_wide = con.table("bigquery-public-data.baseball.games_wide")
[ins] In [6]: games_post_wide = con.table("bigquery-public-data.baseball.games_post_wide")
[ins] In [7]: expr = (
...: schedules.join(
...: games_post_wide, schedules.gameId == games_post_wide.gameId, rname="{name}_gpw"
...: )
...: .join(games_wide, _.gameId == games_wide.gameId, rname="{name}_gw")
...: .select(_.gameNumber)
...: .distinct()
...: )
[ins] In [8]: expr.execute(limit=10)
Out[8]:
Empty DataFrame
Columns: [gameNumber]
Index: []
Note the result is empty because there is no overlap in gameId
between games_wide
and games_post_wide
Without passing arguments, you'll get an integrity error due to the name collisions:
[ins] In [10]: schedules.join(games_post_wide, schedules.gameId == games_post_wide.gameId).join(
...: games_wide, _.gameId == games_wide.gameId
...: )
---------------------------------------------------------------------------
IntegrityError Traceback (most recent call last)
Cell In[10], line 1
----> 1 schedules.join(games_post_wide, schedules.gameId == games_post_wide.gameId).join(
2 games_wide, _.gameId == games_wide.gameId
3 )
File ~/github.com/ibis-project/ibis/ibis/expr/types/relations.py:2480, in Table.join(left, right, predicates, how, lname, rname)
2477 if how in ("semi", "anti"):
2478 return expr
-> 2480 return ops.relations._dedup_join_columns(expr, lname=lname, rname=rname)
File ~/github.com/ibis-project/ibis/ibis/expr/operations/relations.py:627, in _dedup_join_columns(expr, lname, rname)
618 # Rename columns in the right table that overlap, dropping any columns that
619 # are known to be equal to those in the left table
620 right_projections = [
621 right[column].name(rname.format(name=column) if rname else column)
622 if column in overlap
(...)
625 if column not in equal
626 ]
--> 627 return expr.select(left_projections + right_projections)
File ~/github.com/ibis-project/ibis/ibis/expr/types/relations.py:1735, in Table.select(self, *exprs, **named_exprs)
1730 if not exprs:
1731 raise com.IbisTypeError(
1732 "You must select at least one column for a valid projection"
1733 )
-> 1735 op = an.Projector(self, exprs).get_result()
1737 return op.to_expr()
File ~/github.com/ibis-project/ibis/ibis/expr/analysis.py:542, in Projector.get_result(self)
539 if fused_op is not None:
540 return fused_op
--> 542 return ops.Selection(self.parent, self.clean_exprs)
File ~/github.com/ibis-project/ibis/ibis/common/grounds.py:25, in BaseMeta.__call__(cls, *args, **kwargs)
24 def __call__(cls, *args, **kwargs) -> Base:
---> 25 return cls.__create__(*args, **kwargs)
File ~/github.com/ibis-project/ibis/ibis/common/grounds.py:96, in Annotable.__create__(cls, *args, **kwargs)
92 @classmethod
93 def __create__(cls, *args, **kwargs) -> Annotable:
94 # construct the instance by passing the validated keyword arguments
95 kwargs = cls.__signature__.validate(*args, **kwargs)
---> 96 return super().__create__(**kwargs)
File ~/github.com/ibis-project/ibis/ibis/expr/operations/relations.py:385, in Selection.__init__(self, table, selections, predicates, sort_keys, **kwargs)
382 elif not shares_some_roots(predicate, table):
383 raise com.RelationError("Predicate doesn't share any roots with table")
--> 385 super().__init__(
386 table=table,
387 selections=selections,
388 predicates=predicates,
389 sort_keys=sort_keys,
390 **kwargs,
391 )
File ~/github.com/ibis-project/ibis/ibis/common/grounds.py:112, in Annotable.__init__(self, **kwargs)
110 for name, field in self.__attributes__.items():
111 if isinstance(field, Attribute):
--> 112 if (value := field.initialize(self)) is not EMPTY:
113 object.__setattr__(self, name, value)
File ~/github.com/ibis-project/ibis/ibis/common/annotations.py:85, in Attribute.initialize(self, this)
83 return EMPTY
84 elif callable(self._default):
---> 85 value = self._default(this)
86 else:
87 value = self._default
File ~/github.com/ibis-project/ibis/ibis/expr/operations/relations.py:359, in Projection.schema(self)
356 names.extend(schema.names)
357 types.extend(schema.types)
--> 359 return sch.schema(names, types)
File /nix/store/cw2xxyh09alhbxywlvvibkhwsl6439ix-python3-3.10.11-env/lib/python3.10/site-packages/multipledispatch/dispatcher.py:278, in Dispatcher.__call__(self, *args, **kwargs)
276 self._cache[types] = func
277 try:
--> 278 return func(*args, **kwargs)
280 except MDNotImplementedError:
281 funcs = self.dispatch_iter(*types)
File ~/github.com/ibis-project/ibis/ibis/expr/schema.py:319, in schema_from_names_types(names, types)
317 for v in name_locs:
318 duplicate_names.remove(v)
--> 319 raise IntegrityError(f'Duplicate column name(s): {duplicate_names}')
321 # construct the schema
322 fields = dict(zip(names, types))
IntegrityError: Duplicate column name(s): ['seasonId_right', 'year_right', 'startTime_right', 'attendance_right', 'dayNight_right', 'duration_right', 'awayTeamId_right', 'awayTeamName_right', 'homeTeamId_right', 'homeTeamName_right', 'status_right']
Thanks for the note @gforsyth, appreciate it! I'll give that a shot
What happened?
Below example is just to reproduce an error I seem to get when using data that I can't share. Using python 3.10.4, Windows-10-10.0.19044-SP0
Error seems to be: “BadRequest: 400 Column name gameId is ambiguous at [71:8]”
Not sure if this is a bug, or if I'm just structuring the joins incorrectly.
What version of ibis are you using?
5.0.0
What backend(s) are you using, if any?
BigQuery
Relevant log output
Code of Conduct