googleapis / python-bigquery

Apache License 2.0

Inability to Use Google Cloud BigQuery's `AccessEntry` Objects as Hashable Elements #1620

Closed: Mallington closed this issue 11 months ago

Mallington commented 1 year ago

Description

I encountered an issue while attempting to use Google Cloud BigQuery's AccessEntry objects as hashable elements in sets or as keys in dictionaries. The problem arises due to the presence of nested dictionaries in the _key() method of the bigquery.AccessEntry class, which makes the objects unhashable.

Expected Behavior

I expect to be able to use bigquery.AccessEntry objects as hashable elements in sets and as keys in dictionaries. This would enable more efficient and organized handling of access permissions for datasets within Google BigQuery and facilitate the usage of these objects in various Python data structures.

Current Behavior

Currently, when attempting to use bigquery.AccessEntry objects in sets or dictionaries, the following error is encountered:

File "/libpath/venv/lib/python3.10/site-packages/google/cloud/bigquery/dataset.py", line 468, in __hash__
    return hash(self._key())
TypeError: unhashable type: 'dict'
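The root cause is independent of BigQuery itself: a tuple is hashable only if every element in it is hashable, so a `_key()` tuple containing a dict raises this exact error. A minimal illustration with plain tuples (the literal values below are made up for demonstration):

```python
# A tuple is hashable only if all of its elements are; embedding a
# dict (as _key() does for view entries) makes the tuple unhashable.
key_without_dict = ("READER", "userByEmail", "user1@example.com")
key_with_dict = (None, "view", {"projectId": "my-project"})

hash(key_without_dict)  # works fine

try:
    hash(key_with_dict)
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'
```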

Steps to Reproduce

Here's a code snippet that reproduces the issue:

from google.cloud import bigquery

access_entries = [
    bigquery.AccessEntry("READER", "userByEmail", "user1@example.com"),
    bigquery.AccessEntry("WRITER", "userByEmail", "user2@example.com"),
    bigquery.AccessEntry(None, "view", {"projectId": "my-project", "datasetId": "my-dataset", "tableId": "my-table"}),
]

my_set = set(access_entries)  # This line raises a TypeError: unhashable type: 'dict'

Proposed Solution

To resolve this limitation, Google Cloud BigQuery's AccessEntry class needs a __hash__ implementation that accounts for the hashability of all of its fields, including dictionaries. Consider converting dictionaries to frozensets before hashing them:

>>> dict_example = {'a': 1, 'b': 2}
>>> hash(dict_example)
Traceback (most recent call last):
  File "/libpath/.pyenv/versions/3.10.8/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> frozen_set = frozenset(dict_example)
>>> hash(frozen_set)
-7967343229136678437
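One caveat with the snippet above: iterating a dict yields only its keys, so `frozenset(dict_example)` ignores the values entirely. Hashing the items instead captures both keys and values (the field names below are illustrative):

```python
a = {"datasetId": "ds1", "tableId": "t1"}
b = {"datasetId": "ds2", "tableId": "t2"}  # same keys, different values

# frozenset(d) keeps only the keys, so these two compare equal:
print(frozenset(a) == frozenset(b))                   # True

# frozenset(d.items()) keeps key/value pairs, so they differ:
print(frozenset(a.items()) == frozenset(b.items()))   # False
```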

Additional Information:

Python Version: Python 3.10
Google Cloud Python Client Library Version: google-cloud-bigquery==3.11.4

Impact:

This issue affects users who wish to leverage Python's built-in set operations and any data structures that rely on the hashing function.

Thank you for your help in advance :)

chalmerlowe commented 1 year ago

Thank you for submitting this feature request. If you have code or a patch that you would like to submit to implement this new functionality, feel free to issue a PR.

In the meantime, I will consider this feature request and explore where it should fit in our priority list based on current staffing and workloads.

Mallington commented 1 year ago

Thank you for your quick response. I created a monkey patch to replace the __hash__ implementation; however, it is quite inefficient, because it is called multiple times during set comparisons. Hoping it will still be useful for someone who runs into the same issue.

import json

from google.cloud import bigquery


def __custom_hash(obj):
    # Dicts are unhashable, so hash a canonical JSON serialization instead.
    if isinstance(obj, dict):
        return hash(json.dumps(obj, sort_keys=True))
    # Recurse into tuples so nested dicts are handled as well.
    elif isinstance(obj, tuple):
        return hash(tuple(__custom_hash(item) for item in obj))
    else:
        return hash(obj)


bigquery.AccessEntry.__hash__ = lambda self: __custom_hash(self._key())
chalmerlowe commented 11 months ago

I am going to close this issue. With a potentially applicable workaround listed here for others who might encounter a similar issue, and considering the other items on our priority list, I do not see us fixing this anytime soon.