langchain-ai / langchain-google

MIT License

BigQueryVectorStore Needs permission to create temp dataset #498

Closed hutariq closed 1 month ago

hutariq commented 1 month ago

The latest update resolved the previous issue, where the class checked for the bigquery.datasets.create permission even if the dataset had already been created.

But now it also needs to create a temp dataset:

if not check_bq_dataset_exists(client=self._bq_client, dataset_id=temp_dataset_id):
    self._bq_client.create_dataset(dataset=temp_dataset_id, exists_ok=True)

Traceback (most recent call last):
  File "/var/folders/fx/72vr16sx4zj1bzn09c6p0ynh0000gn/T/ipykernel_41013/2982358708.py", line 4, in <module>
    bq = BigQueryVectorStore(
         ^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pydantic/main.py", line 209, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/langchain_google_community/bq_storage_vectorstores/_base.py", line 151, in validate_vals
    self._bq_client.create_dataset(dataset=temp_dataset_id, exists_ok=True)
  File ".venv/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 683, in create_dataset
    api_response = self._call_api(
                   ^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 833, in _call_api
    return call()
           ^^^^^^
  File ".venv/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
    return retry_target(
           ^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
    _retry_error_helper(
  File ".venv/lib/python3.11/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
    raise final_exc from source_exc
  File "/.venv/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
    result = target()
             ^^^^^^^^
  File ".venv/lib/python3.11/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
    raise exceptions.from_http_response(response)

So the issue persists for anyone who doesn't have permission to create a dataset.

Creating the temp dataset should be optional.

hutariq commented 1 month ago

@eliasecchig Can you please look into this? It is really hindering work, because the failure happens at object initialization.

eliasecchig commented 1 month ago

@hutariq Creating a temp dataset is required for efficient batch search. Can you create it before using the class, e.g. via Terraform?

The name would be "{your dataset id}_temp". Let me know!
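
For anyone who prefers Python over Terraform, the pre-creation step could be sketched roughly as follows. This is a minimal sketch, not the library's own code: the helper names `temp_dataset_id` and `precreate_temp_dataset` are hypothetical, it assumes the temp dataset lives in the same project as the main dataset, and it has to be run by a principal that does hold bigquery.datasets.create. The only BigQuery client call used is `create_dataset(..., exists_ok=True)`, the same call that appears in the traceback above.

```python
def temp_dataset_id(dataset_id: str) -> str:
    """Return the temp dataset name described in this thread: '<dataset id>_temp'."""
    return f"{dataset_id}_temp"


def precreate_temp_dataset(client, project_id: str, dataset_id: str) -> str:
    """Create the temp dataset ahead of time (hypothetical helper).

    `client` is expected to be a google.cloud.bigquery.Client, or anything
    with a compatible `create_dataset` method. Assumes the temp dataset
    belongs to the same project as the main dataset.
    """
    full_id = f"{project_id}.{temp_dataset_id(dataset_id)}"
    # exists_ok=True makes the call a no-op if the dataset is already there.
    client.create_dataset(full_id, exists_ok=True)
    return full_id


# Example usage (needs google-cloud-bigquery and suitable credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client(project="my-project")
#   precreate_temp_dataset(client, "my-project", "my_dataset")
```

Depending on the setup, the temp dataset may also need to be in the same location as the main dataset; that detail is left out of the sketch.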

hutariq commented 1 month ago

@eliasecchig I have restrictions on my account: I do not have permission to create a new dataset.

If the temp dataset is required only for efficient batch search, then it should be optional.

eliasecchig commented 1 month ago

Agreed. I think we can introduce a parameter for the temp dataset id. That way you can point both at the same dataset and avoid any creation. What do you think?
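
One way the proposed parameter could behave is sketched below. This is an illustration of the idea only, not the actual change to BigQueryVectorStore: the parameter name `temp_dataset_name` and the helper `ensure_temp_dataset` are hypothetical, and the `exists` and `create` callables stand in for the dataset-exists check and the `create_dataset` call quoted at the top of the issue.

```python
from typing import Callable, Optional


def ensure_temp_dataset(
    dataset_name: str,
    temp_dataset_name: Optional[str],
    exists: Callable[[str], bool],
    create: Callable[[str], None],
) -> str:
    """Resolve the temp dataset and create it only if it is missing.

    If `temp_dataset_name` is None, fall back to the '<dataset>_temp'
    convention mentioned in this thread. If it names an existing dataset
    (for example the main dataset itself), `create` is never called, so
    no dataset-creation permission is needed.
    """
    temp_id = temp_dataset_name or f"{dataset_name}_temp"
    if not exists(temp_id):
        create(temp_id)
    return temp_id


# Example: a restricted user points the temp dataset at the main dataset.
existing = {"my_dataset"}
created = []
resolved = ensure_temp_dataset(
    "my_dataset", "my_dataset", exists=existing.__contains__, create=created.append
)
```

With the default of None the behaviour stays as it is today, so only users who cannot create datasets would need to set the parameter.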