pinecone-io / examples

Jupyter Notebooks to help you get hands-on with Pinecone vector databases
MIT License

[Bug] Unable to run example notebook: pubmed-bm25.ipynb #340

Status: Open. paulz opened this issue 6 months ago.

paulz commented 6 months ago

Is this a new bug?

Current Behavior

Running the upsert cell fails on its first iteration (the progress bar stays at 0/32). `index.upsert(vectors=vectors)` raises `SparseValuesMissingKeysError` from `_dict_to_sparse_values` in `pinecone/data/vector_factory.py`. The client expects each record's `sparse_values` to have the form `{'indices': List[int], 'values': List[float]}`, but the dict passed in uses token IDs as its keys (16984, 3526, 2331, 1006, ...). The full traceback is under Relevant log output.
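The check that fails is the `{"indices", "values"}.issubset(sparse_values_dict)` test in `_dict_to_sparse_values`. `set.issubset` iterates its argument, and iterating a dict yields its keys, so a dict keyed by token IDs can never pass. As a minimal sketch, the same key check, plus a length check added here as an assumption, can be run locally over the records before calling `index.upsert`. The helper `find_bad_sparse_records` is hypothetical and not part of the Pinecone client.

```python
from typing import Any, Dict, List


def find_bad_sparse_records(records: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of records whose sparse_values are not shaped as
    {'indices': [...], 'values': [...]}.

    Hypothetical helper: it mirrors the key check shown in the traceback
    and adds a length check; it is not part of the Pinecone client.
    """
    bad = []
    for rec in records:
        sparse = rec.get("sparse_values")
        if sparse is None:
            continue  # records without sparse_values are not checked here
        # Same test as the client: the dict's *keys* must include both names.
        if not isinstance(sparse, dict) or not {"indices", "values"}.issubset(sparse):
            bad.append(rec["id"])
        # Each index needs exactly one matching weight.
        elif len(sparse["indices"]) != len(sparse["values"]):
            bad.append(rec["id"])
    return bad


# A record shaped like the failing one: sparse values keyed by token ID.
broken = {"id": "0", "values": [0.1, 0.2], "sparse_values": {16984: 0.7, 3526: 0.4}}
# The shape the client expects.
fixed = {"id": "1", "values": [0.1, 0.2],
         "sparse_values": {"indices": [16984, 3526], "values": [0.7, 0.4]}}

print(find_bad_sparse_records([broken, fixed]))  # ['0']
```

Running it over the notebook's `vectors` list just before `index.upsert(vectors=vectors)` should list every record that would trigger this error.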

Expected Behavior

The example notebook should run end to end without errors.

Steps To Reproduce

  1. Open https://github.com/pinecone-io/examples/blob/master/learn/search/hybrid-search/fast-intro/pubmed-bm25.ipynb in Colab.
  2. Run the cells in order until the upsert cell (`index.upsert(vectors=vectors)`) raises the error.

Relevant log output

  0%|          | 0/32 [00:00<?, ?it/s]
---------------------------------------------------------------------------
SparseValuesMissingKeysError              Traceback (most recent call last)
<ipython-input-22-8f2be8886c89> in <cell line: 5>()
     35     # new_vectors = { 'sparse_values': {'indices': indices, 'values': values}}
     36     # index.upsert(vectors=new_vectors)
---> 37     index.upsert(vectors=vectors)
     38 
     39 # show index description after uploading the documents

6 frames
/usr/local/lib/python3.10/dist-packages/pinecone/data/vector_factory.py in _dict_to_sparse_values(sparse_values_dict, check_type)
    104             raise SparseValuesDictionaryExpectedError(sparse_values_dict)
    105         if not {"indices", "values"}.issubset(sparse_values_dict):
--> 106             raise SparseValuesMissingKeysError(sparse_values_dict)
    107 
    108         indices = convert_to_list(sparse_values_dict.get("indices"))

SparseValuesMissingKeysError: Missing required keys in data in column `sparse_values`. Expected format is `'sparse_values': {'indices': List[int], 'values': List[float]}`. Found keys [16984, 3526, 2331, 1006, 7473, 2094, 1007, 2003, 1996, 12222, 1997, 4442, 2306, 2019, 15923, 1012, 12922, 3269, 9706, 17175, 18150, 2239, 11934, 27806, 7137, 2566, 29278, 10708, 1999, 2049, 3727, 2083, 8676, 1037, 17779, 6198, 20134, 
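The keys listed in the error (16984, 3526, 2331, ...) suggest the encoder produced a plain `{token_id: weight}` dict, while the client requires `{'indices': List[int], 'values': List[float]}`. A minimal sketch of the conversion, assuming that dict shape; `to_sparse_values` is a hypothetical helper, not a Pinecone API.

```python
from typing import Dict, List, Union


def to_sparse_values(token_weights: Dict[int, float]) -> Dict[str, Union[List[int], List[float]]]:
    """Convert a {token_id: weight} mapping into the sparse_values shape
    named in the error message: {'indices': [...], 'values': [...]}.
    """
    items = list(token_weights.items())
    return {
        # int()/float() also normalise NumPy scalars to plain Python numbers.
        "indices": [int(token_id) for token_id, _ in items],
        "values": [float(weight) for _, weight in items],
    }


sparse = {16984: 0.7, 3526: 0.4, 2331: 0.1}  # shape implied by the error message
print(to_sparse_values(sparse))
# {'indices': [16984, 3526, 2331], 'values': [0.7, 0.4, 0.1]}
```

In the notebook's upload loop this would mean passing `'sparse_values': to_sparse_values(sparse)` instead of `'sparse_values': sparse`, provided `sparse` really is a token-ID-to-weight dict in the notebook version being run.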

Environment

- **OS**: Google Colab
- **Language version**: Python 3.10 (Colab runtime, per the traceback paths)
- **Pinecone client version**: default
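"Default" depends on when the notebook was run, so recording the exact installed client version helps with triage. A minimal sketch using the standard library's `importlib.metadata`. It assumes the client was installed under one of the two PyPI distribution names the Pinecone Python client has used (`pinecone-client`, and later `pinecone`).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_pinecone_version() -> Optional[str]:
    """Return '<distribution> <version>' for the installed Pinecone client,
    or None if neither distribution name is installed."""
    # Assumption: the client is installed under one of these two names.
    for dist in ("pinecone-client", "pinecone"):
        try:
            return f"{dist} {version(dist)}"
        except PackageNotFoundError:
            continue
    return None


print(installed_pinecone_version())
```

Running this in the same Colab session as the failing cell shows which client version produced the traceback.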

Additional Context

No response

paulz commented 5 months ago

Tried again, this time in a local Python 3.11 virtualenv, with the same error:

  0%|          | 0/32 [00:01<?, ?it/s]
---------------------------------------------------------------------------
SparseValuesMissingKeysError              Traceback (most recent call last)
Cell In[16], line 30
     22         vectors.append({
     23             'id': _id,
     24             'sparse_values': sparse,
     25             'values': dense,
     26             'metadata': metadata
     27         })
     29     # upload the documents to the new hybrid index
---> 30     index.upsert(vectors=vectors)
     32 # show index description after uploading the documents
     33 index.describe_index_stats()

File ~/workspace/third-party/pinecone/examples/venv/lib/python3.11/site-packages/pinecone/utils/error_handling.py:10, in validate_and_convert_errors.<locals>.inner_func(*args, **kwargs)
      7 @wraps(func)
      8 def inner_func(*args, **kwargs):
      9     try:
---> 10         return func(*args, **kwargs)
     11     except MaxRetryError as e:
     12         if isinstance(e.reason, ProtocolError):

File ~/workspace/third-party/pinecone/examples/venv/lib/python3.11/site-packages/pinecone/data/index.py:171, in Index.upsert(self, vectors, namespace, batch_size, show_progress, **kwargs)
    164     raise ValueError(
    165         "async_req is not supported when batch_size is provided."
    166         "To upsert in parallel, please follow: "
    167         "https://docs.pinecone.io/docs/insert-data#sending-upserts-in-parallel"
    168     )
    170 if batch_size is None:
--> 171     return self._upsert_batch(vectors, namespace, _check_type, **kwargs)
    173 if not isinstance(batch_size, int) or batch_size <= 0:
    174     raise ValueError("batch_size must be a positive integer")