run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Adding nodes to a keyword index generates quota exceeded error #7744

Closed: axgabs closed this issue 6 months ago

axgabs commented 10 months ago

Bug Description

Unlike with the vector index, when I try to add several nodes to a keyword index it returns a quota exceeded error. I wonder if it is possible to add exception handling that waits for some time and then retries adding the remaining nodes.
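
For reference, a minimal sketch of the kind of retry behavior I have in mind, assuming the quota error surfaces as google.api_core.exceptions.ResourceExhausted (which is what the gRPC error in the traceback below gets remapped to). The helper name, wait time, and retry count are just placeholders:

import time

from google.api_core.exceptions import ResourceExhausted

def insert_nodes_with_retry(index, nodes, max_retries=5, wait_seconds=30):
    """Insert nodes one at a time, waiting and retrying whenever the quota is exceeded."""
    for node in nodes:
        for attempt in range(max_retries):
            try:
                index.insert_nodes([node])
                break
            except ResourceExhausted:
                if attempt == max_retries - 1:
                    raise
                # Quota exceeded: back off before retrying the same node.
                time.sleep(wait_seconds)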

Version

llama-index-0.8.29.post1

Steps to Reproduce

Use index.insert_nodes(node) several times.
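
A hypothetical minimal reproduction, assuming a ServiceContext backed by a Vertex AI chat model as in the traceback below (the documents and the retry-free loop are only illustrative):

from langchain.chat_models import ChatVertexAI

from llama_index import Document, KeywordTableIndex, ServiceContext
from llama_index.llms import LangChainLLM
from llama_index.node_parser import SimpleNodeParser

# Vertex AI chat model wrapped for llama_index, matching the traceback below.
service_context = ServiceContext.from_defaults(llm=LangChainLLM(llm=ChatVertexAI()))

parser = SimpleNodeParser.from_defaults()
documents = [Document(text=f"document number {i}") for i in range(200)]
nodes = parser.get_nodes_from_documents(documents)

index = KeywordTableIndex(nodes=[], service_context=service_context)
for node in nodes:
    # Every insert triggers an LLM keyword-extraction call; enough of them
    # in a row eventually hit the Vertex AI quota.
    index.insert_nodes([node])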

Relevant Logs/Tracebacks

/opt/conda/envs/whirlpalm/lib/python3.9/site-packages/llama_index/indices/base.py:181 in         │
│ insert_nodes                                                                                     │
│                                                                                                  │
│   178 │   │   """Insert nodes."""                                                                │
│   179 │   │   with self._service_context.callback_manager.as_trace("insert_nodes"):              │
│   180 │   │   │   self.docstore.add_documents(nodes, allow_update=True)                          │
│ ❱ 181 │   │   │   self._insert(nodes, **insert_kwargs)                                           │
│   182 │   │   │   self._storage_context.index_store.add_index_struct(self._index_struct)         │
│   183 │                                                                                          │
│   184 │   def insert(self, document: Document, **insert_kwargs: Any) -> None:                    │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/llama_index/indices/keyword_table/base.py: │
│ 173 in _insert                                                                                   │
│                                                                                                  │
│   170 │   def _insert(self, nodes: Sequence[BaseNode], **insert_kwargs: Any) -> None:            │
│   171 │   │   """Insert nodes."""                                                                │
│   172 │   │   for n in nodes:                                                                    │
│ ❱ 173 │   │   │   keywords = self._extract_keywords(                                             │
│   174 │   │   │   │   n.get_content(metadata_mode=MetadataMode.LLM)                              │
│   175 │   │   │   )                                                                              │
│   176 │   │   │   self._index_struct.add_node(list(keywords), n)                                 │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/llama_index/indices/keyword_table/base.py: │
│ 222 in _extract_keywords                                                                         │
│                                                                                                  │
│   219 │                                                                                          │
│   220 │   def _extract_keywords(self, text: str) -> Set[str]:                                    │
│   221 │   │   """Extract keywords from text."""                                                  │
│ ❱ 222 │   │   response = self._service_context.llm_predictor.predict(                            │
│   223 │   │   │   self.keyword_extract_template,                                                 │
│   224 │   │   │   text=text,                                                                     │
│   225 │   │   )                                                                                  │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/llama_index/llm_predictor/base.py:142 in   │
│ predict                                                                                          │
│                                                                                                  │
│   139 │   │   if self._llm.metadata.is_chat_model:                                               │
│   140 │   │   │   messages = prompt.format_messages(llm=self._llm, **prompt_args)                │
│   141 │   │   │   messages = self._extend_messages(messages)                                     │
│ ❱ 142 │   │   │   chat_response = self._llm.chat(messages)                                       │
│   143 │   │   │   output = chat_response.message.content or ""                                   │
│   144 │   │   │   # NOTE: this is an approximation, only for token counting                      │
│   145 │   │   │   formatted_prompt = messages_to_prompt(messages)                                │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/llama_index/llms/base.py:151 in            │
│ wrapped_llm_chat                                                                                 │
│                                                                                                  │
│   148 │   │   │   │   │   │   EventPayload.SERIALIZED: _self.to_dict(),                          │
│   149 │   │   │   │   │   },                                                                     │
│   150 │   │   │   │   )                                                                          │
│ ❱ 151 │   │   │   │   f_return_val = f(_self, messages, **kwargs)                                │
│   152 │   │   │   │                                                                              │
│   153 │   │   │   │   if isinstance(f_return_val, Generator):                                    │
│   154 │   │   │   │   │   # intercept the generator and add a callback to the end                │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/llama_index/llms/langchain.py:57 in chat   │
│                                                                                                  │
│    54 │   @llm_chat_callback()                                                                   │
│    55 │   def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:        │
│    56 │   │   lc_messages = to_lc_messages(messages)                                             │
│ ❱  57 │   │   lc_message = self._llm.predict_messages(messages=lc_messages, **kwargs)            │
│    58 │   │   message = from_lc_messages([lc_message])[0]                                        │
│    59 │   │   return ChatResponse(message=message)                                               │
│    60                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/langchain/chat_models/base.py:601 in       │
│ predict_messages                                                                                 │
│                                                                                                  │
│   598 │   │   │   _stop = None                                                                   │
│   599 │   │   else:                                                                              │
│   600 │   │   │   _stop = list(stop)                                                             │
│ ❱ 601 │   │   return self(messages, stop=_stop, **kwargs)                                        │
│   602 │                                                                                          │
│   603 │   async def apredict(                                                                    │
│   604 │   │   self, text: str, *, stop: Optional[Sequence[str]] = None, **kwargs: Any            │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/langchain/chat_models/base.py:551 in       │
│ __call__                                                                                         │
│                                                                                                  │
│   548 │   │   callbacks: Callbacks = None,                                                       │
│   549 │   │   **kwargs: Any,                                                                     │
│   550 │   ) -> BaseMessage:                                                                      │
│ ❱ 551 │   │   generation = self.generate(                                                        │
│   552 │   │   │   [messages], stop=stop, callbacks=callbacks, **kwargs                           │
│   553 │   │   ).generations[0][0]                                                                │
│   554 │   │   if isinstance(generation, ChatGeneration):                                         │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/langchain/chat_models/base.py:309 in       │
│ generate                                                                                         │
│                                                                                                  │
│   306 │   │   │   except BaseException as e:                                                     │
│   307 │   │   │   │   if run_managers:                                                           │
│   308 │   │   │   │   │   run_managers[i].on_llm_error(e)                                        │
│ ❱ 309 │   │   │   │   raise e                                                                    │
│   310 │   │   flattened_outputs = [                                                              │
│   311 │   │   │   LLMResult(generations=[res.generations], llm_output=res.llm_output)            │
│   312 │   │   │   for res in results                                                             │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/langchain/chat_models/base.py:299 in       │
│ generate                                                                                         │
│                                                                                                  │
│   296 │   │   for i, m in enumerate(messages):                                                   │
│   297 │   │   │   try:                                                                           │
│   298 │   │   │   │   results.append(                                                            │
│ ❱ 299 │   │   │   │   │   self._generate_with_cache(                                             │
│   300 │   │   │   │   │   │   m,                                                                 │
│   301 │   │   │   │   │   │   stop=stop,                                                         │
│   302 │   │   │   │   │   │   run_manager=run_managers[i] if run_managers else None,             │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/langchain/chat_models/base.py:446 in       │
│ _generate_with_cache                                                                             │
│                                                                                                  │
│   443 │   │   │   │   │   "Asked to cache, but no cache found at `langchain.cache`."             │
│   444 │   │   │   │   )                                                                          │
│   445 │   │   │   if new_arg_supported:                                                          │
│ ❱ 446 │   │   │   │   return self._generate(                                                     │
│   447 │   │   │   │   │   messages, stop=stop, run_manager=run_manager, **kwargs                 │
│   448 │   │   │   │   )                                                                          │
│   449 │   │   │   else:                                                                          │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/langchain/chat_models/vertexai.py:158 in   │
│ _generate                                                                                        │
│                                                                                                  │
│   155 │   │   │   )                                                                              │
│   156 │   │   else:                                                                              │
│   157 │   │   │   chat = self.client.start_chat(message_history=history.history, **params)       │
│ ❱ 158 │   │   response = chat.send_message(question.content)                                     │
│   159 │   │   text = self._enforce_stop_words(response.text, stop)                               │
│   160 │   │   return ChatResult(generations=[ChatGeneration(message=AIMessage(content=text))])   │
│   161                                                                                            │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/vertexai/language_models/_language_models. │
│ py:855 in send_message                                                                           │
│                                                                                                  │
│    852 │   │   │   │   for example in self._examples                                             │
│    853 │   │   │   ]                                                                             │
│    854 │   │                                                                                     │
│ ❱  855 │   │   prediction_response = self._model._endpoint.predict(                              │
│    856 │   │   │   instances=[prediction_instance],                                              │
│    857 │   │   │   parameters=prediction_parameters,                                             │
│    858 │   │   )                                                                                 │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/google/cloud/aiplatform/models.py:1564 in  │
│ predict                                                                                          │
│                                                                                                  │
│   1561 │   │   │   │   ),                                                                        │
│   1562 │   │   │   )                                                                             │
│   1563 │   │   else:                                                                             │
│ ❱ 1564 │   │   │   prediction_response = self._prediction_client.predict(                        │
│   1565 │   │   │   │   endpoint=self._gca_resource.name,                                         │
│   1566 │   │   │   │   instances=instances,                                                      │
│   1567 │   │   │   │   parameters=parameters,                                                    │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/google/cloud/aiplatform_v1/services/predic │
│ tion_service/client.py:606 in predict                                                            │
│                                                                                                  │
│    603 │   │   )                                                                                 │
│    604 │   │                                                                                     │
│    605 │   │   # Send the request.                                                               │
│ ❱  606 │   │   response = rpc(                                                                   │
│    607 │   │   │   request,                                                                      │
│    608 │   │   │   retry=retry,                                                                  │
│    609 │   │   │   timeout=timeout,                                                              │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py:113 in  │
│ __call__                                                                                         │
│                                                                                                  │
│   110 │   │   │   metadata.extend(self._metadata)                                                │
│   111 │   │   │   kwargs["metadata"] = metadata                                                  │
│   112 │   │                                                                                      │
│ ❱ 113 │   │   return wrapped_func(*args, **kwargs)                                               │
│   114                                                                                            │
│   115                                                                                            │
│   116 def wrap_method(                                                                           │
│                                                                                                  │
│ /opt/conda/envs/whirlpalm/lib/python3.9/site-packages/google/api_core/grpc_helpers.py:74 in      │
│ error_remapped_callable                                                                          │
│                                                                                                  │
│    71 │   │   try:                                                                               │
│    72 │   │   │   return callable_(*args, **kwargs)                                              │
│    73 │   │   except grpc.RpcError as exc:                                                       │
│ ❱  74 │   │   │   raise exceptions.from_grpc_error(exc) from exc                                 │
│    75 │                                                                                          │
│    76 │   return error_remapped_callable                                                         │
│    77
dosubot[bot] commented 10 months ago

🤖

Hello,

Yes, it's possible to implement exception handling in LlamaIndex v0.8.29.post1 to deal with a quota exceeded error when adding multiple nodes to a keyword index using the index.insert_nodes(node) function.

You can do this by adding a try/except block around the self._client.upsert call in the add method of the QdrantVectorStore class. If a quota exceeded error is raised, you can catch it and handle it appropriately.

Here is an example of how you can do this:

import logging

from qdrant_client.http.exceptions import QuotaExceededError

logger = logging.getLogger(__name__)

def add(self, nodes: List[BaseNode]) -> List[str]:
    """Add nodes to index.

    Args:
        nodes: List[BaseNode]: list of nodes with embeddings

    """
    from qdrant_client.http import models as rest

    if len(nodes) > 0 and not self._collection_initialized:
        self._create_collection(
            collection_name=self.collection_name,
            vector_size=len(nodes[0].get_embedding()),
        )

    ids = []
    for node_batch in iter_batch(nodes, self.batch_size):
        node_ids = []
        vectors = []
        payloads = []
        for node in node_batch:
            assert isinstance(node, BaseNode)
            node_ids.append(node.node_id)
            vectors.append(node.get_embedding())

            metadata = node_to_metadata_dict(
                node, remove_text=False, flat_metadata=self.flat_metadata
            )

            payloads.append(metadata)
        try:
            self._client.upsert(
                collection_name=self.collection_name,
                points=rest.Batch.construct(
                    ids=node_ids,
                    vectors=vectors,
                    payloads=payloads,
                ),
            )
        except QuotaExceededError:
            # Handle quota exceeded error here
            logger.error("Quota exceeded when adding nodes to index")
            break
        ids.extend(node_ids)
    return ids

In this example, if a QuotaExceededError is raised during the self._client.upsert call, the error is caught and a message is logged. The loop then breaks, stopping the addition of more nodes to the index. You can replace the logging statement with your own error handling code.
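
If you would rather wait and retry instead of stopping, a rough variant of the try/except above could use a simple exponential backoff (the retry count and sleep times here are only illustrative):

import time

max_retries = 5
for attempt in range(max_retries):
    try:
        self._client.upsert(
            collection_name=self.collection_name,
            points=rest.Batch.construct(
                ids=node_ids,
                vectors=vectors,
                payloads=payloads,
            ),
        )
        break
    except QuotaExceededError:
        if attempt == max_retries - 1:
            raise
        # Back off exponentially: 1s, 2s, 4s, 8s, ...
        time.sleep(2 ** attempt)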

Regarding the exact error message that is being returned when the quota exceeded error occurs, could you please provide more details? This would help in understanding the issue better and providing a more accurate solution.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 7 months ago

Hi, @axgabs,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, you raised an issue regarding a quota exceeded error when adding nodes to a keyword index in llama-index-0.8.29.post1. It looks like I provided a detailed response suggesting exception handling in LlamaIndex v0.8.29.post1 to deal with the quota exceeded error, along with a code example demonstrating how to use a try/except block to catch the QuotaExceededError and handle it appropriately.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or need assistance, feel free to reach out.