run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: QA with PDF using Dense X Retrieval #12871

Closed · LikhithRishi closed this issue 2 months ago

LikhithRishi commented 5 months ago


Question

In Dense X Retrieval, I have modified the base.py code to use the Gemini model and embeddings instead of OpenAI. It works for PDFs with little content, but for PDFs with a lot of content it fails with an Internal Server Error. See the attached screenshot: Screenshot 2024-04-16 123007

dosubot[bot] commented 5 months ago

The "TypeError: Failed to fetch" error you're encountering when working with large PDF content and the Gemini model suggests a few specific areas to focus on for troubleshooting:

  1. Network and API Endpoint: Ensure your network connection is stable and the API endpoint is correctly configured. This error often points to network issues or incorrect endpoint configurations.

  2. Content Size Handling: Large PDFs may exceed the Gemini model's input token limit. Consider splitting the PDF content into smaller segments that fit within the model's limitations (a chunking sketch follows after this list).

  3. Timeout Settings: The error might be due to timeouts from large payloads or slow responses. Adjust your network request timeout settings to accommodate larger or more complex requests.

  4. Error Handling and Logging: Implement robust error handling around your API calls and add logging to capture the size of inputs and other relevant details. This can help pinpoint the cause of failures.

  5. API Rate Limits: Verify that your usage does not exceed the API's rate limits and quotas, as this can lead to failed requests.

  6. Dependencies and Documentation: Make sure all dependencies are up to date and consult the Gemini model's API documentation for any new guidelines on handling large inputs.

By focusing on these areas, you should be able to identify and resolve the issue with handling large PDF content in your application.
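
As a concrete illustration of point 2, here is a minimal chunking sketch. It assumes llama-index's SimpleDirectoryReader and the same SentenceSplitter used elsewhere in this thread; the file name and chunk_size=512 are placeholder values, not documented Gemini limits.

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load the PDF and split it into smaller nodes so each request stays well
# below the model's input limit; the path and chunk_size are illustrative.
documents = SimpleDirectoryReader(input_files=["large_document.pdf"]).load_data()
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
print(f"Split {len(documents)} document(s) into {len(nodes)} chunks")
```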

LikhithRishi commented 5 months ago

The "TypeError: Failed to fetch" error you're encountering when working with large PDF content and the Gemini model suggests a few specific areas to focus on for troubleshooting:

  1. Network and API Endpoint: Ensure your network connection is stable and the API endpoint is correctly configured. This error often points to network issues or incorrect endpoint configurations.
  2. Content Size Handling: Large PDFs may exceed the Gemini model's input token limit. Consider splitting the PDF content into smaller segments that fit within the model's limitations.
  3. Timeout Settings: The error might be due to timeouts from large payloads or slow responses. Adjust your network request timeout settings to accommodate larger or more complex requests.
  4. Error Handling and Logging: Implement robust error handling around your API calls and add logging to capture the size of inputs and other relevant details. This can help pinpoint the cause of failures.
  5. API Rate Limits: Verify that your usage does not exceed the API's rate limits and quotas, as this can lead to failed requests.
  6. Dependencies and Documentation: Make sure all dependencies are up to date and consult the Gemini model's API documentation for any new guidelines on handling large inputs.

By focusing on these areas, you should be able to identify and resolve the issue with handling large PDF content in your application.

Sources

About Dosu

```python
import asyncio
import json
from typing import Any, Dict, List, Optional

import yaml
from llama_index.core import Document, ServiceContext, VectorStoreIndex
from llama_index.core.async_utils import run_jobs
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.base.response.schema import RESPONSE_TYPE
from llama_index.core.llama_pack.base import BaseLlamaPack
from llama_index.core.node_parser.interface import TextSplitter
from llama_index.core.node_parser.text import SentenceSplitter
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.schema import IndexNode, TextNode
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.llms.gemini import Gemini

# backwards compatibility
try:
    from llama_index.core.llms.llm import LLM
except ImportError:
    from llama_index.core.llms.base import LLM

PROPOSITIONS_PROMPT = PromptTemplate(
    """Decompose the "Content" into clear and simple propositions, ensuring they are interpretable out of context.
1. Split compound sentence into simple sentences. Maintain the original phrasing from the input whenever possible.
2. For any named entity that is accompanied by additional descriptive information, separate this information into its own distinct proposition.
3. Decontextualize the proposition by adding necessary modifier to nouns or entire sentences and replacing pronouns (e.g., "it", "he", "she", "they", "this", "that") with the full name of the entities they refer to.
4. Present the results as a list of strings, formatted in JSON.

Input: Title: Ēostre. Section: Theories and interpretations, Connection to Easter Hares. Content: The earliest evidence for the Easter Hare (Osterhase) was recorded in south-west Germany in 1678 by the professor of medicine Georg Franck von Franckenau, but it remained unknown in other parts of Germany until the 18th century. Scholar Richard Sermon writes that "hares were frequently seen in gardens in spring, and thus may have served as a convenient explanation for the origin of the colored eggs hidden there for children. Alternatively, there is a European tradition that hares laid eggs, since a hare’s scratch or form and a lapwing’s nest look very similar, and both occur on grassland and are first seen in the spring. In the nineteenth century the influence of Easter cards, toys, and books was to make the Easter Hare/Rabbit popular throughout Europe. German immigrants then exported the custom to Britain and America where it evolved into the Easter Bunny."
Output: [ "The earliest evidence for the Easter Hare was recorded in south-west Germany in 1678 by Georg Franck von Franckenau.", "Georg Franck von Franckenau was a professor of medicine.", "The evidence for the Easter Hare remained unknown in other parts of Germany until the 18th century.", "Richard Sermon was a scholar.", "Richard Sermon writes a hypothesis about the possible explanation for the connection between hares and the tradition during Easter", "Hares were frequently seen in gardens in spring.", "Hares may have served as a convenient explanation for the origin of the colored eggs hidden in gardens for children.", "There is a European tradition that hares laid eggs.", "A hare’s scratch or form and a lapwing’s nest look very similar.", "Both hares and lapwing’s nests occur on grassland and are first seen in the spring.", "In the nineteenth century the influence of Easter cards, toys, and books was to make the Easter Hare/Rabbit popular throughout Europe.", "German immigrants exported the custom of the Easter Hare/Rabbit to Britain and America.", "The custom of the Easter Hare/Rabbit evolved into the Easter Bunny in Britain and America." ]

Input: {node_text}
Output:"""
)


class DenseXRetrievalPack(BaseLlamaPack):
    def __init__(
        self,
        documents: List[Document],
        proposition_llm: Optional[LLM] = None,
        query_llm: Optional[LLM] = None,
        embed_model: Optional[BaseEmbedding] = None,
        text_splitter: TextSplitter = SentenceSplitter(),
        similarity_top_k: int = 4,
        streaming: bool = False,
    ) -> None:
        """Init params."""
        self._proposition_llm = proposition_llm or Gemini(
            model="models/gemini-1.5-pro-preview-0215",
            temperature=0.1,
            max_tokens=2048,
        )

        embed_model = embed_model or GeminiEmbedding(embed_batch_size=128)

        nodes = text_splitter.get_nodes_from_documents(documents)
        sub_nodes = self._gen_propositions(nodes)

        all_nodes = nodes + sub_nodes
        all_nodes_dict = {n.node_id: n for n in all_nodes}

        service_context = ServiceContext.from_defaults(
            llm=query_llm or Gemini(),
            embed_model=embed_model,
            num_output=self._proposition_llm.metadata.num_output,
        )

        self.vector_index = VectorStoreIndex(
            all_nodes, service_context=service_context, show_progress=True
        )

        self.retriever = RecursiveRetriever(
            "vector",
            retriever_dict={
                "vector": self.vector_index.as_retriever(
                    similarity_top_k=similarity_top_k
                )
            },
            node_dict=all_nodes_dict,
        )

        self.query_engine = RetrieverQueryEngine.from_args(
            self.retriever,
            service_context=service_context,
            streaming=streaming,
        )

    async def _aget_proposition(self, node: TextNode) -> List[TextNode]:
        """Get proposition."""
        inital_output = await self._proposition_llm.apredict(
            PROPOSITIONS_PROMPT, node_text=node.text
        )
        outputs = inital_output.split("\n")

        all_propositions = []

        for output in outputs:
            if not output.strip():
                continue
            if not output.strip().endswith("]"):
                if not output.strip().endswith('"') and not output.strip().endswith(
                    ","
                ):
                    output = output + '"'
                output = output + " ]"
            if not output.strip().startswith("["):
                if not output.strip().startswith('"'):
                    output = '"' + output
                output = "[ " + output

            try:
                propositions = json.loads(output)
            except Exception:
                # fallback to yaml
                try:
                    propositions = yaml.safe_load(output)
                except Exception:
                    # fallback to next output
                    continue

            if not isinstance(propositions, list):
                continue

            all_propositions.extend(propositions)

        assert isinstance(all_propositions, list)
        nodes = [TextNode(text=prop) for prop in all_propositions if prop]

        return [IndexNode.from_text_node(n, node.node_id) for n in nodes]

    def _gen_propositions(self, nodes: List[TextNode]) -> List[TextNode]:
        """Get propositions."""
        sub_nodes = asyncio.run(
            run_jobs(
                [self._aget_proposition(node) for node in nodes],
                show_progress=True,
                workers=8,
            )
        )

        # Flatten list
        return [node for sub_node in sub_nodes for node in sub_node]

    def get_modules(self) -> Dict[str, Any]:
        """Get modules."""
        return {
            "query_engine": self.query_engine,
            "retriever": self.retriever,
        }

    def run(self, query_str: str, **kwargs: Any) -> RESPONSE_TYPE:
        """Run the pipeline."""
        return self.query_engine.query(query_str)
```

This is the modified base.py code using the Gemini model. After making these changes, we execute Dense X as follows:

```python
from llama_index.llms.gemini import Gemini
from llama_index.core.node_parser import SentenceSplitter
from google.generativeai.types import safety_types
from base1 import DenseXRetrievalPack as Dp1

# Create a Gemini model with stricter safety settings
model = Gemini(
    model="models/gemini-pro",
    max_tokens=2048,
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE",
        },
    ],
)

# Use the new model in your DenseXRetrievalPack
dense_pack = Dp1(
    documents,
    proposition_llm=model,
    query_llm=model,
    text_splitter=SentenceSplitter(chunk_size=1024),
)
dense_query_engine = dense_pack.query_engine
```

For some PDFs I am getting an error as follows:

```
---------------------------------------------------------------------------
StopCandidateException                    Traceback (most recent call last)
in ()
     27 )
     28 # Use the new model in your DenseXRetrievalPack
---> 29 dense_pack = Dp1(
     30     documents,
     31     proposition_llm=model,

21 frames
/usr/local/lib/python3.10/dist-packages/google/generativeai/generative_models.py in _check_response(self, response, stream)
    459             glm.Candidate.FinishReason.MAX_TOKENS,
    460         ):
--> 461             raise generation_types.StopCandidateException(response.candidates[0])
    462
    463     def _get_function_calls(self, response) -> list[glm.FunctionCall]:

StopCandidateException: finish_reason: RECITATION
index: 0
```

and for some PDFs I am getting 500: Internal Server Error.
dosubot[bot] commented 5 months ago

The errors you're encountering with large PDFs in your Dense X Retrieval implementation using the Gemini model can be attributed to a few key areas:

  1. StopCandidateException with finish_reason: RECITATION: Gemini aborted the candidate because its output was flagged as reciting existing material too closely, which can happen when a chunk of the PDF closely matches published text. Consider rephrasing the prompt, reducing chunk sizes, or catching the exception and skipping the affected chunk so one refusal does not abort the whole run (a sketch follows below).

  2. 500: Internal Server Error: This generic error suggests a server-side issue, possibly due to resource constraints like memory when processing large PDFs. Check server logs for more detailed errors and consider optimizing your code for better memory usage or increasing server resources.

  3. TypeError: Failed to fetch: Often related to network or API endpoint issues. Ensure your requests are correctly formatted and the server is reachable. Adjusting timeout settings might also help to accommodate the processing time needed for larger files.

For troubleshooting, focus on optimizing memory usage, increasing timeouts for network requests, enhancing error handling for more informative feedback, and ensuring your server configuration can handle the workload. Additionally, check the integration and processing logic of the Gemini model in your system, paying close attention to how PDFs are processed and ensuring conditions for exceptions like StopCandidateException are correctly set for your use case.
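
For point 1, a minimal, hedged sketch of that kind of error handling (a workaround, not a verified fix): wrap the pack's proposition call so a single chunk that Gemini aborts with RECITATION is skipped instead of failing the whole index build. The helper name safe_propositions is hypothetical and would still need to be wired into _gen_propositions in the modified base.py.

```python
from typing import Any, List

from google.generativeai.types import generation_types
from llama_index.core.schema import TextNode


async def safe_propositions(pack: Any, node: TextNode) -> List[TextNode]:
    """Hypothetical wrapper around DenseXRetrievalPack._aget_proposition that
    skips chunks Gemini refuses to complete instead of raising."""
    try:
        return await pack._aget_proposition(node)
    except generation_types.StopCandidateException:
        # Gemini stopped this candidate (e.g. finish_reason RECITATION);
        # return no propositions for this chunk rather than aborting the build.
        return []
```

In `_gen_propositions`, the job list could then be built from `safe_propositions(self, node)` instead of `self._aget_proposition(node)`. This only addresses the RECITATION failures, not the intermittent 500 errors.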

Regarding the Gemini model's specifics, it's designed to handle multi-modal inputs, and its performance can be influenced by how it's integrated into your system, especially in terms of memory management and processing logic for large PDFs. The model's max_tokens parameter plays a crucial role here, and ensuring it aligns with your PDFs' content size is essential for avoiding errors.

If you need further assistance or clarification on any of these points, feel free to ask.

logan-markewich commented 5 months ago

I feel like this is more of an error with Google's servers? It could be a bunch of things. I think this model is super new too, wouldn't be surprised if it had some issues.