run-llama / create-llama

The easiest way to get started with LlamaIndex
MIT License

Pydantic Issue when running Ollama + FastAPI backend #244

Closed. BastianSpatz closed this issue 3 days ago.

BastianSpatz commented 3 weeks ago

When using Ollama as the model source, I get the following error:

ERROR: Error when generating next question: 1 validation error for LLMStructuredPredictEndEvent output value is not a valid dict (type=type_error.dict)

when it tries to generate the NextQuestions.

https://github.com/run-llama/create-llama/blob/1d93775f043cd57852413a6cb7ee6f5302fc0093/templates/types/streaming/fastapi/app/api/services/suggestion.py#L50-L55

I think this is a llama-index/Pydantic problem that occurs when astructured_predict calls dispatcher.event(LLMStructuredPredictEndEvent(output=result)).

Has anybody seen or fixed this error?
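
For context, the failing call can be reproduced outside the template with a small script along these lines (a sketch only; the NextQuestions model and prompt text here are simplified placeholders, not the template's definitions, and it uses the sync structured_predict instead of the async variant the template calls):

# Repro sketch (assumed setup): Ollama serving llama3.1 locally,
# llama-index with the Ollama integration installed.
from pydantic import BaseModel
from llama_index.core import PromptTemplate, Settings
from llama_index.llms.ollama import Ollama


class NextQuestions(BaseModel):
    """Placeholder output model; the template defines its own NextQuestions class."""
    questions: list[str]


Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)

prompt = PromptTemplate(
    "Suggest {number_of_questions} follow-up questions for this conversation:\n{conversation}"
)

# With a model whose function calling is unreliable, this is where the reported
# LLMStructuredPredictEndEvent validation error surfaces.
result = Settings.llm.structured_predict(
    NextQuestions,
    prompt=prompt,
    conversation="User: What is LlamaIndex?\nAssistant: A data framework for LLM apps.",
    number_of_questions=3,
)
print(result)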

marcusschiesser commented 3 weeks ago
  1. The astructured_predict call requires a model with good function calling. What model are you using? (A quick way to check what your configured LLM advertises is sketched after this list.)
  2. The code generated by create-llama only attempts this call and shouldn't show next questions if it fails - is that the behavior you're seeing?
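
As an aside, one way to see whether the configured LLM even advertises function-calling support is to inspect its metadata (a sketch; whether llama3.1 via Ollama reports this accurately depends on your llama-index and Ollama versions):

# Sketch: check whether the configured LLM claims function-calling support.
# Assumes Settings.llm has already been set up (e.g. to your Ollama model).
from llama_index.core import Settings

metadata = Settings.llm.metadata
print(metadata.model_name, "function calling:", metadata.is_function_calling_model)
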
BastianSpatz commented 3 weeks ago

Thanks for the reply.

  1. I'm using Llama 3.1 8B.
  2. And yes, it just throws the error and doesn't generate questions. But the app works fine nonetheless.
marcusschiesser commented 3 weeks ago

@BastianSpatz I guess that the model is not capable enough to use structured_predict.

As the TypeScript version doesn't have structured_predict, it uses a simple LLM call whose output is parsed; see: https://github.com/run-llama/create-llama/blob/8ce4a8513d9889427f80a7b5d1ce7bc04f35932f/templates/components/llamaindex/typescript/streaming/suggestion.ts#L16-L39

Can you try using the Next.js template first with your Ollama model? If that works, you could modify suggestion.py accordingly.

BastianSpatz commented 3 weeks ago

Thank you for the help, I'll check it out :)

marcusschiesser commented 3 weeks ago

Great. Can you let me know the result? We can keep the ticket open until then.

BastianSpatz commented 3 weeks ago

Using the same approach as in the TypeScript version, it works.

marcusschiesser commented 3 weeks ago

Cool. Can you send a PR or post your changes here?

BastianSpatz commented 2 weeks ago

Sorry, here is what I changed in suggestion.py:

import logging
import re
from typing import List

# Import paths as in the create-llama FastAPI template; adjust if your layout differs.
from app.api.routers.models import Message
from llama_index.core.prompts import PromptTemplate
from llama_index.core.settings import Settings

logger = logging.getLogger("uvicorn")

# Default number of suggestions to generate (value assumed; set to taste).
N_QUESTION_TO_GENERATE = 3

NEXT_QUESTIONS_SUGGESTION_PROMPT = PromptTemplate(
    "You're a helpful assistant! Your task is to suggest the next question that the user might ask. "
    "\nHere is the conversation history"
    "\n---------------------\n{conversation}\n---------------------"
    "\nGiven the conversation history, please give me {number_of_questions} questions that the user might ask next! "
    "Keep the questions relevant to the conversation history and its context. "
    "Your answer should be wrapped in triple backticks and follow this format:\n"
    "```\n"
    "<question 1>\n"
    "<question 2>\n"
    "```"
)

class NextQuestionSuggestion:
    @staticmethod
    def suggest_next_questions(
        messages: List[Message],
        number_of_questions: int = N_QUESTION_TO_GENERATE,
    ) -> List[str]:
        """
        Suggest the next questions that user might ask based on the conversation history
        Return as empty list if there is an error
        """
        try:
            # Reduce the cost by only using the last two messages
            last_user_message = None
            last_assistant_message = None
            for message in reversed(messages):
                if message.role == "user":
                    last_user_message = f"User: {message.content}"
                elif message.role == "assistant":
                    last_assistant_message = f"Assistant: {message.content}"
                if last_user_message and last_assistant_message:
                    break
            conversation: str = f"{last_user_message}\n{last_assistant_message}"

            # output: NextQuestions = await Settings.llm.astructured_predict(
            #     NextQuestions,
            #     prompt=NEXT_QUESTIONS_SUGGESTION_PROMPT,
            #     conversation=conversation,
            #     number_of_questions=number_of_questions,
            # )
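            # Mirror the TypeScript helper: make a plain completion call, then
            # parse the questions out of the fenced block in the response text.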
            prompt = (
                NEXT_QUESTIONS_SUGGESTION_PROMPT.get_template()
                .replace("{conversation}", conversation)
                .replace("{number_of_questions}", str(number_of_questions))
            )
            output = Settings.llm.complete(prompt)
            questions = extract_questions_from_text(output.text)

            return questions
        except Exception as e:
            logger.error(f"Error when generating next question: {e}")
            return []

def extract_questions_from_text(text: str) -> List[str]:
    # Regular expression to match content within triple backticks
    pattern = r"```(.*?)```"

    # Find the content inside the backticks
    match = re.search(pattern, text, re.DOTALL)

    if match:
        # Split the content by newlines and strip any leading/trailing whitespace
        questions = [
            line.strip() for line in match.group(1).splitlines() if line.strip()
        ]
        questions = [question for question in questions if "?" in question]
        return questions

    return []
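
For completeness, here is a sketch of how the changed helper can be called; the Message construction is illustrative, since in the template the message list comes from the incoming chat request:

# Illustrative call; roles and contents are made-up example data.
messages = [
    Message(role="user", content="What is LlamaIndex?"),
    Message(role="assistant", content="A data framework for building LLM applications."),
]
questions = NextQuestionSuggestion.suggest_next_questions(messages)
print(questions)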

I have noticed that after a few questions, the format of the questions output by the LLM seems to deteriorate.
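
One possible mitigation for that (a sketch, not something I've tested extensively): if no backtick-fenced block is found, fall back to scanning the whole output for question-like lines instead of returning nothing.

def extract_questions_from_text(text: str) -> List[str]:
    # Prefer content inside a triple-backtick block, as before.
    match = re.search(r"```(.*?)```", text, re.DOTALL)
    # If the model dropped the fences, fall back to the whole output.
    candidate = match.group(1) if match else text

    questions = [line.strip() for line in candidate.splitlines() if line.strip()]
    return [q for q in questions if "?" in q]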

marcusschiesser commented 2 weeks ago

Thanks @BastianSpatz