run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

What are your takes on open-source models' capability to implement the pipeline logic you have come up with? #8235

Closed timothylimyl closed 1 year ago

timothylimyl commented 1 year ago

Question Validation

Question

Hi,

I really enjoyed reading through the entire LlamaIndex documentation to get some ideas on how you are going about implementing RAG.

It seems to me that a lot of software logic implemented here will require the LLM to be:

  1. General-purpose (Instruction/Conversational)
  2. Highly performant (generalisable; assumed to be an OS-level LLM)

By OS-level LLM, I mean that the LLM is treated as the central agent capable of dynamic planning and reasoning, which I believe is currently available only in closed-source LLMs, as it is an emergent property of scale. Thus, I am curious whether you have tested the overall pipeline with an open-source LLM and replicated your RAG pipeline successfully.

For example, there are a few requirements you have assumed the LLM to have (a sketch of (a) follows this list):

a. Logical decomposition of the main query into precise sub-queries for search and retrieval
b. Evaluation of answer quality, given the query, to decide whether to re-query
c. Use as a Main-Agent/Document-Agent, including your idea of a Router
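To make requirement (a) concrete, here is a minimal sketch of query decomposition with the SubQuestionQueryEngine, assuming the llama_index API of the ~v0.8 era (the `./data` path is a placeholder, and the default LLM is OpenAI unless you swap in a local model):

```python
# Hedged sketch: query decomposition (requirement (a)) via
# SubQuestionQueryEngine, using the llama_index API as of ~v0.8.
# "./data" is a placeholder path; the default LLM is OpenAI unless
# you pass your own via a ServiceContext.
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# The LLM must decompose the user's question into sub-questions,
# route each to this tool, then synthesise a final answer.
tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    description="Useful for answering questions about the loaded documents.",
)
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=[tool])
response = engine.query("Compare the two approaches described in the documents.")
print(response)
```

Whether the decomposition step succeeds depends entirely on the underlying LLM, which is exactly the capability gap being discussed here.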

A few that I think are within the realm of "doable" for open-source:

a. Summarisation of chunk descriptions (efforts to disambiguate)
b. Normal QA (the answer is found in the context or in the compiled answers)

and an extra questionable feature:

a. Answering based only on the provided context, without hallucinating. If you are just prompting it with "do not provide an answer if the answer cannot be found in the context provided", it is highly probable your LLM will still answer, especially if the context has some relevance (a sketch of such a prompt follows below). This is something that even GPT-3.5 has problems with.
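For reference, the "answer only from context" prompting described above might look like the sketch below, which overrides the default QA prompt via `text_qa_template` (llama_index ~v0.8 API; the refusal wording is illustrative and, as noted, smaller models often ignore it):

```python
# Hedged sketch of a grounding prompt; the instruction wording is
# illustrative, not a guaranteed fix for hallucination.
from llama_index.prompts import PromptTemplate

qa_template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using ONLY the context above. If the answer is not\n"
    "contained in the context, reply exactly: 'I don't know.'\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Assumes `index` is an existing VectorStoreIndex, as in the sketch above.
query_engine = index.as_query_engine(text_qa_template=qa_template)
```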

I foresee a lot of pitfalls/points of failure with open-source models, especially if most of the pipeline logic is just prompting your way through and hoping the LLM can handle it. Don't get me wrong, I understand the quick wins you get with OpenAI, and it really helps you test your ideas immediately.

What are your takes on open-source models' capability to implement the pipeline logic you have come up with? I hope this thread can be a space for discussion around open-source LLMs.

dosubot[bot] commented 1 year ago

🤖

Hi,

Thank you for your detailed question and for taking the time to read through the LlamaIndex documentation. Your insights and questions are very valuable.

You're correct in your understanding that the LlamaIndex pipeline logic assumes certain capabilities from the Large Language Models (LLMs). These capabilities include logical decomposition of queries, evaluation of answer quality, and functioning as a Main-Agent/Document-Agent, among others.

As for your question about the feasibility of implementing these capabilities with open-source LLMs, it's a complex issue. Open-source LLMs have shown potential in implementing the pipeline logic outlined in the LlamaIndex. However, their performance can be variable and sometimes unreliable, requiring prompt engineering or fine-tuning to improve.

For instance, the open-source LLM 'llama2-chat-7b 4bit' from Hugging Face performs well with Basic Query Engines but is usually unreliable with Router Query Engine, SubQuestion Query Engine, Text2SQL, and Pydantic Programs. It is also sometimes unreliable with Data Agents. The model seems to be quite chatty, which makes parsing structured outputs difficult. Fine-tuning and prompt engineering are likely required for better performance on structured outputs.
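For context, loading such a 4-bit model typically goes through `HuggingFaceLLM`; a rough sketch under the llama_index ~v0.8 API (requires `transformers`, `accelerate`, and `bitsandbytes`; exact kwargs may vary by version):

```python
# Hedged sketch: loading a 4-bit quantized Llama 2 chat model through
# HuggingFaceLLM. Needs transformers, accelerate, and bitsandbytes
# installed, plus access to the gated meta-llama repo.
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=4096,
    max_new_tokens=256,
    model_kwargs={"load_in_4bit": True},  # 4-bit via bitsandbytes
    device_map="auto",
)
```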

Another open-source LLM, 'Mistral-7B-instruct-v0.1 4bit', also from Hugging Face, shows similar performance. It is reliable with Basic Query Engines but usually unreliable with Router Query Engine and SubQuestion Query Engine. It is sometimes unreliable with Text2SQL, Pydantic Programs, and Data Agents. This model seems slightly more reliable for structured outputs compared to Llama2, and with some prompt engineering, it may perform better.

The 'zephyr-7b-alpha' model from Hugging Face appears to be more reliable than other open-source models of this size. It performs well with Basic Query Engines, Router Query Engine, SubQuestion Query Engine, and Text2SQL. It is also reliable with Pydantic Programs but sometimes unreliable as a Data Agent. This model still hallucinates a bit, especially as an agent.

In conclusion, while open-source LLMs can be used to implement the pipeline logic in LlamaIndex, their performance can be variable and may require additional engineering or fine-tuning to improve. They offer the advantage of customizability and transparency, which can be beneficial in certain applications.

As for the specific features of LlamaIndex that are most dependent on the capabilities of the LLMs, these include:

- the main interfaces for interacting with different LLMs (Anthropic, Anyscale, AzureOpenAI, Bedrock, Clarifai, Cohere, CustomLLM, EverlyAI, GradientBaseModelLLM, GradientModelAdapterLLM, HuggingFaceInferenceAPI, HuggingFaceLLM, Konko, LangChainLLM, LiteLLM, LlamaCPP, LocalAI, MockLLM, MonsterLLM, Ollama, OpenAI, OpenAILike, PaLM, Portkey, PredibaseLLM, Replicate, Xinference);
- the classes responsible for managing chat interactions and completions (ChatMessage, ChatResponse, ChatResponseAsyncGen, ChatResponseGen, CompletionResponse, CompletionResponseAsyncGen, CompletionResponseGen);
- the classes managing the metadata of LLMs and the roles of messages in a conversation (LLMMetadata, MessageRole).
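All of these backends share one abstraction, so the same `complete`/`chat` calls work regardless of provider. A self-contained sketch using `MockLLM`, which needs no API key (llama_index ~v0.8/0.9 API):

```python
# Hedged sketch of the common LLM interface; MockLLM returns dummy
# text, so this runs offline and only demonstrates the call shapes.
from llama_index.llms import ChatMessage, MessageRole, MockLLM

llm = MockLLM(max_tokens=16)

# Completion interface -> CompletionResponse
completion = llm.complete("Summarise this chunk: ...")
print(completion.text)

# Chat interface -> ChatResponse
messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You answer from context only."),
    ChatMessage(role=MessageRole.USER, content="What does the document say?"),
]
chat = llm.chat(messages)
print(chat.message.content)
```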

I hope this answers your question. If you have any further queries, feel free to ask.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Disiok commented 1 year ago

Hey @timothylimyl thanks for the thoughtful note!

I think your intuition / understanding is spot on. The more advanced features in llamaindex definitely assume a level of LLM capability which many open-source LLMs lack right now.

We've started more explicitly tracking which LLMs can power which features in llamaindex here: https://docs.llamaindex.ai/en/stable/core_modules/model_modules/llms/root.html#llm
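For anyone who wants to reproduce those compatibility checks locally, here is a hedged sketch of swapping an open-source model into the global defaults (llama_index ~v0.8/0.9 API; `Ollama` and the `zephyr` model tag are illustrative assumptions, not a recommendation from the docs page):

```python
# Hedged sketch: make every query engine / router / agent built from
# here on use a locally served open-source model, so each llamaindex
# feature can be tested against it in turn.
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import Ollama

llm = Ollama(model="zephyr")  # any model tag served by a local Ollama
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local",  # local HuggingFace embeddings; needs sentence-transformers
)
set_global_service_context(service_context)
```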