run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Feature Request]: Allow IngestionPipeline to use a custom API_Base and Custom model. #11898

Closed antoineross closed 8 months ago

antoineross commented 8 months ago

Feature Description

There are wrapper APIs that let us use the OpenAI client with a custom api_base; see: https://openrouter.ai/models/mistralai/mistral-7b-instruct:nitro?tab=api

I would like LlamaIndex's IngestionPipeline to support such custom endpoints as well.
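For reference, a minimal sketch of what this looks like with the plain openai client, using the model and endpoint from the OpenRouter page linked above:

from openai import OpenAI

# The OpenAI client pointed at OpenRouter's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # the custom api_base
    api_key="<OPENROUTER_API_KEY>",
)
completion = client.chat.completions.create(
    model="mistralai/mistral-7b-instruct:nitro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)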

Reason

The reason is just so that the IngestionPipeline supports more models.

Value of Feature

I believe this is in line with LlamaIndex's roadmap of adding as many models as possible to the roster.

dosubot[bot] commented 8 months ago

Hello @antoineross! I'm Dosu, a bot here to assist you with any bugs, questions, or guidance on contributing while you're waiting for a human maintainer to join in. I'm eager to help out and ensure your experience with LlamaIndex is smooth. Let me take a moment to review your feature request about enhancing the IngestionPipeline with a custom API_Base and model compatibility. I'll get back to you with a detailed response shortly!

antoineross commented 8 months ago

For more context on reproducing the issue, this is how you can test it. It currently breaks because the model name is mapped to a context size, which I believe should instead be settable through a higher-level API such as OpenAI:


import asyncio
import json
import multiprocessing
import os

from pinecone import Pinecone

from llama_index.core import SimpleDirectoryReader
from llama_index.core.extractors import (
    KeywordExtractor,
    QuestionsAnsweredExtractor,
    SummaryExtractor,
    TitleExtractor,
)
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.extractors.entity import EntityExtractor
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.pinecone import PineconeVectorStore


async def main():
    OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]
    llm_model = "mistralai/mixtral-8x7b-instruct:nitro"

    llm = OpenAI(
        temperature=0.1,
        model=llm_model,
        api_base="https://openrouter.ai/api/v1",
        api_key=OPENROUTER_API_KEY,
        max_tokens=512,
    )
    text_splitter = TokenTextSplitter(
        separator=" ", chunk_size=512, chunk_overlap=128
    )

    PINECONE_API_KEY = os.environ["SECRET_PINECONE_API_KEY"]
    PINECONE_INDEX_NAME = os.environ["SECRET_PINECONE_INDEX"]

    # Create a Pinecone client
    pc = Pinecone(api_key=PINECONE_API_KEY)
    pinecone_index = pc.Index(PINECONE_INDEX_NAME)
    print(f"Connected to Pinecone index: {PINECONE_INDEX_NAME}")

    # ---------------- Extractor Logic ---------------- #
    print("Generating extractors, using LLM model:", llm.model)
    extractors = [
        TitleExtractor(nodes=5, llm=llm),
        QuestionsAnsweredExtractor(questions=3, llm=llm),
        # EntityExtractor extracts lists of entities (persons, locations, etc.).
        # Uses a BERT model (free, but runs on your device's compute).
        EntityExtractor(prediction_threshold=0.5),
        SummaryExtractor(summaries=["prev", "self"], llm=llm),
        KeywordExtractor(keywords=10, llm=llm),
        # CustomExtractor()
    ]

    transformations = [text_splitter] + extractors

    # ---------------- Directory Reading -> Processing Logic ---------------- #
    uber_docs = SimpleDirectoryReader(
        input_dir="documents"  # or input_files=["documents/Ethical_Hacking_RHartleyTeachingStudents.pdf"]
    ).load_data()
    pipeline = IngestionPipeline(transformations=transformations)
    workers = multiprocessing.cpu_count()

    uber_nodes = await pipeline.arun(
        documents=uber_docs,
        num_workers=workers,
        show_progress=True,
    )

    # ---------------- Generating Embeddings ---------------- #
    # First save the metadata of the nodes in a JSON file
    all_metadata = [node.metadata for node in uber_nodes]
    with open("llamaindex-output/10k-vFinalv1.json", "w") as f:
        json.dump(all_metadata, f)

    embed_model = OpenAIEmbedding(model="text-embedding-3-large")
    for node in uber_nodes:
        node.embedding = await embed_model.aget_text_embedding(
            node.get_content(metadata_mode="all")
        )

    vector_store = PineconeVectorStore(
        pinecone_index=pinecone_index, namespace="llama-index"
    )
    vector_store.add(uber_nodes)


if __name__ == "__main__":
    asyncio.run(main())

Error:

Traceback (most recent call last):
  File "/Users/antoineross/miniconda3/envs/llamaindex/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/antoineross/miniconda3/envs/llamaindex/lib/python3.11/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
    return i, await f
              ^^^^^^^
  File "/Users/antoineross/miniconda3/envs/llamaindex/lib/python3.11/site-packages/llama_index/core/async_utils.py", line 106, in worker
    return await job
           ^^^^^^^^^
  File "/Users/antoineross/miniconda3/envs/llamaindex/lib/python3.11/site-packages/llama_index/core/llms/llm.py", line 292, in apredict
    if self.metadata.is_chat_model:
       ^^^^^^^^^^^^^
  File "/Users/antoineross/miniconda3/envs/llamaindex/lib/python3.11/site-packages/llama_index/llms/openai/base.py", line 231, in metadata
    context_window=openai_modelname_to_contextsize(self._get_model_name()),
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/antoineross/miniconda3/envs/llamaindex/lib/python3.11/site-packages/llama_index/llms/openai/utils.py", line 198, in openai_modelname_to_contextsize
    raise ValueError(
ValueError: Unknown model 'mistralai/mixtral-8x7b-instruct:nitro'. Please provide a valid OpenAI model name in: gpt-4, gpt-4-32k, gpt-4-1106-preview, gpt-4-0125-preview, gpt-4-turbo-preview, gpt-4-vision-preview, gpt-4-0613, gpt-4-32k-0613, gpt-4-0314, gpt-4-32k-0314, gpt-3.5-turbo, gpt-3.5-turbo-16k, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, text-davinci-003, text-davinci-002, gpt-3.5-turbo-instruct, text-ada-001, text-babbage-001, text-curie-001, ada, babbage, curie, davinci, gpt-35-turbo-16k, gpt-35-turbo, gpt-35-turbo-1106, gpt-35-turbo-0613, gpt-35-turbo-16k-0613
Task exception was never retrieved (the same ValueError repeats for each remaining worker task).

dosubot[bot] commented 8 months ago

The error you're encountering occurs because LlamaIndex does not recognize the 'mistralai/mixtral-8x7b-instruct:nitro' model: the OpenAI LLM class maps model names to context sizes against a predefined list of OpenAI models, and this model is not on that list.
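The failing lookup can be reproduced in isolation (module path and function name taken from the traceback above):

from llama_index.llms.openai.utils import openai_modelname_to_contextsize

# Raises: ValueError: Unknown model 'mistralai/mixtral-8x7b-instruct:nitro'. ...
openai_modelname_to_contextsize("mistralai/mixtral-8x7b-instruct:nitro")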

However, you can modify the LlamaIndex to recognize and use this model by creating a custom transformation. Here's a step-by-step guide on how to do this:

  1. Define a Custom Transformation Class: This class should encapsulate the logic to interact with the custom API that provides access to the mistralai/mixtral-8x7b-instruct:nitro model. This involves making HTTP requests to the API endpoint and handling the response.
from typing import Any, List

import requests

from llama_index.core.schema import BaseNode, TransformComponent


class MistralAIEmbedding(TransformComponent):
    # TransformComponent is a Pydantic model, so declare fields instead of
    # assigning them in __init__.
    api_base: str
    api_key: str

    def __call__(self, nodes: List[BaseNode], **kwargs: Any) -> List[BaseNode]:
        # TransformComponent subclasses implement __call__ over nodes. The
        # "/transform" endpoint and response shape are illustrative only,
        # not a real OpenRouter/MistralAI API.
        for node in nodes:
            response = requests.post(
                f"{self.api_base}/transform",
                json={
                    "model": "mistralai/mixtral-8x7b-instruct:nitro",
                    "text": node.get_content(),
                },
                headers={"Authorization": f"Bearer {self.api_key}"},
            )
            response.raise_for_status()
            node.set_content(response.json()["transformed_text"])
        return nodes
  2. Integrate the Custom Transformation into the Pipeline: Once you have the custom transformation class, integrate it into the pipeline creation process. This involves modifying the default_transformations function or directly adding the custom transformation when constructing the PipelineCreate object.
from typing import List

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import TransformComponent


def default_transformations(api_base: str, api_key: str) -> List[TransformComponent]:
    """Default transformations with the custom MistralAI embedding added."""
    from llama_index.embeddings.openai import OpenAIEmbedding  # pants: no-infer-dep

    return [
        SentenceSplitter(),
        OpenAIEmbedding(),
        MistralAIEmbedding(api_base=api_base, api_key=api_key),  # the custom transformation
    ]

When creating the pipeline, ensure you pass the appropriate api_base and api_key for the custom API:

api_base = "https://custom-api.example.com"
api_key = "your_api_key_here"

pipeline_create = get_pipeline_create(
    pipeline_name="YourPipelineName",
    client=get_client(),
    pipeline_type=PipelineType.YOUR_CHOICE,
    transformations=default_transformations(api_base, api_key),
    # other parameters as needed
)
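Alternatively, a sketch under the same assumptions: the custom component can be passed straight to IngestionPipeline, which the repro script above already does with its own transformations:

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        MistralAIEmbedding(api_base=api_base, api_key=api_key),
    ]
)
nodes = pipeline.run(documents=documents)  # documents loaded elsewhere, e.g. via SimpleDirectoryReader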

This example demonstrates how to extend the ingestion pipeline to use a custom model by defining a custom transformation and integrating it into the pipeline creation process. Adjust the implementation details based on the actual API specifications and how the mistralai/mixtral-8x7b-instruct:nitro model is exposed through the custom API.

For more information, you can refer to how LlamaIndex handles custom models in its current implementation: it defines a CustomLLM class that extends the LLM base class. This abstract base class for custom large language models (LLMs) requires subclasses to implement specific methods. You can find more details in the llama-index-core/llama_index/core/llms/custom.py file.
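For orientation, here is a minimal sketch following that CustomLLM pattern; the context window, output size, and model name below are assumptions for the OpenRouter model, and the completion methods are stubs:

from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class OpenRouterLLM(CustomLLM):
    # Assumed values; set these to the real model card numbers.
    context_window: int = 32768
    num_output: int = 512
    model_name: str = "mistralai/mixtral-8x7b-instruct:nitro"

    @property
    def metadata(self) -> LLMMetadata:
        # Declaring the context window here sidesteps the
        # openai_modelname_to_contextsize lookup that fails above.
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # Stub: call your OpenAI-compatible endpoint here.
        return CompletionResponse(text="...")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Stub: stream from your endpoint here.
        yield CompletionResponse(text="...", delta="...")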

I hope this helps! If you have any other questions or run into any issues, feel free to ask.

logan-markewich commented 8 months ago

You can use the OpenAILike LLM class if your API is compatible with the OpenAI client:

pip install llama-index-llms-openai-like

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(model="my model", api_key="fake", api_base="....")
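Applied to the OpenRouter setup from the repro above, that would look something like this; OpenAILike also accepts context_window and is_chat_model, which avoids the failing context-size lookup (the context_window value here is an assumption; check the model card):

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistralai/mixtral-8x7b-instruct:nitro",
    api_base="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
    context_window=32768,  # assumption: set to the model's actual context size
    is_chat_model=True,    # OpenRouter serves a chat-completions API
    max_tokens=512,
)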