qdrant / fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embedding
https://qdrant.github.io/fastembed/
Apache License 2.0

qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request) #145

Closed: settur1409 closed this issue 8 months ago

settur1409 commented 8 months ago

Current Behavior

Getting qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request) when performing retriever.retrieve(query)

Steps to Reproduce

  1. Using FastEmbedEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2") from llama_index.embeddings.fastembed (see https://qdrant.github.io/fastembed/examples/Supported_Models/) and setting Settings.embed_model = embed_model.

  2. Creating the client, collection, and index:

         self.client = QdrantClient(url=url, api_key=qdrant_api_key)
         self.client.recreate_collection(
             collection_name=collection_name,
             vectors_config=VectorParams(size=384, distance=Distance.COSINE),
         )
         self.vector_store = QdrantVectorStore(client=self.client, collection_name=collection_name)
         self.index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
  3. Building the chat engine and retriever:

         ch_engine = index.as_chat_engine(llm=llm, chat_mode='openai')
         retriever = index.as_retriever()

  4. Calling retriever.retrieve(query), passing query as a str (the full flow is condensed in the sketch below).
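
Condensed into one place, the steps above amount to roughly the following (a sketch for orientation only; url, qdrant_api_key, collection_name, llm, and query are assumed to be defined elsewhere):

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams
    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.embeddings.fastembed import FastEmbedEmbedding
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # 384-dim embedding model; where this assignment runs relative to index
    # construction turns out to be the crux of the issue (see the resolution below)
    Settings.embed_model = FastEmbedEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

    client = QdrantClient(url=url, api_key=qdrant_api_key)
    client.recreate_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    retriever = index.as_retriever()
    nodes = retriever.retrieve(query)  # raises the 400 when the query is embedded at 1536 dims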

I am getting the error below:

    qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request)
    Raw response content: b'{"status":{"error":"Wrong input: Vector inserting error: expected dim: 384, got 1536"},"time":0.00039087}'

The embedding model produces only 384-dimensional vectors; I am not sure where 1536 is coming from.

Expected Behavior

The configured embedding model does not seem to be applied to the query, and I see no way to pass the embedding model when creating the index.


agourlay commented 8 months ago

the embedding model supports only 384 dim, not sure from where 1536 is coming into picture.

I guess it comes from

ch_engine = index.as_chat_engine(llm=llm, chat_mode='openai')

OpenAI uses 1536-dimensional embeddings.
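
A quick way to confirm which model is producing the query vector is to embed a test string and print its length; a minimal sketch, assuming a recent fastembed where the class is named TextEmbedding (older releases exposed the model under different class names):

    from fastembed import TextEmbedding

    # all-MiniLM-L6-v2 should yield 384-dimensional vectors
    model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vector = next(iter(model.embed(["dimension check"])))
    print(len(vector))  # expected: 384, not 1536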

settur1409 commented 8 months ago

Hey @agourlay, I didn't use OpenAI embeddings. I tried another alternative:

    q_engine = index.as_query_engine()
    q_engine.query(query)

This is also giving the same error, and I don't have any LLM involved here. Let me know your thoughts.

agourlay commented 8 months ago

I believe this is a fastembed-specific issue, so I took the liberty of transferring it directly to this repository.

agourlay commented 8 months ago

Dear @NirantK, any idea what is going on here?

NirantK commented 8 months ago

@agourlay @settur1409, I will reproduce and get back with findings. In the meantime, note that @Anush008 maintains the LlamaIndex bindings for all Qdrant packages, including FastEmbed.

Anush008 commented 8 months ago

@settur1409, could you share your entire code? I would need that to reproduce the issue.

Maybe as a Gist or file?

settur1409 commented 8 months ago

qdrant_main_file.txt --> I compiled my code into a single file. Please check. Below is the log that I got:

    Fetching 7 files: 100%|██████████| 7/7 [00:00<?, ?it/s]
    Parsing nodes: 100%|██████████| 131/131 [00:00<00:00, 131.09it/s]
    100%|██████████| 177/177 [01:40<00:00, 1.76it/s]
    100%|██████████| 177/177 [01:40<00:00, 1.76it/s]
    Generating embeddings: 100%|██████████| 177/177 [01:03<00:00, 2.79it/s]

This is the error I got:

    qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request)
    Raw response content: b'{"status":{"error":"Wrong input: Vector inserting error: expected dim: 384, got 1536"},"time":0.001277349}'

Let me know if you need any other information.

settur1409 commented 8 months ago

From the UI: attached a few screenshots from the Qdrant GUI showing the collection.
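
The same information can also be read programmatically rather than from the GUI; a minimal sketch with qdrant-client, assuming the single unnamed vector configuration created earlier:

    # Inspect the collection's configured vector parameters
    info = client.get_collection(collection_name)
    print(info.config.params.vectors.size)      # expected: 384
    print(info.config.params.vectors.distance)  # expected: Cosine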

Anush008 commented 8 months ago

@settur1409, what version of LlamaIndex are you on? There have been quite a lot of changes in the recent versions.

settur1409 commented 8 months ago

Below is the list of llama-index related packages I see in my env:

    llama-index                              0.10.16
    llama-index-agent-openai                 0.1.5
    llama-index-cli                          0.1.7
    llama-index-core                         0.10.16.post1
    llama-index-embeddings-fastembed         0.1.4
    llama-index-embeddings-huggingface       0.1.4
    llama-index-embeddings-openai            0.1.6
    llama-index-indices-managed-llama-cloud  0.1.3
    llama-index-legacy                       0.9.48
    llama-index-llms-anyscale                0.1.3
    llama-index-llms-langchain               0.1.3
    llama-index-llms-openai                  0.1.7
    llama-index-multi-modal-llms-openai      0.1.4
    llama-index-program-openai               0.1.4
    llama-index-question-gen-openai          0.1.3
    llama-index-readers-file                 0.1.8
    llama-index-readers-llama-parse          0.1.3
    llama-index-vector-stores-chroma         0.1.5
    llama-index-vector-stores-qdrant         0.1.4
    llama-parse                              0.3.6
    llamaindex-py-client                     0.1.13

qdrant-client version: 1.8.0

Anush008 commented 8 months ago

@settur1409, you have to move the following lines to the top (just below the imports):

    from llama_index.core import Settings
    from llama_index.embeddings.fastembed import FastEmbedEmbedding

    embed_model = FastEmbedEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
    Settings.embed_model = embed_model
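
The ordering matters because llama-index resolves the embedding model when the index object is constructed: anything built before Settings.embed_model is assigned falls back to the default OpenAI embedding, which is 1536-dimensional. A minimal sketch of the corrected ordering, reusing the client and collection_name from the earlier snippets:

    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.embeddings.fastembed import FastEmbedEmbedding
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # 1. Register the 384-dim embedding model globally, before any index exists
    Settings.embed_model = FastEmbedEmbedding(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    # 2. Only then build the index; it now picks up the FastEmbed model
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    retriever = index.as_retriever()  # queries are now embedded at 384 dims
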
settur1409 commented 8 months ago

Thank you @Anush008. That fixed the problem.

Nauman-arshad483 commented 4 months ago

Hello guys @Anush008, I am getting the same issue, but I am using NVIDIAEmbeddings. I created the embeddings of a PDF document using the bge-small model and then tried to use those embeddings in my RAG, but I am getting the same error:

    Unexpected Response: 400 (Bad Request)
    Raw response content: b'{"status":{"error":"Wrong input: Vector dimension error: expected dim: 384, got 1024"},"time":0.00041412}'

Here is my code:

    import os
    import time
    import logging
    from telegram import Update
    from telegram.ext import ApplicationBuilder, CommandHandler, MessageHandler, filters, ContextTypes
    from qdrant_client import QdrantClient
    from langchain_qdrant import Qdrant
    from dotenv import load_dotenv
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate
    from langchain.chains import create_retrieval_chain

    # Setup logging
    logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
    logger = logging.getLogger(__name__)

    # Load environment variables from .env file
    load_dotenv()

    # Load and log the NVIDIA API key and Telegram bot token
    nvidia_api_key = os.getenv('NVIDIA_API_KEY')
    telegram_token = os.getenv('TELEGRAM_BOT_TOKEN')

    # Log the values of the environment variables
    logger.info(f"NVIDIA_API_KEY: {nvidia_api_key}")
    logger.info(f"TELEGRAM_BOT_TOKEN: {telegram_token}")

    # Set the NVIDIA API key in the environment
    os.environ['NVIDIA_API_KEY'] = nvidia_api_key

    llm = ChatNVIDIA(model="meta/llama3-70b-instruct")  # NVIDIA inference

    # Initialize embeddings and load documents
    embeddings = NVIDIAEmbeddings()

    # Initialize Qdrant client and create the vector store
    url = "http://ec2-13-53-193-62.eu-north-1.compute.amazonaws.com:6333"
    client = QdrantClient(url=url, prefer_grpc=False)
    vectors = Qdrant(client=client, embeddings=embeddings, collection_name="grade_9")

    # Create the prompt template
    prompt_template = ChatPromptTemplate.from_template(
        """
        Answer the question based on the provided context only.
        Please provide the most accurate response based on the question.
        {context}
        Question: {input}
        """
    )

    async def start_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
        await context.bot.send_message(
            chat_id=update.effective_chat.id,
            text="Hi! Send me a question and I will try to answer it based on the provided documents.",
        )

    async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
        user_question = update.message.text
        document_chain = create_stuff_documents_chain(llm, prompt_template)
        retriever = vectors.as_retriever()
        print("retriever output...", retriever)
        retrieval_chain = create_retrieval_chain(retriever, document_chain)
        start_time = time.process_time()
        response = retrieval_chain.invoke({'input': user_question})
        response_time = time.process_time() - start_time
        answer = response['answer']
        # Send response back to user
        await update.message.reply_text(f"Response Time: {response_time:.2f} seconds\nAnswer: {answer}")
        # Optionally, send the relevant document context
        # for i, doc in enumerate(response["context"]):
        #     await update.message.reply_text(f"Document {i+1}:\n{doc.page_content}\n----------------------------")

    async def error(update: Update, context: ContextTypes.DEFAULT_TYPE):
        logger.error(f'Update {update} caused error {context.error}')

    if __name__ == '__main__':
        application = ApplicationBuilder().token(telegram_token).build()
        command_handlers = [CommandHandler("start", start_command)]
        message_handlers = [MessageHandler(filters.TEXT, handle_message)]
        for handler in command_handlers:
            application.add_handler(handler)
        for handler in message_handlers:
            application.add_handler(handler)
        application.add_error_handler(error)
        try:
            logger.info("Starting the bot...")
            application.run_polling()
            logger.info("Bot started successfully!")
        except Exception as e:
            logger.error(f"Error occurred while starting the bot: {str(e)}")

Anush008 commented 4 months ago

Try deleting the "grade_9" collection. Langchain will auto-create it for you, with the appropriate dimensions, when you run methods like from_documents() and from_texts().
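
For context, the collection here was created for 384-dim bge-small vectors while NVIDIAEmbeddings produces 1024-dim vectors, hence the mismatch. A minimal sketch of the recreate-and-reingest route, assuming langchain_qdrant's Qdrant.from_documents and reusing the imports from the code above (the ./pdfs path is hypothetical):

    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
    from langchain_qdrant import Qdrant

    docs = PyPDFDirectoryLoader("./pdfs").load()  # hypothetical document directory
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

    # from_documents (re)creates the collection sized to the embedding model (1024 here)
    vectors = Qdrant.from_documents(
        chunks,
        embedding=NVIDIAEmbeddings(),
        url="http://ec2-13-53-193-62.eu-north-1.compute.amazonaws.com:6333",
        collection_name="grade_9",
        force_recreate=True,  # drop the old 384-dim collection and rebuild
    )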