qdrant / fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embedding
https://qdrant.github.io/fastembed/
Apache License 2.0

qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request) #145

Closed: settur1409 closed this issue 8 months ago

settur1409 commented 8 months ago

Current Behavior

Getting qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request) when performing retriever.retrieve(query)

Steps to Reproduce

  1. Using FastEmbedEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2") from llama_index.embeddings.fastembed (see https://qdrant.github.io/fastembed/examples/Supported_Models/) and setting Settings.embed_model = embed_model.

  2. Creating the client, collection, and index:

         self.client = QdrantClient(url=url, api_key=qdrant_api_key)
         self.client.recreate_collection(
             collection_name=collection_name,
             vectors_config=VectorParams(size=384, distance=Distance.COSINE),
         )
         self.vector_store = QdrantVectorStore(client=self.client, collection_name=collection_name)
         self.index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
  3. Building the chat engine and retriever:

         ch_engine = index.as_chat_engine(llm=llm, chat_mode='openai')
         retriever = index.as_retriever()

  4. Calling retriever.retrieve(query), passing query as a str (the full flow is condensed in the sketch below).
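
Condensed into one place, the steps above amount to roughly the following (a sketch for orientation only; url, qdrant_api_key, collection_name, llm, and query are assumed to be defined elsewhere):

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams
    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.embeddings.fastembed import FastEmbedEmbedding
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # 384-dim embedding model; where this assignment runs relative to index
    # construction turns out to be the crux of the issue (see the resolution below)
    Settings.embed_model = FastEmbedEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

    client = QdrantClient(url=url, api_key=qdrant_api_key)
    client.recreate_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    retriever = index.as_retriever()
    nodes = retriever.retrieve(query)  # raises the 400 when the query is embedded at 1536 dims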

I am getting the error below:

    qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request)
    Raw response content: b'{"status":{"error":"Wrong input: Vector inserting error: expected dim: 384, got 1536"},"time":0.00039087}'

The embedding model produces only 384-dimensional vectors; I am not sure where 1536 is coming from.

Expected Behavior

The configured embedding model does not seem to be applied to the query, and I see no way to pass the embedding model when creating the index.


agourlay commented 8 months ago

the embedding model supports only 384 dim, not sure from where 1536 is coming into picture.

I guess it comes from

ch_engine = index.as_chat_engine(llm=llm, chat_mode='openai')

OpenAI uses 1536-dimensional embeddings.
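
A quick way to confirm which model is producing the query vector is to embed a test string and print its length; a minimal sketch, assuming a recent fastembed where the class is named TextEmbedding (older releases exposed the model under different class names):

    from fastembed import TextEmbedding

    # all-MiniLM-L6-v2 should yield 384-dimensional vectors
    model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vector = next(iter(model.embed(["dimension check"])))
    print(len(vector))  # expected: 384, not 1536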

settur1409 commented 8 months ago

Hey @agourlay, I didn't use OpenAI embeddings. I tried another alternative:

    q_engine = index.as_query_engine()
    q_engine.query(query)

This is also giving the same error, and I don't have any LLM involved here. Let me know your thoughts.

agourlay commented 8 months ago

I believe this is a fastembed-specific issue, so I took the liberty of transferring it directly to this repository.

agourlay commented 8 months ago

Dear @NirantK, any idea what is going on here?

NirantK commented 8 months ago

@agourlay @settur1409, I will reproduce and get back with findings. In the meantime, note that @Anush008 maintains the LlamaIndex bindings for all Qdrant packages, including FastEmbed.

Anush008 commented 8 months ago

@settur1409, could you share your entire code? I would need that to reproduce the issue.

Maybe as a Gist or file?

settur1409 commented 8 months ago

qdrant_main_file.txt --> I compiled my code into a single file. Please check. Below is the log that I got:

    Fetching 7 files: 100%|██████████| 7/7 [00:00<?, ?it/s]
    Parsing nodes: 100%|██████████| 131/131 [00:00<00:00, 131.09it/s]
    100%|██████████| 177/177 [01:40<00:00, 1.76it/s]
    100%|██████████| 177/177 [01:40<00:00, 1.76it/s]
    Generating embeddings: 100%|██████████| 177/177 [01:03<00:00, 2.79it/s]

This is the error I got:

    qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request)
    Raw response content: b'{"status":{"error":"Wrong input: Vector inserting error: expected dim: 384, got 1536"},"time":0.001277349}'

Let me know if you need any other information.

settur1409 commented 8 months ago

From the UI: attached a few screenshots from the Qdrant GUI showing the collection.
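
The same information can also be read programmatically rather than from the GUI; a minimal sketch with qdrant-client, assuming the single unnamed vector configuration created earlier:

    # Inspect the collection's configured vector parameters
    info = client.get_collection(collection_name)
    print(info.config.params.vectors.size)      # expected: 384
    print(info.config.params.vectors.distance)  # expected: Cosine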

Anush008 commented 8 months ago

@settur1409, what version of LlamaIndex are you on? There have been quite a lot of changes in the recent versions.

settur1409 commented 8 months ago

Below is the list of llama-index related packages I see in my env:

    llama-index                              0.10.16
    llama-index-agent-openai                 0.1.5
    llama-index-cli                          0.1.7
    llama-index-core                         0.10.16.post1
    llama-index-embeddings-fastembed         0.1.4
    llama-index-embeddings-huggingface       0.1.4
    llama-index-embeddings-openai            0.1.6
    llama-index-indices-managed-llama-cloud  0.1.3
    llama-index-legacy                       0.9.48
    llama-index-llms-anyscale                0.1.3
    llama-index-llms-langchain               0.1.3
    llama-index-llms-openai                  0.1.7
    llama-index-multi-modal-llms-openai      0.1.4
    llama-index-program-openai               0.1.4
    llama-index-question-gen-openai          0.1.3
    llama-index-readers-file                 0.1.8
    llama-index-readers-llama-parse          0.1.3
    llama-index-vector-stores-chroma         0.1.5
    llama-index-vector-stores-qdrant         0.1.4
    llama-parse                              0.3.6
    llamaindex-py-client                     0.1.13

qdrant-client version: 1.8.0

Anush008 commented 8 months ago

@settur1409, you have to move the following lines to the top (just below the imports):

    from llama_index.core import Settings
    from llama_index.embeddings.fastembed import FastEmbedEmbedding

    embed_model = FastEmbedEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
    Settings.embed_model = embed_model
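
The ordering matters because llama-index resolves the embedding model when the index object is constructed: anything built before Settings.embed_model is assigned falls back to the default OpenAI embedding, which is 1536-dimensional. A minimal sketch of the corrected ordering, reusing the client and collection_name from the earlier snippets:

    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.embeddings.fastembed import FastEmbedEmbedding
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # 1. Register the 384-dim embedding model globally, before any index exists
    Settings.embed_model = FastEmbedEmbedding(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    # 2. Only then build the index; it now picks up the FastEmbed model
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    retriever = index.as_retriever()  # queries are now embedded at 384 dims
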
settur1409 commented 8 months ago

Thank you @Anush008. That fixed the problem.

Nauman-arshad483 commented 4 months ago

Hello guys @Anush008, I am getting the same issue, but I am using NVIDIAEmbeddings. I created the embeddings of a PDF document using the bge-small model and then tried to use those embeddings in my RAG, but I am getting the same error:

    Unexpected Response: 400 (Bad Request)
    Raw response content: b'{"status":{"error":"Wrong input: Vector dimension error: expected dim: 384, got 1024"},"time":0.00041412}'

Here is my code:

    import os
    import time
    import logging
    from telegram import Update
    from telegram.ext import ApplicationBuilder, CommandHandler, MessageHandler, filters, ContextTypes
    from qdrant_client import QdrantClient
    from langchain_qdrant import Qdrant
    from dotenv import load_dotenv
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate
    from langchain.chains import create_retrieval_chain

    # Setup logging
    logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
    logger = logging.getLogger(__name__)

    # Load environment variables from .env file
    load_dotenv()

    # Load and log the NVIDIA API key and Telegram bot token
    nvidia_api_key = os.getenv('NVIDIA_API_KEY')
    telegram_token = os.getenv('TELEGRAM_BOT_TOKEN')

    # Log the values of the environment variables
    logger.info(f"NVIDIA_API_KEY: {nvidia_api_key}")
    logger.info(f"TELEGRAM_BOT_TOKEN: {telegram_token}")

    # Set the NVIDIA API key in the environment
    os.environ['NVIDIA_API_KEY'] = nvidia_api_key

    llm = ChatNVIDIA(model="meta/llama3-70b-instruct")  # NVIDIA inference

    # Initialize embeddings and load documents
    embeddings = NVIDIAEmbeddings()

    # Initialize Qdrant client and create the vector store
    url = "http://ec2-13-53-193-62.eu-north-1.compute.amazonaws.com:6333"
    client = QdrantClient(url=url, prefer_grpc=False)
    vectors = Qdrant(client=client, embeddings=embeddings, collection_name="grade_9")

    # Create the prompt template
    prompt_template = ChatPromptTemplate.from_template(
        """
        Answer the question based on the provided context only.
        Please provide the most accurate response based on the question.
        {context}
        Question: {input}
        """
    )

    async def start_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
        await context.bot.send_message(
            chat_id=update.effective_chat.id,
            text="Hi! Send me a question and I will try to answer it based on the provided documents.",
        )

    async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
        user_question = update.message.text
        document_chain = create_stuff_documents_chain(llm, prompt_template)
        retriever = vectors.as_retriever()
        print("retriever output...", retriever)
        retrieval_chain = create_retrieval_chain(retriever, document_chain)
        start_time = time.process_time()
        response = retrieval_chain.invoke({'input': user_question})
        response_time = time.process_time() - start_time
        answer = response['answer']
        # Send response back to user
        await update.message.reply_text(f"Response Time: {response_time:.2f} seconds\nAnswer: {answer}")
        # Optionally, send the relevant document context
        # for i, doc in enumerate(response["context"]):
        #     await update.message.reply_text(f"Document {i+1}:\n{doc.page_content}\n----------------------------")

    async def error(update: Update, context: ContextTypes.DEFAULT_TYPE):
        logger.error(f'Update {update} caused error {context.error}')

    if __name__ == '__main__':
        application = ApplicationBuilder().token(telegram_token).build()
        command_handlers = [CommandHandler("start", start_command)]
        message_handlers = [MessageHandler(filters.TEXT, handle_message)]
        for handler in command_handlers:
            application.add_handler(handler)
        for handler in message_handlers:
            application.add_handler(handler)
        application.add_error_handler(error)
        try:
            logger.info("Starting the bot...")
            application.run_polling()
            logger.info("Bot started successfully!")
        except Exception as e:
            logger.error(f"Error occurred while starting the bot: {str(e)}")

Anush008 commented 4 months ago

Try deleting the "grade_9" collection. Langchain will auto-create it for you, with the appropriate dimensions, when you run methods like from_documents() and from_texts().
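
For context, the collection here was created for 384-dim bge-small vectors while NVIDIAEmbeddings produces 1024-dim vectors, hence the mismatch. A minimal sketch of the recreate-and-reingest route, assuming langchain_qdrant's Qdrant.from_documents and reusing the imports from the code above (the ./pdfs path is hypothetical):

    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
    from langchain_qdrant import Qdrant

    docs = PyPDFDirectoryLoader("./pdfs").load()  # hypothetical document directory
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

    # from_documents (re)creates the collection sized to the embedding model (1024 here)
    vectors = Qdrant.from_documents(
        chunks,
        embedding=NVIDIAEmbeddings(),
        url="http://ec2-13-53-193-62.eu-north-1.compute.amazonaws.com:6333",
        collection_name="grade_9",
        force_recreate=True,  # drop the old 384-dim collection and rebuild
    )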