Closed eric-gardyn closed 4 months ago
could be an Atlas Vector Search Index issue.
Have you created a vector search index on the embedding field of the collection that holds the text-embedding-3-small embeddings? (see https://mongodb.github.io/chatbot/mongodb#3-create-atlas-vector-search-index-required-for-rag)
yes, the index is marked as "Active" with Primary Node: 60 (100%) indexed of 60 total
created with
{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
I can also inspect the "embedded_content" collection's documents. So, I can confirm that the ingestion script worked.
hmm, it's hard to help debug this without more information. would you be able to share the source code?
and just to confirm, do you have 2 vector search indexes, 1 for each collection containing embedded_content?
I cloned a brand new instance (from 'main' at commit 9ed093a), created a new database and collection ('embedded_content'), created a new index ('vector_index') on that collection, and ran the "quick start" example (from https://mongodb.github.io/chatbot/quick-start/). The ingest is successful; the index shows: 78 (100%) indexed of 78 total. I then ran the default UI and server (from the 'quick-start' folder as well) and got the same error: "message":"No matching content found".
config:
# MongoDB config
MONGODB_CONNECTION_URI="mongodb+srv://XXX:XXX@gardyn-XXX.XXX.mongodb.net/?retryWrites=true&w=majority"
VECTOR_SEARCH_INDEX_NAME="vector_index" # or whatever your index name is
MONGODB_DATABASE_NAME="gardyn-search-dev-4-32" # or whatever your database name is. must contain vector search index.
# OpenAI config
OPENAI_API_KEY="xxxxxx"
OPENAI_CHAT_COMPLETION_MODEL="gpt-4"
OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
does it matter that the index name 'vector_index' is the same across several DBs?
> does it matter that the index name 'vector_index' is the same across several DBs?
i believe this could be the issue. though it's hard to say without looking at your code and cluster config.
try setting up different index names in the atlas UI.
atlas vector search indexes are set at the cluster level, correlating to a specific collection in a specific database.
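One way to check whether a given index on a given collection returns anything at all, independent of the chatbot server, is to run a `$vectorSearch` aggregation directly and look at the raw scores. The sketch below only builds the pipeline. The helper name `build_vector_search_pipeline` is hypothetical, the `text` field in the projection is an assumption about the shape of the `embedded_content` documents, and the `numCandidates` multiplier is an arbitrary choice.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # index name used in this thread
    path: str = "embedding",           # field indexed in the definition above
    k: int = 5,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline returning the k nearest documents with scores."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than results (arbitrary 10x factor).
                "numCandidates": k * 10,
                "limit": k,
            }
        },
        # Expose the similarity score so it can be compared with a threshold.
        # The "text" field is assumed; adjust it to the actual document shape.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# Usage with pymongo (not executed here; needs a cluster and a real query embedding):
#   collection = client[database_name]["embedded_content"]
#   for doc in collection.aggregate(build_vector_search_pipeline(query_embedding)):
#       print(doc["score"], doc["text"][:80])
```

If this query returns documents but the chatbot still reports no matching content, the index itself is probably fine and the filtering applied after the search is the more likely suspect.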
unfortunately, it did not work: I created a brand new cluster (M0 Sandbox) and used the 'quick-start' example out-of-the-box (with the MongoDB docs as a source). I used the mongodb-ui and still got the "Unfortunately, I do not know how to respond to your message." response.
I also tried the "text-embedding-3-large" embedding model, but got the error
"vector field is indexed with 2048 dimensions but queried with 3072"
because the vector index does not allow 3072 dimensions:
Number of vector dimensions. Value can be between 1 and 2048, both inclusive.
Value is above the maximum of 2048.
> I created a brand new cluster (M0 Sandbox), used the 'quick-start' example out-of-the-box (with mongoDB doc as a source). I used the mongodb-ui and still got the "Unfortunately, I do not know how to respond to your message.".
hard to say what's going on here without more visibility into your project, but i suspect it's an index config issue.
if you'd like i'd be happy to hop on a call and work through the problem with you. also curious to learn more about your experience using the framework 😄
> vector field is indexed with 2048 dimensions but queried with 3072
this is due to Atlas Vector Search limiting the max number of dimensions to index to 2048. from the docs: "You must specify a value less than 2049. You can set this field only for vector type fields."
this isn't related to the Chatbot Framework, but Atlas Vector Search itself, so it'd be hard to work around w/o using a different vector DB.
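The mismatch above can be caught before ingesting anything by comparing the embedding model's output size with the index definition and the 2048-dimension limit quoted in this thread. This is a minimal sketch, not part of the framework: `check_index_dimensions` is a hypothetical helper, and the table lists the default output sizes of the three OpenAI models mentioned here.

```python
ATLAS_MAX_DIMENSIONS = 2048  # limit quoted from the Atlas error message in this thread

# Default output sizes of the OpenAI embedding models discussed in this issue.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_index_dimensions(model: str, index_num_dimensions: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    model_dims = MODEL_DIMENSIONS[model]
    if model_dims > ATLAS_MAX_DIMENSIONS:
        problems.append(
            f"{model} produces {model_dims} dimensions, above the "
            f"index maximum of {ATLAS_MAX_DIMENSIONS}"
        )
    if model_dims != index_num_dimensions:
        problems.append(
            f"index expects {index_num_dimensions} dimensions but "
            f"{model} produces {model_dims}"
        )
    return problems


print(check_index_dimensions("text-embedding-3-small", 1536))
print(check_index_dimensions("text-embedding-3-large", 2048))
```

The OpenAI embeddings API also accepts a `dimensions` parameter for the text-embedding-3 models that shortens the returned vectors. Whether the framework's embedder exposes that option is not established in this thread, so treat it as something to investigate rather than a confirmed workaround.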
From the troubleshooting session, it looks like for the 'text-embedding-3-small' embedding model, the minScore value in the findNearestNeighborsOptions of the makeDefaultFindContent setup needs to be set lower than 0.9.
thanks for posting that here. will add to the docs that the similarity score can vary depending on the vector embedding model you use.
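To illustrate why a fixed minScore can silently drop every result, here is a small self-contained sketch of score thresholding. It assumes the normalization Atlas Vector Search documents for cosine similarity, score = (1 + cosine) / 2. The function names are hypothetical and the example scores are made up for illustration; they are not measurements from either embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def atlas_cosine_score(a: list[float], b: list[float]) -> float:
    """Map cosine similarity from [-1, 1] to a score in [0, 1]."""
    return (1 + cosine_similarity(a, b)) / 2


def filter_by_min_score(
    scored: list[tuple[str, float]], min_score: float
) -> list[tuple[str, float]]:
    """Keep only results whose score reaches the threshold."""
    return [(text, score) for text, score in scored if score >= min_score]


# Made-up scores: relevant documents that happen to score below 0.9.
results = [("doc A", 0.82), ("doc B", 0.78), ("doc C", 0.61)]

print(filter_by_min_score(results, 0.9))   # nothing passes the threshold
print(filter_by_min_score(results, 0.75))  # doc A and doc B pass
```

When every candidate falls below the threshold, the search stage returns an empty list, which matches the "no matching content" behavior reported here. Logging the raw scores for a few known-good questions is a reasonable way to pick a threshold for a new embedding model.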
closing this issue.
My system works properly when using: OPENAI_EMBEDDING_MODEL="text-embedding-ada-002" for both ingest and server.
However, when I use: OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
I always get the message associated with NO_RELEVANT_CONTENT. Log shows
Do I need to change any setting in the Config?