run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.35k stars 5.19k forks

[Bug]: VectorStoreIndex no longer working #16726

Open JoseGHdz opened 5 hours ago

JoseGHdz commented 5 hours ago

Bug Description

I updated my packages and now, for some reason, VectorStoreIndex is not working. Previously, when running VectorStoreIndex I could see the progress on the chunks being vectorized, but now it does not even get that far.

After the code ran, I used to be able to see the vectorized content stored in the Chroma DB, but since nothing is being vectorized, no content is stored. Could it be an issue with the packages I just updated?

Code (reformatted; `import chromadb` was missing from the original snippet but is used below, and a duplicate `OpenAI` import was dropped):

```python
import chromadb

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
    ServiceContext,
    Settings,
)
from llama_index.core.indices.query.query_transform.base import DecomposeQueryTransform
from llama_index.core.query_engine import MultiStepQueryEngine, CitationQueryEngine
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.openai import OpenAI
from llama_index.legacy import LLMPredictor
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

db_path = "chroma_db"
embed_model = OpenAIEmbedding(model="text-embedding-3-large")

print("Vectorizing Content")
db = chromadb.PersistentClient(path=db_path)

azure_documents = SimpleDirectoryReader("./Azure Docs").load_data()

azure_docs_collection = db.get_or_create_collection("azure_docs")
azure_docs_vector_store = ChromaVectorStore(chroma_collection=azure_docs_collection)
azure_docs_storage_context = StorageContext.from_defaults(vector_store=azure_docs_vector_store)
azure_docs_service_context = ServiceContext.from_defaults(
    embed_model=embed_model, chunk_size=1000, chunk_overlap=20
)
azure_index = VectorStoreIndex.from_documents(
    azure_documents,
    storage_context=azure_docs_storage_context,
    service_context=azure_docs_service_context,
    show_progress=True,
)

print("Vector Content Loaded")
```

Version

0.11.18

Steps to Reproduce

Updated my packages to these versions:

```
llama-cloud                              0.1.4
llama-index                              0.11.18
llama-index-agent-openai                 0.3.4
llama-index-cli                          0.3.1
llama-index-core                         0.11.18
llama-index-embeddings-adapter           0.2.2
llama-index-embeddings-openai            0.2.5
llama-index-finetuning                   0.2.1
llama-index-indices-managed-llama-cloud  0.4.0
llama-index-legacy                       0.9.48
llama-index-llms-azure-openai            0.2.2
llama-index-llms-gradient                0.1.2
llama-index-llms-mistralai               0.2.7
llama-index-llms-openai                  0.2.16
llama-index-multi-modal-llms-openai      0.2.2
llama-index-postprocessor-cohere-rerank  0.2.1
llama-index-program-openai               0.2.0
llama-index-question-gen-openai          0.2.0
llama-index-readers-file                 0.2.2
llama-index-readers-llama-parse          0.3.0
llama-index-vector-stores-chroma         0.2.1
llama-parse                              0.5.10
llamaindex-py-client                     0.1.19
```

Then I ran the code.

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 5 hours ago

I couldn't find specific information about recent changes in the VectorStoreIndex implementation in version 0.11.18 of llama-index or any known issues related to ChromaVectorStore in the latest version. However, there are some similar issues that might be relevant:

  1. There was a closed issue regarding a ModuleNotFoundError for llama_index.vector_stores which was resolved by installing the llama-index-vector-stores-chroma package separately and updating the import path for ChromaVectorStore [1].

  2. Another closed issue involved a dependency conflict that was resolved by updating pip, using Anaconda for package management, or manually adjusting version constraints [2].

Given your situation, you might want to ensure that all dependencies are correctly installed and compatible. Specifically, check that the llama-index-vector-stores-chroma package is installed and that the import paths are correct. Additionally, verify that your Python version and other package versions meet the compatibility requirements for llama-index-vector-stores-chroma version 0.2.1, which requires Python >=3.8.1 and <4.0, and chromadb version >=0.4.0 and <0.6.0 [3].
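One possible way to sanity-check those bounds locally, using only the standard library (the version ranges are the ones quoted above; package names are from this report):

```python
import sys
from importlib import metadata

# llama-index-vector-stores-chroma 0.2.1 requires Python >=3.8.1 and <4.0
assert (3, 8, 1) <= sys.version_info[:3] < (4, 0), "unsupported Python version"
print("Python OK:", sys.version.split()[0])

# ...and chromadb >=0.4.0,<0.6.0 — report what is actually installed
for pkg in ("chromadb", "llama-index-vector-stores-chroma"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
```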

To continue talking to Dosu, mention @dosu.

logan-markewich commented 4 hours ago

Your code seems pretty outdated. Here's what works for me, reproducible in google colab https://colab.research.google.com/drive/1sQQw-A1PvWozajDqZeI_WUzRu9GPo9AV?usp=sharing

ServiceContext was removed in the latest versions. Use Settings to control global defaults, or pass things like embed_model and llm into local constructors.

I showed both ways in the notebook

JoseGHdz commented 4 hours ago

> Your code seems pretty outdated. Here's what works for me, reproducible in google colab https://colab.research.google.com/drive/1sQQw-A1PvWozajDqZeI_WUzRu9GPo9AV?usp=sharing
>
> ServiceContext was removed in the latest versions. Use Settings to control global defaults, or pass things like embed_model and llm into local constructors.
>
> I showed both ways in the notebook

Awesome! Thank you so much. Once I resolve the `cannot import name 'Settings' from 'llama_index.core' (unknown location)` error, I will use that.

logan-markewich commented 54 minutes ago

Hmm, a fresh venv may help with that
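Something along these lines might do it (package names taken from this thread; the venv name is arbitrary):

```shell
# Create and activate a clean virtual environment
python3 -m venv fresh-venv
. fresh-venv/bin/activate

# Reinstall only what the snippet needs, letting pip resolve compatible versions
pip install --upgrade pip
pip install llama-index llama-index-vector-stores-chroma chromadb

# Verify the import that was failing
python -c "from llama_index.core import Settings; print('Settings import OK')"
```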