run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Using vector index with Firestore returns empty response #10178

Closed: Tomly-Hola-Lee closed this issue 4 months ago

Tomly-Hola-Lee commented 8 months ago

Question Validation

Question

I am following the FirestoreDemo notebook with the vector index (https://github.com/run-llama/llama_index/blob/main/docs/examples/docstore/FirestoreDemo.ipynb). I have adapted my code as below:


```python
import os
import nest_asyncio

from llama_index import (
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)
from llama_index import VectorStoreIndex, SummaryIndex, SimpleKeywordTableIndex
from llama_index.composability import ComposableGraph
from llama_index.llms import OpenAI
from llama_index.response.notebook_utils import display_response

import logging
import sys

from llama_index.storage.kvstore.firestore_kvstore import FirestoreKVStore
from llama_index.storage.docstore.firestore_docstore import (
    FirestoreDocumentStore,
)
from llama_index.storage.index_store.firestore_indexstore import (
    FirestoreIndexStore,
)

from llama_index.node_parser import SentenceSplitter

email = "testing@email.com"
chatroomid = "000001"
namespace = email + "_" + chatroomid

nest_asyncio.apply()
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

folder=r"data/paul_graham"
data_path = os.path.join(os.getcwd(), folder)
reader = SimpleDirectoryReader(data_path)
documents = reader.load_data()
nodes = SentenceSplitter().get_nodes_from_documents(documents)

kvstore = FirestoreKVStore()

storage_context = StorageContext.from_defaults(
    docstore=FirestoreDocumentStore(kvstore, namespace=namespace),
    index_store=FirestoreIndexStore(kvstore, namespace=namespace),
)

storage_context.docstore.add_documents(documents)
vector_index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
#%%
vector_id = vector_index.index_id

storage_context.persist()

from llama_index.indices.loading import load_index_from_storage

kvstore = FirestoreKVStore()

# re-create the storage context
storage_context = StorageContext.from_defaults(
    docstore=FirestoreDocumentStore(kvstore, namespace=namespace),
    index_store=FirestoreIndexStore(kvstore, namespace=namespace)
)

# load the indices
vector_index = load_index_from_storage(
    storage_context=storage_context, index_id=vector_id
)
query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("What did the author do growing up?")
display_response(vector_response)
```

However, I only get an empty response. I have tried the summary index and the keyword index, and they both work.
May I know how to make the vector index work?
dosubot[bot] commented 8 months ago

🤖

Hello again @Tomly-Hola-Lee! It's good to see you diving further into the LlamaIndex library. Let's see what we can do about your question.

Based on the information you've provided and the context from the LlamaIndex repository, it seems like the issue might be related to how the vector index data is being stored and retrieved in Firestore.

The FirestoreKVStore class in the LlamaIndex framework is responsible for storing and retrieving data from Firestore. It uses methods like put, aput, put_all, and aput_all to store data, and get, aget, get_all, and aget_all to retrieve data. These methods use the replace_field_name_set and replace_field_name_get functions to replace certain field names before storing and after retrieving the data.

If the vector index data contains the field names "data" or "type", they are swapped for sanitized substitutes when stored and swapped back when retrieved. If the data does not contain these field names, or if the replacement does not round-trip correctly, this could result in an empty response when retrieving the data.
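To make the round-trip idea concrete, here is a minimal sketch of a set/get field-name replacement. The substitute names below ("data_", "type_") are placeholders for illustration, not the actual substitutes FirestoreKVStore uses; check replace_field_name_set and replace_field_name_get in firestore_kvstore.py for the real mapping.

```python
# Illustrative only: the real mapping lives in FirestoreKVStore's
# replace_field_name_set / replace_field_name_get. "data_" and "type_"
# are placeholder substitutes, not the library's actual choice.
FIELD_MAP_SET = {"data": "data_", "type": "type_"}
FIELD_MAP_GET = {v: k for k, v in FIELD_MAP_SET.items()}

def replace_fields(doc: dict, mapping: dict) -> dict:
    """Return a copy of doc with reserved field names swapped per mapping."""
    return {mapping.get(key, key): value for key, value in doc.items()}

stored = replace_fields({"data": "...", "type": "vector_store"}, FIELD_MAP_SET)
restored = replace_fields(stored, FIELD_MAP_GET)
assert restored == {"data": "...", "type": "vector_store"}
```

If the set and get mappings are not exact inverses, a stored index can come back with unrecognized field names, which is one way to end up with an empty response.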

Here are some steps you can take to troubleshoot this issue:

  1. Check if the vector index data is being stored correctly in Firestore. You can do this by checking the Firestore database directly.

  2. Verify that the vector index data contains the field names "data" and "type". If it does not, you may need to modify the data or the replace_field_name_set and replace_field_name_get functions to handle the correct field names.

  3. Review the code that is using the FirestoreKVStore class to store and retrieve the vector index data. Ensure that it is using the correct collection name and key when calling the put and get methods.

  4. If the issue persists, you may need to debug the FirestoreKVStore class to ensure that the put, get, replace_field_name_set, and replace_field_name_get methods are working correctly.

You can find more information about the FirestoreKVStore class in the firestore_kvstore.py file in the LlamaIndex repository.
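As a concrete starting point for step 1 above, a few lines with the google-cloud-firestore client are enough to see what actually landed in the database. The collection name below is a placeholder; FirestoreKVStore derives its collection names internally, so list the collections first to find the right one.

```python
from google.cloud import firestore

db = firestore.Client()

# List all top-level collections to find the ones FirestoreKVStore created.
for collection in db.collections():
    print("collection:", collection.id)

# Dump a few documents from the collection that looks like the kvstore data
# ("<your-kvstore-collection>" is a placeholder, not a real name).
for snapshot in db.collection("<your-kvstore-collection>").limit(5).stream():
    print(snapshot.id, list(snapshot.to_dict().keys()))
```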

As for the VectorStoreIndex.from_documents function, I wasn't able to find specific information about it in the LlamaIndex repository. It would be helpful if you could provide more details about the data you're passing to this function and the output you're expecting. This could help in identifying any potential issues with how this function is being used.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Tomly-Hola-Lee commented 8 months ago


@dosu-bot I notice everything works fine until I use vector_index = load_index_from_storage(storage_context=storage_context, index_id=vector_id). It does return <llama_index.indices.vector_store.base.VectorStoreIndex object at 0x7fd4a9bab1f0>, but it still gives an empty response.

If I do not separate the query step, i.e. do not use load_index_from_storage(storage_context=storage_context, index_id=vector_id), it gives me a response. Is there anything I missed?

hatianzhang commented 8 months ago

Can you try using https://docs.llamaindex.ai/en/stable/understanding/tracing_and_debugging/tracing_and_debugging.html to debug which part yields the empty results?
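For reference, the tracing page linked above centers on the callback-based debug handler. A minimal sketch of wiring it up, using the 0.9-era imports this thread is on (module paths may differ in other versions):

```python
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# Print a trace of every retrieval / embedding / LLM event when a query ends.
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)

# Pass service_context when building or loading the index; after a query,
# llama_debug.get_event_pairs() shows which step returned nothing.
```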

Tomly-Hola-Lee commented 8 months ago

@hatianzhang

> Can you try using https://docs.llamaindex.ai/en/stable/understanding/tracing_and_debugging/tracing_and_debugging.html to debug which part yields the empty results?

Thanks for the suggestion. I have turned on debug logging, and this is the output when I do not recreate the storage context, i.e. when I query right after indexing:

```python
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("Who is the author?")
print(vector_response)
```

The debug panel shows (each line was printed twice because logging is wired to stdout twice; duplicates collapsed):

```
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create..parser at 0x7fd778273160>, 'json_data': {'input': ['file_path: /Users/Tomly/Documents/Python/LlamaIndex/data/paul_graham/paul_graham_essay.txt What I Worked On February 2021 Before college the two ma...T_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}
DEBUG:httpx._client:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create..parser at 0x7fd759e73550>, 'json_data': {'input': ['Who is the author?'], 'model': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}
DEBUG:httpx._client:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
DEBUG:llama_index.indices.utils:> Top 2 nodes:
[Node 0f9ec477-71c6-4d02-a12a-facf05a1ef45] [Similarity score: 0.773803] All that seemed left for philosophy were edge cases that people in other fields felt could safely...
[Node 73561cee-5ff5-49d7-9c4f-c048bd65d0ad] [Similarity score: 0.773635] I remember taking the boys to the coast on a sunny day in 2015 and figuring out how to deal with ...
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': "You are an expert Q&A system that is trusted around the world.\nAlways answer the query using the provided context information, and not prior knowledge.\nSome rules to follow:\n1. Never directly reference the given context in your answer.\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines."}, {'role': 'user', 'content':...nformation and not prior knowledge, answer the query.\nQuery: Who is the author?\nAnswer: '}], 'model': 'gpt-3.5-turbo', 'stream': False, 'temperature': 0.1}}
DEBUG:httpx._client:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The author of the text is Paul Graham.
```

However, if I restore the storage context and load by vector id as below:

```python
kvstore = FirestoreKVStore()

storage_context = StorageContext.from_defaults(
    docstore=FirestoreDocumentStore(kvstore),
    index_store=FirestoreIndexStore(kvstore)
)

vector_id = '28ddc079-bc01-4947-a543-455dbdb8700a'
vector_index = load_index_from_storage(
    vector_id=vector_id, storage_context=storage_context
)

query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("Who is the author?")
print(vector_response)
```

It gives the following (duplicates collapsed again):

```
DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
INFO:llama_index.indices.loading:Loading all indices.
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create..parser at 0x7fd7891ffb80>, 'json_data': {'input': ['Who is the author?'], 'model': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}
DEBUG:httpx._client:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
DEBUG:llama_index.indices.utils:> Top 0 nodes:

Empty Response
```

May I know what causes the problem?
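Compared with the first snippet in the thread, the reload above omits the namespace=namespace arguments and passes vector_id= instead of index_id=; the log line "Loading all indices." hints that the keyword was ignored. A reload that mirrors the persist-time configuration would look like this sketch (assuming the same namespace value as before):

```python
storage_context = StorageContext.from_defaults(
    docstore=FirestoreDocumentStore(kvstore, namespace=namespace),
    index_store=FirestoreIndexStore(kvstore, namespace=namespace),
)
vector_index = load_index_from_storage(
    storage_context=storage_context,
    index_id=vector_id,  # load_index_from_storage takes index_id, not vector_id
)
```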

hatianzhang commented 8 months ago
Tomly-Hola-Lee commented 8 months ago

@hatianzhang Hi, thanks for the suggestion. Unfortunately I have tried index_id=vector_id and even vector_id=vector_id, but they didn't work either. I shall get back to you with the debug log.

I printed vector_id and it gives me 28ddc079-bc01-4947-a543-455dbdb8700a, which I use as follows:

```python
vector_id = '28ddc079-bc01-4947-a543-455dbdb8700a'
vector_index = load_index_from_storage(
    vector_id=vector_id, storage_context=storage_context
)
```

How do I know if it contains enough relevant information?
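One rough way to check what the reloaded index actually holds, using attribute names from the 0.9-era API this thread is on (an assumption for other versions):

```python
# Empty dicts here would explain the "Top 0 nodes" retrieval.
print(type(vector_index.index_struct).__name__)
print(len(vector_index.index_struct.nodes_dict))  # node ids registered in the index
print(len(storage_context.docstore.docs))         # documents in the docstore
```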

Tomly-Hola-Lee commented 7 months ago

@hatianzhang Sorry for the late reply. I have fixed it to index_id=vector_id as below:

```python
vector_id = '0999d828-4f66-46d4-a77c-68c4ce8aba6c'
vector_index = load_index_from_storage(
    storage_context=storage_context, index_id=vector_id
)

chatgpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_chatgpt = ServiceContext.from_defaults(
    llm=chatgpt, chunk_size=1024
)

query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("Who is the author?")

display_response(vector_response)
```

And it still gives "Empty Response". The debug log (duplicates collapsed):

```
INFO:llama_index.indices.loading:Loading indices with ids: ['0999d828-4f66-46d4-a77c-68c4ce8aba6c']
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create..parser at 0x7fbf4b87b430>, 'json_data': {'input': ['Who is the author?'], 'model': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}
DEBUG:httpx._client:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
DEBUG:llama_index.indices.utils:> Top 0 nodes:

Final Response: Empty Response
```

I have checked the index in Firestore, and it looks like this: [screenshot of the Firestore index document]

May I know if it contains enough relevant information?
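One pattern consistent with "Top 0 nodes" here: the default SimpleVectorStore keeps embeddings in memory, and storage_context.persist() writes them to a local directory rather than to Firestore, so a reload that only wires up the Firestore docstore and index store starts from an empty vector store. A sketch under that assumption (not confirmed as the fix in this thread; "./storage" is the library's default persist directory):

```python
storage_context = StorageContext.from_defaults(
    docstore=FirestoreDocumentStore(kvstore, namespace=namespace),
    index_store=FirestoreIndexStore(kvstore, namespace=namespace),
    persist_dir="./storage",  # picks up the vector_store.json written by persist()
)
vector_index = load_index_from_storage(
    storage_context=storage_context, index_id=vector_id
)
```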

dosubot[bot] commented 4 months ago

Hi, @Tomly-Hola-Lee,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale.

It looks like you're getting an empty response when using the vector index with Firestore, while the summary index and keyword index worked. Detailed troubleshooting steps were provided, and you shared debug logs and further details about your attempts to resolve the issue. The issue is currently unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!