SiddharthMurjani opened this issue 2 weeks ago
To resolve the `ValueError: invalid literal for int() with base 10` error while vectorizing, ensure that the vector store is properly initialized and passed to the query. The error message indicates that a vector store is required for the vector store query.
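For context, Python raises this `ValueError` whenever `int()` is asked to parse a string that is not a base-10 integer, which is why a non-numeric or float-formatted value reaching an integer parse during vectorization triggers it:

```python
# int() raises ValueError when given a string that is not a
# valid base-10 integer, e.g. a float-formatted embedding value:
try:
    int("0.123")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '0.123'
```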
Here is the corrected code snippet:
```python
embed_model = llm_class(
    api_key=os.getenv("OPENAI_API_KEY"),
    deployment=model_type,
    model=model_type,
    **kwargs,
)
Settings.embed_model = embed_model
```
```python
def save_llama(content, metadata, storage_context):
    document = Document(text=content, metadata=metadata)
    # Ensure the vector store is properly loaded or initialized
    vector_store = SomeVectorStore.load_from_disk("vector_store.json")  # Replace with actual vector store loading method
    VectorStoreIndex.from_documents(
        documents=[document],
        storage_context=storage_context,
        vector_store=vector_store,  # Pass the vector store to the query
        show_progress=True,
    )
```
Make sure to replace `SomeVectorStore.load_from_disk('vector_store.json')` with the actual method used to load or initialize your vector store. This should resolve the `ValueError` by ensuring that a vector store is provided for the vector store query [1].
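The "load or initialize" step that `SomeVectorStore.load_from_disk` stands in for can be sketched in plain Python. Note this is a hypothetical JSON-backed store for illustration, not a real LlamaIndex API:

```python
import json
import os

def load_or_init_store(path):
    # Load a previously persisted store if it exists; otherwise start empty.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"embeddings": {}}

store = load_or_init_store("vector_store.json")
# Embedding values must be plain floats, not strings like "0.123",
# or downstream numeric parsing can fail with a ValueError.
store["embeddings"]["doc-1"] = [0.1, 0.2, 0.3]
```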
After debugging, I found that no `index_struct` is being returned.
```python
with self._callback_manager.as_trace("index_construction"):
    if index_struct is None:
        nodes = nodes or []
        index_struct = self.build_index_from_nodes(
            nodes + objects  # type: ignore
        )
```
And that's why `VectorStoreIndex` is not being initialized:
```python
class VectorStoreIndex(BaseIndex[IndexDict]):
    """Vector Store Index.

    Args:
        use_async (bool): Whether to use asynchronous calls. Defaults to False.
        show_progress (bool): Whether to show tqdm progress bars. Defaults to False.
        store_nodes_override (bool): set to True to always store Node objects in index
            store and document store even if vector store keeps text. Defaults to False
    """

    index_struct_cls = IndexDict

    def __init__(
        self,
        nodes: Optional[Sequence[BaseNode]] = None,
        # vector store index params
        use_async: bool = False,
        store_nodes_override: bool = False,
        embed_model: Optional[EmbedType] = None,
        insert_batch_size: int = 2048,
        # parent class params
        objects: Optional[Sequence[IndexNode]] = None,
        index_struct: Optional[IndexDict] = None,
        storage_context: Optional[StorageContext] = None,
        callback_manager: Optional[CallbackManager] = None,
        transformations: Optional[List[TransformComponent]] = None,
        show_progress: bool = False,
        # deprecated
        service_context: Optional[ServiceContext] = None,
        **kwargs: Any,
    ) -> None:
        """Initialize params."""
        self._use_async = use_async
        self._store_nodes_override = store_nodes_override
        self._embed_model = (
            resolve_embed_model(embed_model, callback_manager=callback_manager)
            if embed_model
            else embed_model_from_settings_or_context(Settings, service_context)
        )
        self._insert_batch_size = insert_batch_size
        super().__init__(
            nodes=nodes,
            index_struct=index_struct,
            service_context=service_context,
            storage_context=storage_context,
            show_progress=show_progress,
            objects=objects,
            callback_manager=callback_manager,
            transformations=transformations,
            **kwargs,
        )
```
I don't think this is related to `VectorStoreIndex` -- I think your embedding model is not returning raw floats for embedding values. I don't know what `llm_class` is doing, but that is likely the issue. You can easily check with `embeddings = embed_model.get_text_embedding("Hello world")` and ensure the returned type is a list of floats. It seems like it might be returning numpy values or something else.
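That check can be sketched as a small hypothetical helper. The strict `type(x) is float` test is deliberate: numpy scalars such as `numpy.float64` subclass `float`, so an `isinstance` check would let them through:

```python
def is_plain_float_list(embedding):
    # Exact type check: numpy.float64 subclasses float, so isinstance()
    # would accept it; type() rejects anything but a plain Python float.
    return isinstance(embedding, list) and all(type(x) is float for x in embedding)

print(is_plain_float_list([0.1, 0.2, 0.3]))   # True
print(is_plain_float_list("not an embedding"))  # False
```

Run this against the result of `embed_model.get_text_embedding("Hello world")`; if it returns `False`, the embedding model's output type is the culprit.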