run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Llama Index with local Embedding Model Dont Finish #14078

Open msft2000 opened 3 weeks ago

msft2000 commented 3 weeks ago


Question

I have the following code:

```python
import chromadb
import os
from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core.schema import TextNode
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Embedding model loaded from a local directory
embed_model = HuggingFaceEmbedding(model_name="/MODELS-LOCAL/paraphrase-multilingual-mpnet-base-v2")


def leer_archivo_csv(archivo):
    # Read the whole CSV file as a single string
    with open(archivo, 'r', encoding='utf-8') as file:
        contenido = file.read()
    return contenido


def extraer_filas_csv(contenido):
    # Split into rows, then into comma-separated cells, and strip whitespace
    filas = contenido.split("\n")
    filas_separadas = [f.split(",") for f in filas]
    filas_pulidas = [[x.strip() for x in columna] for columna in filas_separadas]
    return filas_pulidas


directorio = "../data/csv/"

archivos_csv = [f for f in os.listdir(directorio) if f.endswith('.csv')]

nodes = []
for archivo in archivos_csv:
    ruta_archivo = os.path.join(directorio, archivo)
    # File names follow the pattern "<database>-<schema>.csv"
    baseDatos, esquema = archivo.split("-")
    esquema = esquema.split(".")[0]
    contenido_csv = leer_archivo_csv(ruta_archivo)

    # Skip the header row
    filas_csv = extraer_filas_csv(contenido_csv)[1:]
    # Create one node per table row
    nodes.extend([
        TextNode(
            text=x[-1],
            metadata={"nombre_columna": x[0], "tipo_dato": x[1], "base_datos": baseDatos, "esquema": esquema},
        )
        for x in filas_csv if len(x) > 1
    ])

db = chromadb.PersistentClient(path="../chroma_db")

print("Creo nodos")

chroma_collection = db.get_or_create_collection("metadata-embeddings-campos-paraphrase")

print("Conecto chromadb")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

print("Chromadb as vector store")

storage_context = StorageContext.from_defaults(vector_store=vector_store)

print("Storage context configured")

index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=embed_model)

print("creo index")
```

I see the "creo index" message, but the execution never finishes; it seems to be waiting for something.

logan-markewich commented 3 weeks ago

@msft2000 not sure 🤔 if you see the creo message, then it's done 😅

If you hit ctrl-c and kill the script, what does the traceback look like? That will show you what it's doing
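If interrupting by hand is awkward (for example, in a batch job), the standard-library `faulthandler` module can print the same kind of stack dump automatically. The sketch below is only an illustration: `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical helper names, and the 30-second default is an arbitrary assumption.

```python
import faulthandler
import sys


def arm_hang_watchdog(seconds: float = 30.0, file=None) -> None:
    """Hypothetical helper: dump every thread's stack if the process is
    still alive after `seconds`. The process is not killed (exit=False)."""
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)


def disarm_hang_watchdog() -> None:
    """Cancel a pending dump, e.g. once the slow part has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog()` at the top of the script should produce a stack dump for every thread if the script is still running when the timer expires, which shows which call is blocking without needing ctrl-c.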

msft2000 commented 3 weeks ago

I think the same; I don't understand where the error is. Here is the traceback:

```
Traceback (most recent call last):
  File "/.local/lib/python3.9/site-packages/posthog/client.py", line 416, in join
    consumer.join()
  File "/usr/lib64/python3.9/threading.py", line 1060, in join
    self._wait_for_tstate_lock()
  File "/usr/lib64/python3.9/threading.py", line 1080, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt
```

logan-markewich commented 3 weeks ago

Oh interesting, this is coming directly from chroma (posthog is their analytics api)

Are you running without internet? You'll want to disable this in chroma https://docs.trychroma.com/telemetry
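For reference, a minimal sketch of what disabling telemetry might look like for the script above, based on the linked telemetry page. `make_offline_client` is a hypothetical helper name, and the `ChromaSettings` alias is only there because the original script already imports a different `Settings` from `llama_index.core`. Setting both the environment variable and the client setting is redundant; either one is expected to be enough.

```python
import os

# Set this before chromadb is imported so its settings can read it
# from the environment.
os.environ["ANONYMIZED_TELEMETRY"] = "False"


def make_offline_client(path: str):
    """Hypothetical helper: open a persistent Chroma client with
    anonymized telemetry turned off."""
    import chromadb
    # Aliased to avoid clashing with llama_index.core.Settings
    from chromadb.config import Settings as ChromaSettings

    return chromadb.PersistentClient(
        path=path,
        settings=ChromaSettings(anonymized_telemetry=False),
    )
```

In the original script this would replace the `chromadb.PersistentClient(path="../chroma_db")` line with `db = make_offline_client("../chroma_db")`.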

msft2000 commented 3 weeks ago

I am running without internet. Thanks a lot, I will try it right now.


msft2000 commented 3 weeks ago

That solved the problem, thanks.